GitHub user lalit2001 closed a discussion: How to Register a Dataset

I have a use case with multiple data formats (Parquet, ORC, Hudi, ...). I
want to read and load each one as a DataFrame for some operations and, at
the same time, register it so that I can query it with SQL. Is that
possible? For Hudi I'm using the code below:

        hudi_table = (
            HudiTableBuilder
            .from_base_uri(path)
            .build()
        )
        records = hudi_table.read_snapshot()
        arrow_table = pa.Table.from_batches(batches = records)
        table = self.ctx.from_arrow(arrow_table)
        
For Parquet:

        table = self.ctx.read_parquet(path)

For ORC:

        dataset = ds.dataset(path, format="orc")
        table = self.ctx.from_arrow(dataset.to_table())
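
One possible way to get both a DataFrame and a SQL-visible table from the same source is to register each source under a name and then fetch it back with ctx.table(name). The sketch below is only an illustration under some assumptions: the helper register_source and its fmt argument are hypothetical names, ctx is expected to be a datafusion SessionContext, the Hudi import path (from hudi import HudiTableBuilder) is assumed to match the package used above, and from_arrow is assumed to accept a name argument in the installed datafusion-python version.

```python
def register_source(ctx, name, path, fmt):
    """Register `path` under `name` on `ctx` and return it as a DataFrame.

    `ctx` is assumed to be a datafusion.SessionContext. Format-specific
    libraries are imported lazily so only the ones actually used are needed.
    """
    if fmt == "parquet":
        # Registers the files as a named table without loading them eagerly.
        ctx.register_parquet(name, path)
    elif fmt == "orc":
        import pyarrow.dataset as ds

        # Register the pyarrow dataset itself instead of calling to_table(),
        # so the data does not have to be materialized up front.
        ctx.register_dataset(name, ds.dataset(path, format="orc"))
    elif fmt == "hudi":
        import pyarrow as pa
        from hudi import HudiTableBuilder  # assumed import path

        batches = HudiTableBuilder.from_base_uri(path).build().read_snapshot()
        # Assumption: from_arrow registers the in-memory table under `name`.
        ctx.from_arrow(pa.Table.from_batches(batches), name=name)
    else:
        raise ValueError(f"unsupported format: {fmt!r}")

    # The same registered table is now reachable from SQL and as a DataFrame.
    return ctx.table(name)


# Usage sketch (paths and table names are placeholders):
#
#   from datafusion import SessionContext
#   ctx = SessionContext()
#   df = register_source(ctx, "events", "/data/events", "parquet")
#   ctx.sql("SELECT COUNT(*) FROM events").show()
```

With this layout the DataFrame returned by the helper and the name used in ctx.sql(...) both refer to the same registered table, so there should be no need to load the data twice.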

GitHub link: https://github.com/apache/datafusion/discussions/14318

----
This is an automatically sent email for [email protected].
To unsubscribe, please send an email to: 
[email protected]

