GitHub user lalit2001 closed a discussion: How to Register a Dataset
I have a use case with multiple data formats (Parquet, ORC, Hudi, ...). I want
to read and load each of them as a DataFrame for some operations, and at the
same time register it so that I can query it with SQL. Is that possible?
For Hudi I'm using the code below:

    hudi_table = (
        HudiTableBuilder
        .from_base_uri(path)
        .build()
    )
    records = hudi_table.read_snapshot()
    arrow_table = pa.Table.from_batches(batches=records)
    table = self.ctx.from_arrow(arrow_table)

For Parquet:

    table = self.ctx.read_parquet(path)

For ORC:

    dataset = ds.dataset(path, format="orc")
    table = self.ctx.from_arrow(dataset.to_table())

GitHub link: https://github.com/apache/datafusion/discussions/14318
----
This is an automatically sent email for [email protected].
To unsubscribe, please send an email to:
[email protected]