[I] Parquet to Arrow conversion [arrow]

via GitHub Mon, 26 Feb 2024 21:02:48 -0800


piyushdubey opened a new issue, #40258:
URL: https://github.com/apache/arrow/issues/40258


   ### Describe the usage question you have. Please include as many useful 
details as possible.
   
   
   Hello,
   
   I am trying to convert a Delta table to an Arrow stream.
   
   The table can have any number of Parquet files and may or may not be 
partitioned. I am using Parquet.Net to read the Parquet files. 
   
   How should I think about the mapping between Parquet files and 
RecordBatches? Should I create one RecordBatch per Parquet file? What should 
the overall Parquet-to-Arrow conversion logic look like? Any pointers?
   
   
   Here's a tentative algorithm I have in mind. 
   
   1. Iterate over the list of Parquet files.
   2. For each row group in a file, open a reader: `ParquetRowGroupReader reader = 
parquetReader.OpenRowGroupReader(rowGroupIndex);`
   3. Extract the columns and add them to a RecordBatch one by one.
   4. Write each RecordBatch to the output with an ArrowStreamWriter.
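   
   The steps above could be sketched roughly as follows. This is only a 
minimal sketch, not a confirmed approach: it assumes the async Parquet.Net 4.x 
API (`ParquetReader.CreateAsync`, `ReadColumnAsync`) and the Apache.Arrow NuGet 
package, it emits one RecordBatch per row group rather than per file, and it 
assumes every file shares the same schema. `ParquetToArrowSketch`, 
`ToArrowArray` and `ConvertAsync` are hypothetical names, and only a few 
non-nullable column types are handled.
   
   ```csharp
   using System;
   using System.Collections.Generic;
   using System.IO;
   using System.Threading.Tasks;
   using Apache.Arrow;
   using Apache.Arrow.Ipc;
   // Aliases avoid name clashes between the Parquet.Net and Apache.Arrow namespaces.
   using DataColumn = Parquet.Data.DataColumn;
   using DataField = Parquet.Schema.DataField;
   using ParquetReader = Parquet.ParquetReader;
   using ParquetRowGroupReader = Parquet.ParquetRowGroupReader;

   public static class ParquetToArrowSketch
   {
       // Hypothetical helper: converts one Parquet.Net column to an Arrow array.
       // Assumption: only int, long, double and string columns are supported here.
       private static IArrowArray ToArrowArray(DataColumn column)
       {
           switch (column.Data)
           {
               case int[] values:
               {
                   var builder = new Int32Array.Builder();
                   foreach (int v in values) builder.Append(v);
                   return builder.Build();
               }
               case long[] values:
               {
                   var builder = new Int64Array.Builder();
                   foreach (long v in values) builder.Append(v);
                   return builder.Build();
               }
               case double[] values:
               {
                   var builder = new DoubleArray.Builder();
                   foreach (double v in values) builder.Append(v);
                   return builder.Build();
               }
               case string[] values:
               {
                   var builder = new StringArray.Builder();
                   foreach (string v in values)
                   {
                       if (v is null) builder.AppendNull(); else builder.Append(v);
                   }
                   return builder.Build();
               }
               default:
                   throw new NotSupportedException(
                       $"Column '{column.Field.Name}' has a type this sketch does not handle.");
           }
       }

       // Writes every row group of every file as one RecordBatch in a single Arrow stream.
       public static async Task ConvertAsync(IEnumerable<string> parquetPaths, Stream output)
       {
           Schema schema = null;
           ArrowStreamWriter writer = null;

           foreach (string path in parquetPaths)
           {
               using Stream input = File.OpenRead(path);
               using ParquetReader reader = await ParquetReader.CreateAsync(input);
               DataField[] fields = reader.Schema.GetDataFields();

               for (int i = 0; i < reader.RowGroupCount; i++)
               {
                   using ParquetRowGroupReader rowGroup = reader.OpenRowGroupReader(i);

                   var arrays = new List<IArrowArray>();
                   foreach (DataField field in fields)
                   {
                       arrays.Add(ToArrowArray(await rowGroup.ReadColumnAsync(field)));
                   }

                   if (schema == null)
                   {
                       // Derive the Arrow schema once, from the first row group read.
                       var schemaBuilder = new Schema.Builder();
                       for (int c = 0; c < fields.Length; c++)
                       {
                           schemaBuilder.Field(
                               new Field(fields[c].Name, arrays[c].Data.DataType, true));
                       }
                       schema = schemaBuilder.Build();
                       writer = new ArrowStreamWriter(output, schema, true); // leave output open
                   }

                   // Assumes at least one column; all columns in a row group share a length.
                   using var batch = new RecordBatch(schema, arrays, arrays[0].Length);
                   await writer.WriteRecordBatchAsync(batch);
               }
           }

           if (writer != null)
           {
               await writer.WriteEndAsync(); // end-of-stream marker
               writer.Dispose();
           }
       }
   }
   ```
   
   A call might look like `await ParquetToArrowSketch.ConvertAsync(paths, 
outputStream);`. Mapping row groups to batches keeps memory bounded by the 
largest row group; a batch per file would also be valid Arrow but needs the 
whole file in memory. The sketch ignores the Delta transaction log and any 
partition columns, which would have to be handled separately.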
   
   Appreciate any help with this.
   
   ### Component(s)
   
   C#


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
