adamreeve commented on issue #40258: URL: https://github.com/apache/arrow/issues/40258#issuecomment-2010974690
As an alternative to Parquet.Net, you could use ParquetSharp, which wraps the Arrow C++ Parquet library and has built-in support for reading Parquet files as Arrow record batches: https://github.com/G-Research/ParquetSharp/blob/master/docs/Arrow.md

Disclaimer: I'm a maintainer of ParquetSharp.

But otherwise your algorithm seems sensible. I'd only suggest you might want one RecordBatch per row group rather than per file, in case your files contain many row groups and could be large.
