Re: [I] Implement Utility classes like Arrow-to-Parquet conversion implementation [fluss]

via GitHub Sat, 01 Nov 2025 01:48:05 -0700


xx789633 commented on issue #437:
URL: https://github.com/apache/fluss/issues/437#issuecomment-3475983270


   Hi @luoyuxia, to implement the scenario you described in issue 
https://github.com/apache/arrow-java/issues/735 (continuously reading Arrow 
RecordBatches from a remote server and writing them to Parquet) with Arrow Java's 
`DatasetFileWriter`, we can subclass `ArrowReader` and implement it as a facade 
over an iterator. For example, the Arrow ADBC JDBC driver demonstrates this approach in its 
[RootArrowReader](https://github.com/apache/arrow-adbc/blob/main/java/driver/jdbc/src/main/java/org/apache/arrow/adbc/driver/jdbc/RootArrowReader.java).
 Specifically, we need to override `loadNextBatch` in `ArrowReader` to 
manually populate the `VectorSchemaRoot`.
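   
   To make the shape of that facade concrete, here is a minimal plain-Java sketch of the pattern. It deliberately does not use the Arrow classes: `BatchReader`, `IteratorBatchReader`, and `BatchSink` are hypothetical stand-ins that only mirror the pull-style `loadNextBatch` contract and the reused root container. In real code the subclass would extend `org.apache.arrow.vector.ipc.ArrowReader`, load each batch into the `VectorSchemaRoot`, and, as far as I know, also implement the other abstract methods (`readSchema`, `bytesRead`, `closeReadSource`).

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for the pull-style contract of ArrowReader.
// This is NOT the Arrow class; it only mirrors the loadNextBatch() shape.
abstract class BatchReader {
    // Reused container, playing the role of VectorSchemaRoot.
    protected final List<Integer> root = new ArrayList<>();

    // Returns true if a new batch was loaded into root, false at end of stream.
    abstract boolean loadNextBatch();

    List<Integer> getRoot() {
        return root;
    }
}

// The facade: adapts an Iterator of batches (e.g. fed by a remote server)
// to the pull-style reader that a writer expects.
class IteratorBatchReader extends BatchReader {
    private final Iterator<List<Integer>> batches;

    IteratorBatchReader(Iterator<List<Integer>> batches) {
        this.batches = batches;
    }

    @Override
    boolean loadNextBatch() {
        if (!batches.hasNext()) {
            return false; // no more batches: signal end of stream
        }
        root.clear();                // drop the previous batch
        root.addAll(batches.next()); // manually populate the reused root
        return true;
    }
}

// Hypothetical consumer standing in for a file writer: it keeps calling
// loadNextBatch() and reads the root after every successful call.
class BatchSink {
    static int drain(BatchReader reader) {
        int rows = 0;
        while (reader.loadNextBatch()) {
            rows += reader.getRoot().size();
        }
        return rows;
    }
}
```

   The consumer never sees the iterator, only the reader, so the source of the batches (network stream, queue, etc.) can change without touching the writing side.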
   
   Regarding the performance comparison between Arrow Java's Parquet writing 
and the C++ implementation, do we have any performance numbers?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
