Re: [I] Implement Utility classes like Arrow-to-Parquet conversion implementation [fluss]

via GitHub Sun, 02 Nov 2025 23:01:53 -0800


luoyuxia commented on issue #437:
URL: https://github.com/apache/fluss/issues/437#issuecomment-3479160605


   > Hi [@luoyuxia](https://github.com/luoyuxia), to implement the scenario you described in issue [apache/arrow-java#735](https://github.com/apache/arrow-java/issues/735) (reading Arrow RecordBatches continuously from a remote server and writing them to Parquet) using Arrow Java's `DatasetFileWriter`, we can subclass `ArrowReader` and implement a facade over an iterator. For example, the Arrow ADBC JDBC driver demonstrates this approach in its [RootArrowReader](https://github.com/apache/arrow-adbc/blob/main/java/driver/jdbc/src/main/java/org/apache/arrow/adbc/driver/jdbc/RootArrowReader.java). Specifically, we need to override `loadNextBatch` in `ArrowReader` to manually populate the `VectorSchemaRoot`.
   > 
   > Regarding the performance comparison between Arrow Java's Parquet writing and the C++ implementation, do we have performance numbers?
   
   As discussed offline, that approach is hacky, and it would require us to implement a new Arrow filesystem for the filesystems that Arrow does not support.
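
   For reference, the facade-over-an-iterator idea in the quoted proposal can be sketched roughly as below. This is a dependency-free illustration only: `BatchRoot` and `BatchReader` are hypothetical stand-ins for Arrow's `VectorSchemaRoot` and `ArrowReader`, not the real classes, and the batches are plain lists of integers. A real implementation would extend `ArrowReader` itself and handle its remaining abstract methods and buffer management, which this sketch leaves out.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Plays the role of the writer: it only knows the pull-style reader contract.
final class IteratorReaderDemo {
    // Drains the reader batch by batch and returns the total row count.
    static int drain(BatchReader reader) {
        int rows = 0;
        while (reader.loadNextBatch()) {
            rows += reader.root().rowCount();
        }
        return rows;
    }

    public static void main(String[] args) {
        // Assumption: these lists stand in for batches arriving from a remote server.
        List<List<Integer>> incoming = List.of(List.of(1, 2, 3), List.of(4, 5));
        System.out.println(drain(new IteratorBatchReader(incoming.iterator())));
    }
}

// Hypothetical stand-in for VectorSchemaRoot: one reusable container that
// always holds the rows of the most recently loaded batch.
final class BatchRoot {
    private final List<Integer> values = new ArrayList<>();

    void load(List<Integer> batch) {
        values.clear();
        values.addAll(batch);
    }

    int rowCount() {
        return values.size();
    }

    List<Integer> values() {
        return values;
    }
}

// Hypothetical stand-in for the pull-style contract of ArrowReader.
abstract class BatchReader {
    protected final BatchRoot root = new BatchRoot();

    // Loads the next batch into the root; returns false once the source is exhausted.
    abstract boolean loadNextBatch();

    BatchRoot root() {
        return root;
    }
}

// The facade: adapts any iterator of batches to the reader contract.
final class IteratorBatchReader extends BatchReader {
    private final Iterator<List<Integer>> batches;

    IteratorBatchReader(Iterator<List<Integer>> batches) {
        this.batches = batches;
    }

    @Override
    boolean loadNextBatch() {
        if (!batches.hasNext()) {
            return false;
        }
        root.load(batches.next()); // overwrite the shared root with the new batch
        return true;
    }
}
```

   The consumer never sees the iterator, only `loadNextBatch` and the shared root. The sketch does not address the filesystem concern raised above.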


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]