xx789633 commented on issue #437: URL: https://github.com/apache/fluss/issues/437#issuecomment-3475983270
Hi @luoyuxia, to implement the scenario you described in https://github.com/apache/arrow-java/issues/735 (continuously reading Arrow RecordBatches from a remote server and writing them to Parquet) with Arrow Java's `DatasetFileWriter`, we can subclass `ArrowReader` and implement it as a facade over an iterator. For example, the ADBC JDBC driver demonstrates this approach in its [RootArrowReader](https://github.com/apache/arrow-adbc/blob/main/java/driver/jdbc/src/main/java/org/apache/arrow/adbc/driver/jdbc/RootArrowReader.java). Specifically, we need to override `loadNextBatch` in `ArrowReader` to manually populate the `VectorSchemaRoot`.

Regarding the performance comparison between Arrow Java's Parquet writing and the C++ implementation, do we have any performance numbers?
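To make the iterator-facade idea concrete, here is a minimal, dependency-free sketch of the pattern. It deliberately does not use the real Arrow classes: `ReusableRoot`, `BatchReader`, and `IteratorBatchReader` are hypothetical stand-ins (for `VectorSchemaRoot`, `ArrowReader`, and our subclass respectively), and the batches are plain lists of integers, so treat it as an illustration of the control flow rather than as working Arrow code.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for VectorSchemaRoot (NOT an Arrow class):
// a single reusable container holding the rows of the current batch.
final class ReusableRoot {
    private final List<Integer> values = new ArrayList<>();

    void load(List<Integer> batch) {
        values.clear();        // drop the previous batch
        values.addAll(batch);  // copy in the next one
    }

    int getRowCount() {
        return values.size();
    }

    List<Integer> getValues() {
        return values;
    }
}

// Hypothetical stand-in for the reader contract (NOT Arrow's ArrowReader):
// loadNextBatch() refills the root and returns true, or returns false
// once the source has no more batches.
abstract class BatchReader implements AutoCloseable {
    protected final ReusableRoot root = new ReusableRoot();

    ReusableRoot getRoot() {
        return root;
    }

    abstract boolean loadNextBatch();

    @Override
    public void close() {
        // nothing to release in this sketch
    }
}

// The facade: adapts any Iterator of batches (for example, batches fetched
// one at a time from a remote server) to the pull-based reader contract.
final class IteratorBatchReader extends BatchReader {
    private final Iterator<List<Integer>> source;

    IteratorBatchReader(Iterator<List<Integer>> source) {
        this.source = source;
    }

    @Override
    boolean loadNextBatch() {
        if (!source.hasNext()) {
            return false;           // end of stream
        }
        root.load(source.next());   // manually populate the shared root
        return true;
    }
}
```

A consumer would then loop with `while (reader.loadNextBatch())` and read the current rows from `reader.getRoot()` on each iteration. In Arrow Java itself, the subclass would extend `ArrowReader`, refill the `VectorSchemaRoot` inside `loadNextBatch`, and, as far as I know, also implement the other abstract methods (`readSchema`, `bytesRead`, `closeReadSource`).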
