luoyuxia commented on issue #437: URL: https://github.com/apache/fluss/issues/437#issuecomment-3479160605
> Hi [@luoyuxia](https://github.com/luoyuxia), to implement the scenario you described in issue [apache/arrow-java#735](https://github.com/apache/arrow-java/issues/735) (continuously reading Arrow RecordBatches from a remote server and writing them to Parquet) using Arrow Java's `DatasetFileWriter`, we can subclass `ArrowReader` and implement a facade over an iterator. For example, the Arrow ADBC JDBC driver demonstrates this approach in its [RootArrowReader](https://github.com/apache/arrow-adbc/blob/main/java/driver/jdbc/src/main/java/org/apache/arrow/adbc/driver/jdbc/RootArrowReader.java). Specifically, we need to override `loadNextBatch` in `ArrowReader` to manually populate the `VectorSchemaRoot`.
>
> Regarding the performance comparison between Arrow Java's Parquet writing and the C++ implementation, do we have performance numbers?

As we discussed offline, that approach is hacky, and it would require us to implement a new Arrow filesystem for every filesystem that Arrow does not support.
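For reference, the "facade over an iterator" idea from the quoted suggestion can be sketched in plain Java. This is only an illustration under stated assumptions: `BatchReader`, `IteratorBatchReader`, and `drain` are hypothetical stand-ins, not Arrow classes. `BatchReader` plays the role of `ArrowReader`, and its reusable `root` list plays the role of the `VectorSchemaRoot`. A real implementation would extend `ArrowReader` and fill Arrow vectors instead.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class IteratorReaderSketch {

    /** Hypothetical stand-in for a pull-style reader such as Arrow's ArrowReader. */
    abstract static class BatchReader {
        // Reusable buffer; stands in for the VectorSchemaRoot that a consumer reads from.
        protected final List<Integer> root = new ArrayList<>();

        /** Loads the next batch into root; returns false when the source is exhausted. */
        abstract boolean loadNextBatch();

        List<Integer> getRoot() {
            return root;
        }
    }

    /** Facade that exposes an iterator of batches through the pull-style reader API. */
    static final class IteratorBatchReader extends BatchReader {
        private final Iterator<List<Integer>> source;

        IteratorBatchReader(Iterator<List<Integer>> source) {
            this.source = source;
        }

        @Override
        boolean loadNextBatch() {
            if (!source.hasNext()) {
                return false;
            }
            // Overwrite the shared buffer with the next batch, as the override
            // of loadNextBatch would repopulate the VectorSchemaRoot.
            root.clear();
            root.addAll(source.next());
            return true;
        }
    }

    /** Hypothetical consumer (think: a file writer) that drains the reader and counts rows. */
    static int drain(BatchReader reader) {
        int rows = 0;
        while (reader.loadNextBatch()) {
            rows += reader.getRoot().size();
        }
        return rows;
    }

    public static void main(String[] args) {
        // Two in-memory batches stand in for batches streamed from a remote server.
        Iterator<List<Integer>> remote =
                List.of(List.of(1, 2, 3), List.of(4, 5)).iterator();
        System.out.println(drain(new IteratorBatchReader(remote))); // prints 5
    }
}
```

The consumer only ever sees `loadNextBatch` and the shared buffer, so the source of the batches (an in-memory list here, a remote stream in the scenario above) can be swapped without touching the writer side.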
