hu6360567 commented on issue #36443: URL: https://github.com/apache/arrow/issues/36443#issuecomment-1620310122
> That said, I think what might be happening is that the Parquet writer may request more than one batch from the reader, and if you request to share roots, then the previous batch will be overwritten. That is, I would expect this to fail:

That explains the failure in my code. Arrow Java prefers to reuse the same root when populating data, and the default `ArrowReader` implementation (the base class used for exporting to C Data) follows the same pattern. An implementation like [`InMemoryArrowReader`](https://github.com/apache/arrow/blob/e7d5028d18c18330280f7ff97337c753cfd9ce71/java/c/src/test/java/org/apache/arrow/c/StreamTest.java#L273) should fail in such a scenario.

Is there any documentation explaining why the Java implementation prefers a single shared root, and why the C Data Stream is bound to `ArrowReader` rather than `Iterator<VectorSchemaRoot>`? When only `loadNextBatch` is overridden, the default `ArrowReader` implementation always needs an extra conversion to `ArrowRecordBatch`.
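
For illustration, here is a minimal sketch (not actual Arrow code) of how an `Iterator<VectorSchemaRoot>` could be adapted to the `ArrowReader` contract today; the class name `IteratorBackedArrowReader` is hypothetical. It shows the extra `VectorSchemaRoot` -> `ArrowRecordBatch` round trip mentioned above, and why each call to `loadNextBatch` overwrites the single shared root:

```java
import java.io.IOException;
import java.util.Iterator;

import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.vector.VectorLoader;
import org.apache.arrow.vector.VectorSchemaRoot;
import org.apache.arrow.vector.VectorUnloader;
import org.apache.arrow.vector.ipc.ArrowReader;
import org.apache.arrow.vector.ipc.message.ArrowRecordBatch;
import org.apache.arrow.vector.types.pojo.Schema;

/** Hypothetical adapter from Iterator<VectorSchemaRoot> to ArrowReader (illustration only). */
class IteratorBackedArrowReader extends ArrowReader {
  private final Schema schema;
  private final Iterator<VectorSchemaRoot> batches;

  IteratorBackedArrowReader(BufferAllocator allocator, Schema schema,
                            Iterator<VectorSchemaRoot> batches) {
    super(allocator);
    this.schema = schema;
    this.batches = batches;
  }

  @Override
  public boolean loadNextBatch() throws IOException {
    if (!batches.hasNext()) {
      return false;
    }
    VectorSchemaRoot source = batches.next();
    // The extra VectorSchemaRoot -> ArrowRecordBatch round trip: the batch is
    // unloaded from the source root and re-loaded into the reader's single
    // shared root, overwriting whatever the previous loadNextBatch() put there.
    try (ArrowRecordBatch recordBatch = new VectorUnloader(source).getRecordBatch()) {
      new VectorLoader(getVectorSchemaRoot()).load(recordBatch);
    }
    return true;
  }

  @Override
  public long bytesRead() {
    return 0; // Not meaningful for an in-memory source.
  }

  @Override
  protected void closeReadSource() {
    // Nothing to release for the in-memory iterator in this sketch.
  }

  @Override
  protected Schema readSchema() {
    return schema;
  }
}
```

Any consumer that holds on to the root returned by `getVectorSchemaRoot()` (for example, a downstream writer pulling several batches) will therefore only ever see the most recently loaded batch.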
