hu6360567 commented on issue #36443: URL: https://github.com/apache/arrow/issues/36443#issuecomment-1620310122
> That said, I think what might be happening is that the Parquet writer may request more than one batch from the reader, and if you request to share roots, then the previous batch will be overwritten. That is, I would expect this to fail:

That explains the failure in my code. Arrow Java prefers to reuse the same root when populating data, and the default `ArrowReader` implementation (the base class used for exporting to C Data) follows the same pattern. An implementation like [`InMemoryArrowReader`](https://github.com/apache/arrow/blob/e7d5028d18c18330280f7ff97337c753cfd9ce71/java/c/src/test/java/org/apache/arrow/c/StreamTest.java#L273) should fail in such a scenario.

Is there any documentation explaining why the Java implementation prefers a single shared root, and why the C Data Stream is bound to `ArrowReader` rather than `Iterator<VectorSchemaRoot>`? When only `loadNextBatch` is overridden, the default `ArrowReader` implementation always needs an extra conversion to `ArrowRecordBatch`.
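
For illustration, here is a minimal sketch (not actual Arrow code) of how an `Iterator<VectorSchemaRoot>` could be adapted to the `ArrowReader` contract today; the class name `IteratorBackedArrowReader` is hypothetical. It shows the extra `VectorSchemaRoot` -> `ArrowRecordBatch` round trip mentioned above, and why each call to `loadNextBatch` overwrites the single shared root:

```java
import java.io.IOException;
import java.util.Iterator;

import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.vector.VectorLoader;
import org.apache.arrow.vector.VectorSchemaRoot;
import org.apache.arrow.vector.VectorUnloader;
import org.apache.arrow.vector.ipc.ArrowReader;
import org.apache.arrow.vector.ipc.message.ArrowRecordBatch;
import org.apache.arrow.vector.types.pojo.Schema;

/** Hypothetical adapter from Iterator<VectorSchemaRoot> to ArrowReader (illustration only). */
class IteratorBackedArrowReader extends ArrowReader {
  private final Schema schema;
  private final Iterator<VectorSchemaRoot> batches;

  IteratorBackedArrowReader(BufferAllocator allocator, Schema schema,
                            Iterator<VectorSchemaRoot> batches) {
    super(allocator);
    this.schema = schema;
    this.batches = batches;
  }

  @Override
  public boolean loadNextBatch() throws IOException {
    if (!batches.hasNext()) {
      return false;
    }
    VectorSchemaRoot source = batches.next();
    // The extra VectorSchemaRoot -> ArrowRecordBatch round trip: the batch is
    // unloaded from the source root and re-loaded into the reader's single
    // shared root, overwriting whatever the previous loadNextBatch() put there.
    try (ArrowRecordBatch recordBatch = new VectorUnloader(source).getRecordBatch()) {
      new VectorLoader(getVectorSchemaRoot()).load(recordBatch);
    }
    return true;
  }

  @Override
  public long bytesRead() {
    return 0; // Not meaningful for an in-memory source.
  }

  @Override
  protected void closeReadSource() {
    // Nothing to release for the in-memory iterator in this sketch.
  }

  @Override
  protected Schema readSchema() {
    return schema;
  }
}
```

Any consumer that holds on to the root returned by `getVectorSchemaRoot()` (for example, a downstream writer pulling several batches) will therefore only ever see the most recently loaded batch.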
