[GitHub] [arrow] lidavidm commented on issue #36443: [Java][Python] export ArrowVectorIterator to python fails randomly, if reuseVectorSchemaRoot enabled

via GitHub Tue, 04 Jul 2023 06:34:56 -0700


lidavidm commented on issue #36443:
URL: https://github.com/apache/arrow/issues/36443#issuecomment-1620263703


   Is it possible to share a self-contained reproduction?
   
   That said, I think what _might_ be happening is that the Parquet writer 
requests more than one batch from the reader, and if you have enabled reusing 
the VectorSchemaRoot, each new batch overwrites the previous one. That is, I 
would expect this to fail:
   
   ```python
   reader = wrap_from_java_stream(...)
   batch1 = reader.read_next_batch()
   batch1.validate(full=True)  # OK
   batch2 = reader.read_next_batch()
   # Not OK: batch1 and batch2 share the same allocation
   batch1.validate(full=True)
   ```
   
   Exporting data via the C Data Interface does not copy the data, so it is 
your application's responsibility to properly manage the lifetime of the 
buffers. Arrow Java uses mutable buffers, so if you enable reusing a 
VectorSchemaRoot, you'll find that reading new data invalidates previously read 
data.
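
To make the aliasing concrete, here is a minimal pure-Python analogy. It does not use Arrow at all: `ReusingReader` is a hypothetical stand-in for a reader backed by a reused root, a `bytearray` plays the role of the mutable Java buffer, and a `memoryview` plays the role of a zero-copy export. It only sketches why a copy has to be taken before the next read if the earlier batch must stay valid.

```python
# Pure-Python analogy, not the Arrow API: one mutable buffer is reused for
# every batch, and callers receive zero-copy views of it.


class ReusingReader:
    """Hypothetical stand-in for a reader that reuses a single allocation."""

    def __init__(self, batches):
        self._batches = iter(batches)
        self._buffer = bytearray(4)  # the single shared allocation

    def read_next_batch(self):
        data = next(self._batches)
        self._buffer[:] = data           # overwrite the shared buffer in place
        return memoryview(self._buffer)  # zero-copy view of that buffer


reader = ReusingReader([b"aaaa", b"bbbb"])

batch1 = reader.read_next_batch()
snapshot = bytes(batch1)  # explicit copy taken before the next read

batch2 = reader.read_next_batch()

print(bytes(batch1))  # batch1 now shows the second batch's contents
print(snapshot)       # the copy still holds the first batch's contents
```

In the real setup, the analogous options are to finish with each batch before requesting the next one, or to disable root reuse so that every batch gets its own allocation.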


