lidavidm commented on issue #36443:
URL: https://github.com/apache/arrow/issues/36443#issuecomment-1620263703

   Is it possible to share a self-contained reproduction?
   
   That said, I think what _might_ be happening is that the Parquet writer may request more than one batch from the reader, and if the Java side shares (reuses) a single root across batches, each new batch overwrites the previous one. That is, I would expect this to fail:
   
   ```python
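   # wrap_from_java_stream: helper that imports a Java reader via the C Data Interface
   reader = wrap_from_java_stream(...)
   batch1 = reader.read_next_batch()
   batch1.validate(full=True)  # OK
   batch2 = reader.read_next_batch()
   # Not OK: batch1 and batch2 share the same allocation, so reading batch2
   # overwrote the buffers behind batch1
   batch1.validate(full=True)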
   ```
   
   Exporting data through the C Data Interface does not copy it, so your application is responsible for managing the lifetime of the buffers: an imported batch is only valid for as long as the exporter leaves that memory untouched. Arrow Java's buffers are also mutable, so if you reuse a VectorSchemaRoot, loading new data into it invalidates any batch read from it earlier.