lidavidm commented on issue #36443: URL: https://github.com/apache/arrow/issues/36443#issuecomment-1620263703
Is it possible to share a self-contained reproduction?

That said, I think what _might_ be happening is that the Parquet writer may request more than one batch from the reader, and if you have asked the reader to share roots, the previous batch will be overwritten. That is, I would expect this to fail:

```python
reader = wrap_from_java_stream(...)
batch1 = reader.read_next_batch()
batch1.validate(full=True)  # OK
batch2 = reader.read_next_batch()
batch1.validate(full=True)  # Not OK: batch2 and batch1 share the same allocation
```

Exporting data via the C Data Interface does not copy the data, so it is your application's responsibility to manage the lifetime of the buffers properly. Arrow Java also uses mutable buffers, so if you enable reusing a VectorSchemaRoot, reading new data invalidates previously read data.
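The aliasing described above can be reproduced without Java or pyarrow. Below is a pure-Python toy model built on stated assumptions. `ReusingReader` is a hypothetical class, not part of Arrow. A single `bytearray` stands in for the buffers of a reused VectorSchemaRoot, and a `memoryview` stands in for a zero-copy export through the C Data Interface. The sketch shows that a previously returned batch changes after the next read, and that copying a batch before reading the next one preserves its contents.

```python
# Toy model, NOT Arrow's API: a reader that reuses one mutable buffer for
# every batch (as a reused VectorSchemaRoot does) and hands out zero-copy
# views of it (as exporting through the C Data Interface does).


class ReusingReader:
    """Hypothetical reader that refills a single shared buffer per batch."""

    def __init__(self, chunks, size):
        self._chunks = iter(chunks)
        self._buffer = bytearray(size)  # one allocation, reused for every batch

    def read_next_batch(self):
        data = next(self._chunks)
        if len(data) != len(self._buffer):
            raise ValueError("toy model only supports fixed-size batches")
        self._buffer[:] = data  # overwrite the shared allocation in place
        return memoryview(self._buffer)  # a view, not a copy


reader = ReusingReader([b"aaaa", b"bbbb"], size=4)

batch1 = reader.read_next_batch()
snapshot = bytes(batch1)  # defensive copy: the caller now owns this data

batch2 = reader.read_next_batch()

print(bytes(batch1))  # b'bbbb': batch1 was overwritten by the second read
print(batch1.obj is batch2.obj)  # True: both views share one allocation
print(snapshot)  # b'aaaa': the copy is unaffected
```

In the real pipeline, the analogous remedies would be to turn off root reuse on the Java side, or to make sure each batch is fully consumed (or copied) before the next one is read. Which one fits depends on the application and on how much extra copying it can tolerate.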
