thisisnic commented on issue #46178:
URL: https://github.com/apache/arrow/issues/46178#issuecomment-2834316472

   Hey, I'm experimenting with using LLMs for debugging, and I'm going to 
summarise what ChatGPT produced after I gave it some code and docs files. 
I'm going to go slowly on this as I don't want to waste folks' time looking 
into inaccurate solutions, so if this is nonsense, I'll stop using this 
approach in areas of the codebase I'm not familiar with.
   
   > The problem is that in TransferZeroCopy, the Buffers are created directly
   > from memory owned by the Parquet page reader. When the RecordReader or file
   > reader is destroyed, the memory backing those Buffers can disappear - but
   > the Arrow Array still assumes the Buffers stay valid.
   >
   > Arrow expects Buffers to own or strongly reference their memory. It
   > doesn't track when memory can be deallocated; it trusts that if a Buffer
   > exists, its memory is alive.
   >
   > Fix:
   > When creating Buffers from RecordReader memory, the code should attach a
   > shared_ptr back to the owner (e.g., the page reader or file reader) so that
   > the memory stays alive for as long as the Buffer does.
   > Otherwise, downstream operations like Take, filter, or materializing lazy
   > Arrays can cause use-after-free bugs.
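
   To make the quoted suggestion concrete, here is a minimal sketch of the keep-alive idea using only the C++ standard library. It is not Arrow's or Parquet's actual code: `PageOwner`, `OwnedView`, and `TransferZeroCopySketch` are hypothetical names invented for illustration, and whether the real fix should look like this is exactly what needs checking.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for whatever object owns the decoded page memory
// (the quoted text suggests the page reader or file reader).
struct PageOwner {
  std::vector<std::uint8_t> bytes;
};

// Hypothetical zero-copy view: a raw pointer and length, plus a type-erased
// shared_ptr that keeps the owner alive for as long as the view exists.
class OwnedView {
 public:
  OwnedView(const std::uint8_t* data, std::size_t size,
            std::shared_ptr<const void> keep_alive)
      : data_(data), size_(size), keep_alive_(std::move(keep_alive)) {}

  const std::uint8_t* data() const { return data_; }
  std::size_t size() const { return size_; }

 private:
  const std::uint8_t* data_;
  std::size_t size_;
  std::shared_ptr<const void> keep_alive_;  // never read, only held
};

// Builds a view over the owner's memory without copying it. Passing the
// owner along as the keep-alive handle is the step the quoted fix asks for.
inline OwnedView TransferZeroCopySketch(const std::shared_ptr<PageOwner>& owner) {
  return OwnedView(owner->bytes.data(), owner->bytes.size(), owner);
}
```

   With this shape, dropping the caller's handle to the owner no longer frees the bytes while a view is still around; the owner is destroyed only once the last view goes away. Without the `keep_alive` member, the same sequence would leave `data_` dangling, which is the use-after-free described above.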


