jorgecarleitao commented on pull request #8287:
URL: https://github.com/apache/arrow/pull/8287#issuecomment-701172108


   To set expectations right, IMO this is a very difficult task.
   
   IMO there are currently 3 issues:
   
   ### buffer slices aka buffer `offset` aka `parent_` buffer
   
   Rust and C++ use slightly different approaches to slicing buffers.
   
   * In C++, we assign a `parent_` buffer whenever we slice a buffer.
   * In Rust, the raw data is known as a `BufferData`, and a buffer is composed 
of a `BufferData` and an offset (into the data)
   
   In both cases, memory management is tricky. If we slice a buffer in C++ 
and export it, does C++ know that it cannot release the content?
   
   Specifically:
   1. create a buffer `A` in C++
   2. slice it into buffer `B` (which now has a `parent -> A`)
   3. export `B` to Rust
   4. Rust calls `B->release`
   5. C++ accesses the contents (via `A`) in a region that overlaps with `B` (UB?)
   
   I was unable to find any reference to the buffer's `parent` in 
[bridge.cc](https://github.com/apache/arrow/blob/master/cpp/src/arrow/c/bridge.cc),
 nor any shared pointer to the sliced region.
   
   I am asking because we have an analogous problem in Rust, but in Rust we use 
a shared pointer to the memory region (`BufferData`), which I think protects us 
from this behavior. Specifically, a `Buffer` in Rust is composed of:
   * an Arc to the actual region
   * an offset of where to start from in that region (non-zero in slicing)
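   
   A minimal sketch of that layout, to make the ownership argument concrete. 
The type and method names here (`BufferData`, `Buffer`, `slice`, `owners`) are 
simplified stand-ins for illustration, not the actual definitions in the Rust 
crate. The point is only that a slice holds its own `Arc` to the region, so 
dropping the slice cannot free memory that the original buffer still uses:

```rust
use std::sync::Arc;

/// Hypothetical stand-in for the owned memory region.
struct BufferData {
    bytes: Vec<u8>,
}

/// Hypothetical buffer: a shared pointer to the region plus an offset into it.
#[derive(Clone)]
struct Buffer {
    data: Arc<BufferData>,
    offset: usize,
}

impl Buffer {
    fn new(bytes: Vec<u8>) -> Self {
        Buffer { data: Arc::new(BufferData { bytes }), offset: 0 }
    }

    /// Slicing clones the `Arc` (incrementing the reference count) and
    /// advances the offset; no bytes are copied.
    fn slice(&self, offset: usize) -> Self {
        assert!(self.offset + offset <= self.data.bytes.len());
        Buffer { data: Arc::clone(&self.data), offset: self.offset + offset }
    }

    /// The bytes visible through this buffer, starting at its offset.
    fn as_slice(&self) -> &[u8] {
        &self.data.bytes[self.offset..]
    }

    /// Number of buffers currently sharing the underlying region.
    fn owners(&self) -> usize {
        Arc::strong_count(&self.data)
    }
}
```

   With this shape, creating `a`, taking `b = a.slice(2)` and then dropping `b` 
only decrements the count; the region is freed when the last `Arc` goes away.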
   
   ### Rust's implementation of Dictionary arrays
   
   I think that Rust's implementation of dictionaries is difficult to bridge 
with C, as it assumes dictionary data is owned by a struct that is not 
`ArrayData`. I think that we will need to address this first. I raised this in 
ARROW-10128.
   
   ### Threading
   
   How do we handle threads? Do we guard the release with a mutex?
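   
   To illustrate what guarding the release with a mutex could look like, here 
is a rough sketch. It is one possible option under my own assumptions, not a 
proposal for the final design: `Exported` and `release_from_threads` are 
hypothetical names, and the callback is modelled as a plain Rust closure rather 
than the C function pointer of the real interface.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical wrapper around an exported buffer's release callback.
struct Exported {
    release: Mutex<Option<Box<dyn FnOnce() + Send>>>,
}

impl Exported {
    fn new(release: impl FnOnce() + Send + 'static) -> Self {
        Exported { release: Mutex::new(Some(Box::new(release))) }
    }

    /// Runs the callback at most once, even when called from several threads:
    /// the first caller takes it out of the `Option`, later callers see `None`.
    fn release(&self) {
        // The lock guard is dropped at the end of this statement, so the
        // callback itself runs outside the critical section.
        let callback = self.release.lock().unwrap().take();
        if let Some(callback) = callback {
            callback();
        }
    }
}

/// Calls `release` from `n` threads and returns how often the callback ran.
fn release_from_threads(n: usize) -> usize {
    let count = Arc::new(AtomicUsize::new(0));
    let counter = Arc::clone(&count);
    let exported = Arc::new(Exported::new(move || {
        counter.fetch_add(1, Ordering::SeqCst);
    }));
    let handles: Vec<_> = (0..n)
        .map(|_| {
            let exported = Arc::clone(&exported);
            thread::spawn(move || exported.release())
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    count.load(Ordering::SeqCst)
}
```

   This only makes a double release harmless; it does not by itself answer 
whether another thread may still be reading the data when release is called.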
   

