jorgecarleitao commented on pull request #9211:
URL: https://github.com/apache/arrow/pull/9211#issuecomment-768798901


   Thanks @nevi-me . IMO the idea is good, but I think that, in Rust's 
terminology, that implementation will be unsound.
   
   `Buffer::offset` is measured in bytes, but `ArrayData::offset` is measured 
in slots. So, slicing a buffer by a slot count will lead to an unaligned buffer. 
E.g. a buffer representing N `u32` has 4 bytes per slot, and applying the offset 
of `ArrayData::slice(1, 1)` directly to it would leave the buffer with `4*N - 1` bytes.
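
   To make the unit mismatch concrete, here is a minimal sketch using plain byte vectors rather than the real `Buffer` type. The helpers `slot_to_byte_offset` and `to_bytes` are hypothetical names introduced only for this illustration.

```rust
/// Hypothetical helper (not an arrow API): turns a slot offset into a byte offset.
fn slot_to_byte_offset(slots: usize, bytes_per_slot: usize) -> usize {
    slots * bytes_per_slot
}

/// Packs `u32` values into a plain byte vector, standing in for a buffer.
fn to_bytes(values: &[u32]) -> Vec<u8> {
    let mut bytes = Vec::with_capacity(values.len() * 4);
    for v in values {
        bytes.extend_from_slice(&v.to_le_bytes());
    }
    bytes
}

fn main() {
    let bytes = to_bytes(&[10, 20, 30, 40]); // N = 4, so 16 bytes

    // Treating the slot offset as a byte offset leaves 4*N - 1 bytes,
    // which is no longer a whole number of `u32` slots.
    let wrong = &bytes[1..];
    assert_eq!(wrong.len(), 15);
    assert_ne!(wrong.len() % 4, 0);

    // Converting slots to bytes first keeps the slice on a slot boundary.
    let right = &bytes[slot_to_byte_offset(1, 4)..];
    assert_eq!(right.len(), 12);
    assert_eq!(u32::from_le_bytes([right[0], right[1], right[2], right[3]]), 20);
}
```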
   
   In the particular case of `ListArray`, I think that we should only offset 
the `ArrayData` and not the child array: the offset buffer 
(`ArrayData::buffers[0]`) will have all the information we need to extract the 
correct items from the child array. Of course we must use it to access the 
items, but IMO we already do that in `ListArray::value_offset`. We may not be 
doing that in the equality code, though.
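
   A rough sketch of what I mean, with slices standing in for the offset buffer and the child array. `list_value` is a hypothetical function for illustration, not the actual `ListArray` code; the point is that only the parent's offset moves, and the child is left untouched.

```rust
/// Hypothetical stand-in for a list accessor; not the arrow implementation.
/// `array_offset` plays the role of the parent's `ArrayData::offset` (in slots).
fn list_value<'a>(
    value_offsets: &[i32],
    child: &'a [i32],
    array_offset: usize,
    i: usize,
) -> &'a [i32] {
    // The parent's offset shifts which offset entries we read;
    // the child array itself is never sliced.
    let start = value_offsets[array_offset + i] as usize;
    let end = value_offsets[array_offset + i + 1] as usize;
    &child[start..end]
}

fn main() {
    let child = [1, 2, 3, 4, 5, 6];
    let value_offsets = [0, 2, 5, 6]; // three lists: [1, 2], [3, 4, 5], [6]

    // Unsliced parent: item 0 is the first list.
    assert_eq!(list_value(&value_offsets, &child, 0, 0), &[1, 2][..]);

    // Parent sliced by one slot: item 0 is now the second list.
    assert_eq!(list_value(&value_offsets, &child, 1, 0), &[3, 4, 5][..]);
    assert_eq!(list_value(&value_offsets, &child, 1, 1), &[6][..]);
}
```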
   
   In the case of `StructArray`, we have two options: increase the offset of 
the child by an equal amount and only support `StructArray` with 
`ArrayData::offset = 0`, or increase `ArrayData::offset` and change the 
equality code to take that into account.
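
   For the second option, the equality check would need to look roughly like the sketch below. `StructSketch` and `struct_equal` are hypothetical and heavily simplified (a single `i32` child, no null buffers); they only show the parent's offset being applied when reading the child.

```rust
/// Hypothetical, simplified struct array with a single `i32` child column.
struct StructSketch {
    child: Vec<i32>,
    offset: usize, // parent offset, in slots
    len: usize,
}

/// Compares two struct arrays slot by slot, applying each parent's offset
/// when indexing into its child.
fn struct_equal(a: &StructSketch, b: &StructSketch) -> bool {
    a.len == b.len && (0..a.len).all(|i| a.child[a.offset + i] == b.child[b.offset + i])
}

fn main() {
    // Same logical content [1, 2], stored with different parent offsets.
    let sliced = StructSketch { child: vec![0, 1, 2, 3], offset: 1, len: 2 };
    let plain = StructSketch { child: vec![1, 2], offset: 0, len: 2 };
    assert!(struct_equal(&sliced, &plain));

    // Different logical content.
    let other = StructSketch { child: vec![1, 3], offset: 0, len: 2 };
    assert!(!struct_equal(&sliced, &other));
}
```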
   
   In general, the child's `ArrayData` alone is insufficient to interpret the 
child, either because the parent has a non-`None` null buffer or because the 
parent has an `offset`. So, AFAIU, we will always need to use the parent's 
accessors to interact with child objects.
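
   As an illustration, a parent accessor would combine both pieces of information, along the lines of this sketch. `parent_value` is a hypothetical name, and a `bool` slice stands in for the null bitmap, which I assume here is indexed by the offset-adjusted slot.

```rust
/// Hypothetical parent accessor: reads slot `i` of the child through the
/// parent's validity (if any) and the parent's offset.
fn parent_value(
    child: &[i32],
    validity: Option<&[bool]>,
    offset: usize,
    i: usize,
) -> Option<i32> {
    let slot = offset + i;
    // With no null buffer, every slot is considered valid.
    let valid = validity.map_or(true, |v| v[slot]);
    if valid {
        Some(child[slot])
    } else {
        None
    }
}

fn main() {
    let child = [7, 8, 9];
    let validity = [true, false, true];

    // Reading the child directly would yield 8 for slot 1,
    // but the parent marks that slot as null.
    assert_eq!(parent_value(&child, Some(&validity[..]), 0, 1), None);

    // With a parent offset of 1, logical index 1 maps to child slot 2.
    assert_eq!(parent_value(&child, Some(&validity[..]), 1, 1), Some(9));

    // No null buffer on the parent: only the offset matters.
    assert_eq!(parent_value(&child, None, 1, 0), Some(8));
}
```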


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

