nevi-me commented on issue #1750:
URL: https://github.com/apache/arrow-rs/issues/1750#issuecomment-1153269508

   I've read this thread a few times, but I'm still hazy on what a good 
approach is, given how long I have been out of the loop. Jorge and I 
discussed, many moons ago, that passing the offset and length to Buffer and 
Bitmap would be a good solution (as is done in arrow2, as you mention, 
@tustvold).
   
   I haven't written arrow code in a very long time, so I can't quite 
remember the details. However, I recall struggling to figure out what 
happens in the scenario below.
   
   Suppose an array is of type 
`struct[a]<struct[b]<struct[c]<struct[d]<int32[e]>>>>` and we slice it. What 
happens when we select `a.b.c`?
   
   The trouble was that if we don't pass the offset and length down to the 
`ArrayData` of `a`'s children, we always have to know `a`'s offset, which 
forces us to compute it each time we access `a` or any of its descendants.
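
   To make the scenario concrete, here is a minimal sketch using a toy model. 
`Toy` and all of its methods are hypothetical and are not the arrow-rs 
`ArrayData` API; validity bitmaps are left out entirely. It contrasts a slice 
that records the offset on the parent only with one that pushes it down to 
every descendant.

```rust
/// Toy stand-in for a nested column. NOT arrow-rs `ArrayData`; no validity bitmap.
#[derive(Clone, Debug, PartialEq)]
enum Toy {
    /// Leaf values, e.g. the int32 field `e`.
    Int32 { offset: usize, len: usize, values: Vec<i32> },
    /// A struct with a single child, e.g. `a`, `b`, `c` or `d`.
    Struct { offset: usize, len: usize, child: Box<Toy> },
}

impl Toy {
    /// Wrap an int32 leaf in `depth` single-field structs.
    fn nested(depth: usize, values: Vec<i32>) -> Toy {
        let len = values.len();
        let mut node = Toy::Int32 { offset: 0, len, values };
        for _ in 0..depth {
            node = Toy::Struct { offset: 0, len, child: Box::new(node) };
        }
        node
    }

    fn len(&self) -> usize {
        match self {
            Toy::Int32 { len, .. } | Toy::Struct { len, .. } => *len,
        }
    }

    /// Slice that records the window on this node only; descendants are untouched.
    fn slice_parent_only(&self, start: usize, length: usize) -> Toy {
        let mut out = self.clone();
        match &mut out {
            Toy::Int32 { offset, len, .. } | Toy::Struct { offset, len, .. } => {
                *offset += start;
                *len = length;
            }
        }
        out
    }

    /// Slice that also pushes the same window down to every descendant.
    fn slice_pushed_down(&self, start: usize, length: usize) -> Toy {
        let mut out = self.slice_parent_only(start, length);
        if let Toy::Struct { child, .. } = &mut out {
            let sliced = child.slice_pushed_down(start, length);
            **child = sliced;
        }
        out
    }

    /// Walk `levels` down (2 levels from `a` reaches `c`), returning the node
    /// and the summed offsets of the ancestors that were stepped over.
    fn select(&self, levels: usize) -> (&Toy, usize) {
        let mut node = self;
        let mut ancestor_offset = 0;
        for _ in 0..levels {
            match node {
                Toy::Struct { offset, child, .. } => {
                    ancestor_offset += *offset;
                    node = &**child;
                }
                Toy::Int32 { .. } => break,
            }
        }
        (node, ancestor_offset)
    }

    /// Read the leaf using only the leaf's own window (enough if offsets were pushed down).
    fn read_local(&self) -> Vec<i32> {
        match self {
            Toy::Int32 { offset, len, values } => values[*offset..*offset + *len].to_vec(),
            Toy::Struct { child, .. } => child.read_local(),
        }
    }

    /// Read the leaf by accumulating offsets on the way down (needed for parent-only slices).
    fn read_accumulated(&self, inherited_offset: usize, len: usize) -> Vec<i32> {
        match self {
            Toy::Int32 { offset, values, .. } => {
                let start = inherited_offset + *offset;
                values[start..start + len].to_vec()
            }
            Toy::Struct { offset, child, .. } => {
                child.read_accumulated(inherited_offset + *offset, len)
            }
        }
    }
}
```

   In this toy, after `slice_parent_only(1, 3)` on `a`, the node reached by 
`select(2)` (that is, `c`) still describes the full, unsliced data, so a 
reader has to carry `a`'s offset and length alongside it. After 
`slice_pushed_down(1, 3)`, `c` is self-describing and `read_local` is enough.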
   
   So in principle I favoured pushing down the offset at the time, which I 
suppose has led us here:
   
   > ArrayData::slice contains a special case for StructArray where it recurses 
the offset into its children. However, it preserves the offset on the parent 
ArrayData, in order for the validity buffer to work correctly.
   
   ___
   
   > There are longer term suggestions around handling offsets in ArrayData 
differently, but until then I would like to propose:
   > * Remove the ArrayData::slice special-case
   > * Slice child data within StructArray when constructing boxed_fields
   
   This makes sense to implement as an (interim?) solution, but yeah, perhaps 
first prize would be propagating offsets and value lengths to a redesigned 
Buffer and Bitmap.
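
   As I understand the interim proposal, it could look roughly like the toy 
sketch below. `ToyStructData` and `ToyStructArray` are hypothetical stand-ins, 
not the real arrow-rs types, and validity is ignored: `slice` only adjusts 
the parent's own window, and the children are sliced once, when the 
per-field arrays are built.

```rust
/// Toy stand-in for struct `ArrayData`: a logical window over unsliced children.
#[derive(Clone, Debug)]
struct ToyStructData {
    offset: usize,
    len: usize,
    /// Child columns, kept at their full, unsliced length.
    children: Vec<Vec<i32>>,
}

impl ToyStructData {
    /// No struct special case: slicing only moves this node's own window.
    fn slice(&self, start: usize, length: usize) -> ToyStructData {
        assert!(start + length <= self.len, "slice out of bounds");
        ToyStructData {
            offset: self.offset + start,
            len: length,
            children: self.children.clone(),
        }
    }
}

/// Toy stand-in for `StructArray`, which materialises one array per field.
struct ToyStructArray {
    boxed_fields: Vec<Vec<i32>>,
}

impl From<&ToyStructData> for ToyStructArray {
    fn from(data: &ToyStructData) -> Self {
        // The parent's window is applied to each child exactly once, here.
        let boxed_fields = data
            .children
            .iter()
            .map(|child| child[data.offset..data.offset + data.len].to_vec())
            .collect();
        ToyStructArray { boxed_fields }
    }
}
```

   Because the offset lives in one place only, repeated slices just compose 
on the parent, and there is no risk of applying the same offset both on the 
parent and on an already-sliced child.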
   


