jorgecarleitao commented on pull request #9211:
URL: https://github.com/apache/arrow/pull/9211#issuecomment-761758608


   @nevi-me , no need to apologize, no one saw this, and we also had no tests 
to cover this.
   
   Wrt to the offsets, I admit that I had a branch where I removed the 
`Buffer::offset` altogether. However, I saw some perf implications somewhere 
and abandoned the idea. We could just take the perf hit to remove this 
complexity, even though IMO the buffer offsets are not very painful to deal 
with, as they are opaque to the `ArrayData`: even bit operations automatically 
take that offset into account. The slot offset (`ArrayData`) is indeed more 
challenge, but it is part of the spec, so we need to deal with it.
   
   One aspect here is that unfortunately we have a lot of units of measurement:
   
   1. Offset measured in slots (`ArrayData::offset`)
   2. Offset measured in bytes (`Buffer::offset`, `ArrayData::offset * size_of`)
   3. Offset is sometimes measured in bits (for bitmap operations)
   
   Overall, this adds complexity to how to reason about the code. I am not sure 
we have an easy way to address this, as they are indeed needed. One idea is to 
declare each measurement to be a different type and perform explicit casts 
(which the compiler will optimize out as they area all represented by `usize`), 
so that we have the compiler on our side when adding different types of offsets 
(e.g. slot offset vs byte offset).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to