nevi-me commented on pull request #8200:
URL: https://github.com/apache/arrow/pull/8200#issuecomment-706553767


   > Thus, IMO, the main use case of `capacity` is bookkeeping: knowing how to
deallocate the region. #8401 systematizes that idea: there, `BufferData`
(renamed `Bytes`) no longer has a `capacity`, but instead an `enum` describing
how to deallocate itself.
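
For illustration only, here is a minimal sketch of the idea in the quoted comment. The capacity lives inside a deallocation `enum` variant rather than in a separate field. `Deallocation`, its `Native`/`Foreign` variants, `new_zeroed`, and the 64-byte alignment are all assumptions made for this sketch, not the actual code in #8401.

```rust
use std::alloc::{alloc_zeroed, dealloc, Layout};
use std::ptr::NonNull;

// Assumption: 64-byte alignment, as the Arrow format recommends.
const ALIGNMENT: usize = 64;

/// Hypothetical: describes how a memory region must be released.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum Deallocation {
    /// Allocated here; freeing needs the original capacity in bytes.
    Native(usize),
    /// Owned elsewhere (e.g. imported via FFI); the owner frees it.
    Foreign,
}

/// Hypothetical `Bytes`: a pointer, the length in use, and how to free it.
struct Bytes {
    ptr: NonNull<u8>,
    len: usize,
    deallocation: Deallocation,
}

impl Bytes {
    /// Allocates `capacity` zeroed bytes, of which the first `len` are in use.
    fn new_zeroed(len: usize, capacity: usize) -> Self {
        assert!(capacity > 0 && len <= capacity);
        let layout = Layout::from_size_align(capacity, ALIGNMENT).unwrap();
        // SAFETY: `layout` has a non-zero size.
        let raw = unsafe { alloc_zeroed(layout) };
        let ptr = NonNull::new(raw).expect("allocation failed");
        Bytes { ptr, len, deallocation: Deallocation::Native(capacity) }
    }

    /// The bytes in use; capacity is not visible here.
    fn as_slice(&self) -> &[u8] {
        // SAFETY: `ptr` points to at least `len` initialized (zeroed) bytes.
        unsafe { std::slice::from_raw_parts(self.ptr.as_ptr(), self.len) }
    }
}

impl Drop for Bytes {
    fn drop(&mut self) {
        match self.deallocation {
            Deallocation::Native(capacity) => {
                // Rebuild the exact layout used for the allocation.
                let layout = Layout::from_size_align(capacity, ALIGNMENT).unwrap();
                // SAFETY: `ptr` was allocated with this same layout.
                unsafe { dealloc(self.ptr.as_ptr(), layout) }
            }
            Deallocation::Foreign => { /* the foreign owner releases it */ }
        }
    }
}
```

In this sketch, capacity only matters inside `Drop`, which matches the point that it is deallocation bookkeeping rather than part of the buffer's contents.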
   
   I agree with your thinking here, especially on equality and arrays with
offsets. One thing we don't test (extensively, or at all) is whether we can
write arrays with unaligned offsets to the IPC format. I'll make time to work
on this when I get to v5 of the Arrow format.
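
Some context on why unaligned offsets need care in IPC. A record batch's metadata stores each field's length and null count but no offset, so a sliced array has to be written with buffers that begin at its logical offset. For a validity bitmap whose offset is not a multiple of 8, that means shifting bits rather than slicing bytes. Below is a minimal sketch, assuming LSB-first bit numbering as in the Arrow columnar spec. `realign_bitmap` is a hypothetical helper, not an existing Arrow function, and it copies one bit at a time for clarity rather than speed.

```rust
/// Copies `len` bits of an LSB-first bitmap, starting at bit `offset`,
/// into a new buffer whose first bit sits at position 0.
fn realign_bitmap(bytes: &[u8], offset: usize, len: usize) -> Vec<u8> {
    let mut out = vec![0u8; (len + 7) / 8];
    for i in 0..len {
        let src = offset + i;
        // Read source bit `src`; if set, set destination bit `i`.
        if bytes[src / 8] & (1 << (src % 8)) != 0 {
            out[i / 8] |= 1 << (i % 8);
        }
    }
    out
}
```

For example, bits 3..7 of `0b0101_1000` are 1, 1, 0, 1, so realigning them yields the single byte `0b0000_1011`.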
   
   > Regardless, I would say that in the vast majority of use cases where we
compare buffers, we want to compare their contents, irrespective of how they
should be deallocated or of their capacity. Therefore, I would be happy to base
buffer comparison on their actual content (in bytes). We just need to be
careful with bitmaps, where the comparison should be made in bits.
   
   I'd also prefer not to compare using buffer capacity, as I'm mainly
interested in the buffer contents, especially when dealing with a frozen buffer
whose contents we won't be modifying.
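
Here is a minimal sketch of content-based equality. It uses a hypothetical `Buffer` type (not Arrow's) that carries `capacity` purely as bookkeeping. A derived `PartialEq` would compare `capacity` as well, so the impl is written by hand to compare only the bytes.

```rust
/// Hypothetical buffer: `data` is the contents, `capacity` is allocation bookkeeping.
#[derive(Debug)]
struct Buffer {
    data: Vec<u8>,
    #[allow(dead_code)] // only needed when deallocating
    capacity: usize,
}

impl Buffer {
    fn from_slice(bytes: &[u8], capacity: usize) -> Self {
        assert!(bytes.len() <= capacity);
        Buffer { data: bytes.to_vec(), capacity }
    }
}

// Compare contents only; two buffers holding the same bytes are equal
// regardless of how much memory was reserved for them.
impl PartialEq for Buffer {
    fn eq(&self, other: &Self) -> bool {
        self.data == other.data
    }
}
```

With this impl, `Buffer::from_slice(&[1, 2, 3], 64)` and `Buffer::from_slice(&[1, 2, 3], 128)` compare equal.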
   
   On bitmaps, I'll have a look, but I think using the bit iterators should
cover this.
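
Comparing bitmaps in bits matters because two logically equal validity bitmaps can differ byte by byte. One may start at a non-zero bit offset, or carry arbitrary padding bits past its length. Below is a sketch of a bit-iterator comparison, assuming LSB-first bit order. `get_bit`, `bits`, and `bitmaps_equal` are hypothetical helpers standing in for whatever bit iterators the codebase actually provides.

```rust
/// Returns bit `i` of an LSB-first bitmap.
fn get_bit(bytes: &[u8], i: usize) -> bool {
    bytes[i / 8] & (1 << (i % 8)) != 0
}

/// Iterates over `len` bits of `bytes`, starting at bit `offset`.
fn bits(bytes: &[u8], offset: usize, len: usize) -> impl Iterator<Item = bool> + '_ {
    (offset..offset + len).map(move |i| get_bit(bytes, i))
}

/// Compares two bitmap slices bit by bit, so differing offsets and
/// padding bits beyond `len` do not affect the result.
fn bitmaps_equal(lhs: &[u8], lhs_offset: usize, rhs: &[u8], rhs_offset: usize, len: usize) -> bool {
    bits(lhs, lhs_offset, len).eq(bits(rhs, rhs_offset, len))
}
```

For instance, `[0b0000_1011]` at offset 0 and `[0b0101_1000]` at offset 3 both hold the bits 1, 1, 0, 1 over a length of 4. They therefore compare equal even though their bytes differ.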
   
   Apologies again for not having gotten to #8401 yet; I hope you're not in a
rush to get it implemented (especially as we're now looking at 3.0 for all new
work).

