jorgecarleitao commented on pull request #8200:
URL: https://github.com/apache/arrow/pull/8200#issuecomment-706552752


   Thanks a lot for driving this and CCing. This is definitely important.
   
   I have hit that `capacity` problem multiple times myself! The first was while 
simplifying `equal.rs`, for the reason you describe: buffers compare as 
different when their capacities differ. The second was on #8401: when a buffer 
receives its data via the C data interface, we do not even know (or care about) 
its capacity.
   
   AFAICT, we have two use-cases of `capacity` atm:
   
   * deallocating the region when it is no longer used
   * computing the total size in bytes of arrays
   
   Because arrays share buffers, the total size of an array is currently 
misleading. For example, when the array is computed from `is_not_null` of 
another array, both the null buffer (buffer 0) and the `value` (buffer 1) share 
the same memory region, and thus IMO the total size computation based on 
`capacity` is incorrect. This is also true for complex structs in which buffers 
are shared within the same array.
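
To make the double counting concrete, here is a minimal sketch. It does not use the actual `Buffer` type: `SharedBuffer` is a hypothetical stand-in (`Arc<Vec<u8>>`), and both function names are made up for illustration. It only shows that summing `capacity` over buffers counts a shared region once per reference, while de-duplicating by allocation counts it once.

```rust
use std::collections::HashSet;
use std::sync::Arc;

/// Hypothetical stand-in for a buffer whose memory region can be shared.
pub type SharedBuffer = Arc<Vec<u8>>;

/// Sums `capacity` per buffer; a region shared by two buffers is counted twice.
pub fn naive_size(buffers: &[SharedBuffer]) -> usize {
    buffers.iter().map(|b| b.capacity()).sum()
}

/// Counts each distinct allocation once, identified by its pointer.
pub fn deduplicated_size(buffers: &[SharedBuffer]) -> usize {
    let mut seen = HashSet::new();
    buffers
        .iter()
        .filter(|b| seen.insert(Arc::as_ptr(b)))
        .map(|b| b.capacity())
        .sum()
}
```

With two handles to the same region, `naive_size` reports twice what `deduplicated_size` reports.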
   
   Thus, IMO the main use-case of `capacity` is bookkeeping: it records how to 
de-allocate the region. #8401 systematizes that idea: there, `BufferData` 
(renamed `Bytes`) no longer has a `capacity`, but an `enum` describing how to 
de-allocate itself.
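
A rough sketch of that idea follows. It is not the code in #8401: the variant names, the constructors, and the use of `Arc<Vec<u8>>` as a stand-in for a foreign owner (e.g. memory received over the C data interface) are all assumptions made for illustration. The point is that the capacity only survives inside the variant that needs it to free the region.

```rust
use std::alloc::{alloc_zeroed, dealloc, handle_alloc_error, Layout};
use std::ptr::NonNull;
use std::sync::Arc;

/// How a region must be released (hypothetical names).
pub enum Deallocation {
    /// Allocated by us: the layout carries the capacity needed to free it.
    Native(Layout),
    /// Owned elsewhere: keeping the owner alive keeps the region valid.
    Foreign(Arc<Vec<u8>>),
}

pub struct Bytes {
    ptr: NonNull<u8>,
    len: usize,
    deallocation: Deallocation,
}

impl Bytes {
    /// Allocates `capacity` zeroed bytes, of which `len` are in use.
    pub fn new_native(len: usize, capacity: usize) -> Self {
        assert!(capacity > 0 && len <= capacity);
        let layout = Layout::from_size_align(capacity, 64).expect("invalid layout");
        let raw = unsafe { alloc_zeroed(layout) };
        let ptr = NonNull::new(raw).unwrap_or_else(|| handle_alloc_error(layout));
        Bytes { ptr, len, deallocation: Deallocation::Native(layout) }
    }

    /// Wraps a region owned by someone else; no capacity is known or needed.
    pub fn from_foreign(owner: Arc<Vec<u8>>) -> Self {
        let slice = owner.as_slice();
        let len = slice.len();
        // The pointer is only ever read through, never written.
        let ptr = NonNull::new(slice.as_ptr() as *mut u8).expect("null pointer");
        Bytes { ptr, len, deallocation: Deallocation::Foreign(owner) }
    }

    pub fn as_slice(&self) -> &[u8] {
        unsafe { std::slice::from_raw_parts(self.ptr.as_ptr(), self.len) }
    }
}

impl Drop for Bytes {
    fn drop(&mut self) {
        match &self.deallocation {
            // Free with the same layout that was used to allocate.
            Deallocation::Native(layout) => unsafe { dealloc(self.ptr.as_ptr(), *layout) },
            // Nothing to free here; the owner is released when the field drops.
            Deallocation::Foreign(_) => {}
        }
    }
}
```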
   
   Regardless, I would say that in the vast majority of the use-cases in which 
we want to compare buffers, we want to compare their contents, irrespective of 
their capacity or how they should be deallocated. Therefore, I would be happy 
to have buffer comparison be based on their actual content (in bytes). We just 
need to be careful about bitmaps, for which the comparison should be made in 
bits.
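
As a sketch of the distinction (function names and signatures are hypothetical, not the existing API): byte buffers can be compared as slices, while bitmaps need a bit-wise comparison over the relevant range, because bits outside that range, or a different bit offset, can make the underlying bytes differ even when the logical bits agree. This assumes Arrow's least-significant-bit numbering and panics if a range runs past the end of a slice.

```rust
/// Compares two buffers by content only; capacity plays no role.
pub fn bytes_equal(lhs: &[u8], rhs: &[u8]) -> bool {
    lhs == rhs
}

/// Reads bit `i`, least-significant bit first within each byte.
fn get_bit(data: &[u8], i: usize) -> bool {
    (data[i / 8] >> (i % 8)) & 1 == 1
}

/// Compares `len` bits of two bitmaps, each starting at its own bit offset.
pub fn bits_equal(
    lhs: &[u8],
    lhs_offset: usize,
    rhs: &[u8],
    rhs_offset: usize,
    len: usize,
) -> bool {
    (0..len).all(|i| get_bit(lhs, lhs_offset + i) == get_bit(rhs, rhs_offset + i))
}
```

For instance, `[0b0000_0101]` and `[0b1111_0101]` differ as bytes, but their first four bits are equal.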

