pitrou commented on PR #46730:
URL: https://github.com/apache/arrow/pull/46730#issuecomment-3022421404

   > 1- API and Handling of the Last Buffer In [this pull request](https://github.com/apache/arrow/pull/46655), I demonstrated that it’s possible to [share buffers](https://github.com/apache/arrow/blob/a5dfadba3626c082235d9ea22db6f2cb22398d9a/cpp/src/arrow/array/builder_binary.cc#L90) without copying or finalizing the last buffer. This avoids [relocating the buffer](https://github.com/apache/arrow/blob/ed13cedd8bf7ddc06db152f97e68d86c2c37e949/cpp/src/arrow/array/builder_binary.h#L563) to remove blank space, which can be a costly operation when the unused space exceeds 64 bytes.
   > 
   > 2-
   > 
   > > Is it a win, though? If most Parquet strings are <= 12 bytes we would pointlessly waste space and CPU time.
   > 
   > In [this pull request](https://github.com/apache/arrow/pull/46229), I proposed a method that could help avoid memory bloat when buffers are shared. Additionally, in [this issue](https://github.com/apache/arrow/issues/45639), I think this metadata could help determine when CompactArray should be called.
   
   Thanks for the reminder, and sorry that this is taking a long time :) I propose that we review these PRs one by one. I've started with the `CompactArray` one and, once that is done, I would like to move on to the `AppendArraySlice` improvement.
   
   This PR is slightly more contentious, so I think we should tackle it only after the semantics of the other APIs have settled.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
