alamb commented on PR #6427:
URL: https://github.com/apache/arrow-rs/pull/6427#issuecomment-2494091305

   > I wonder if kernels are blindly concatenating identical buffers together, 
instead of using something like Buffer::ptr_eq to avoid a new entry for the 
exact same buffer allocation?
   
   What was happening in DataFusion was that we had a Filter --> Coalesce chain 
and were thus basically calling `concat` on a few thousand different input 
`RecordBatch`es, each with only a few rows.
   
   However, to your point, it may well be the case that the input RecordBatches 
shared the same underlying buffer, so maybe the same buffer was being appended 
multiple times.
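
   A minimal sketch of the pointer-equality idea, using `Arc<[u8]>` and 
`Arc::ptr_eq` from the standard library as a stand-in for arrow's `Buffer`. 
The names `SharedBuf` and `append_dedup` are hypothetical and not arrow-rs API; 
this only illustrates skipping a buffer whose allocation is already in the 
output list, and is not how `concat` is actually implemented:

```rust
use std::sync::Arc;

// Hypothetical stand-in for an Arrow data buffer: a shared, immutable allocation.
type SharedBuf = Arc<[u8]>;

/// Appends `incoming` buffers to `out`, skipping any buffer whose allocation
/// is already present. Returns, for each incoming buffer, its index in `out`.
fn append_dedup(out: &mut Vec<SharedBuf>, incoming: &[SharedBuf]) -> Vec<usize> {
    incoming
        .iter()
        .map(|buf| {
            // Linear scan that compares allocations (pointers), not contents.
            let found = out.iter().position(|existing| Arc::ptr_eq(existing, buf));
            match found {
                Some(idx) => idx,
                None => {
                    out.push(Arc::clone(buf));
                    out.len() - 1
                }
            }
        })
        .collect()
}

fn main() {
    let shared: SharedBuf = Arc::from(vec![1u8, 2, 3]);
    // Same contents but a different allocation, so it is NOT deduplicated.
    let other: SharedBuf = Arc::from(vec![1u8, 2, 3]);

    let mut out: Vec<SharedBuf> = Vec::new();
    let first = append_dedup(&mut out, &[Arc::clone(&shared)]);
    let second = append_dedup(&mut out, &[Arc::clone(&shared), Arc::clone(&other)]);

    assert_eq!(first, vec![0]);
    assert_eq!(second, vec![0, 1]);
    assert_eq!(out.len(), 2);
    println!("{} distinct buffers kept", out.len());
}
```

   The returned index mapping matters because a view-style array refers to its 
data buffers by position, so any deduplication would also have to rewrite those 
positions. The linear scan is kept for brevity; with thousands of inputs a 
lookup keyed on the buffer pointer would likely be needed.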
   
   @XiangpengHao do you remember if you looked for this? (related to "Section 
3.5: Buffer size tuning" in 
https://www.influxdata.com/blog/faster-queries-with-stringview-part-two-influxdb/)

