alamb commented on PR #6427: URL: https://github.com/apache/arrow-rs/pull/6427#issuecomment-2494091305
> I wonder if kernels are blindly concatenating identical buffers together, instead of using something like Buffer::ptr_eq to avoid a new entry for the exact same buffer allocation?

What was happening in DataFusion was that we had a Filter --> Coalesce chain and were thus basically calling `concat` on a few thousand different input `RecordBatch`es, each with only a few rows.

However, to your point, it may well be the case that the input `RecordBatch`es shared the same underlying buffer, so maybe the same buffer was being appended multiple times.

@XiangpengHao do you remember if you looked for this? (related to "Section 3.5: Buffer size tuning" in https://www.influxdata.com/blog/faster-queries-with-stringview-part-two-influxdb/)

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
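To make the pointer-equality idea from the quoted question concrete, here is a minimal sketch using only the Rust standard library. It is not the arrow-rs `concat` implementation: `SharedBuf` and `append_dedup` are hypothetical names, `Arc<[u8]>` stands in for arrow's `Buffer`, and `Arc::ptr_eq` plays the role that `Buffer::ptr_eq` would play in the real kernel.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a reference-counted byte buffer.
/// This is NOT arrow's `Buffer`; it only models shared allocations.
type SharedBuf = Arc<[u8]>;

/// Append `incoming` buffers to `out`, reusing the existing entry when a
/// buffer points at an allocation that is already present in `out`.
///
/// Returns, for each incoming buffer, its index in `out`, which is what a
/// view-style array would need in order to remap its buffer indices.
fn append_dedup(out: &mut Vec<SharedBuf>, incoming: &[SharedBuf]) -> Vec<usize> {
    let mut indices = Vec::with_capacity(incoming.len());
    for buf in incoming {
        // Compare allocations, not contents: two buffers with equal bytes
        // but separate allocations are still treated as distinct.
        let existing = out.iter().position(|seen| Arc::ptr_eq(seen, buf));
        let idx = match existing {
            Some(idx) => idx,
            None => {
                out.push(Arc::clone(buf));
                out.len() - 1
            }
        };
        indices.push(idx);
    }
    indices
}
```

The linear scan keeps the sketch short, but it is quadratic in the number of buffers; with a few thousand input batches, a map keyed by the allocation address would probably be a better fit.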
