Re: [I] New row format that does not guarantee ordering of columns [arrow-rs]

via GitHub Sat, 17 Jan 2026 09:39:44 -0800


rluvaton commented on issue #9083:
URL: https://github.com/apache/arrow-rs/issues/9083#issuecomment-3764140243


   > > @alamb I'm talking about the shuffling/partitioning operation - Consider 
you have a 10 batches with 50 columns, and you want to split to 3 partitions 
based on a round robin or hash, so each row will be needed to copy to a 
different place.
   > 
   > If I were implementing this operation I would:
   > 1. Compute the input indices to send to each partition
   > 2. Call the 
[take](https://docs.rs/arrow/latest/arrow/compute/kernels/take/fn.take_record_batch.html)
 kernel to create the relevant output arrays
   > 
   > This would result in exactly one data copy (to the output array)
   
   
   But what if you need to partition to 1500 partitions?
   If batch size is 8192 you will have for each partition ~6 elements.
   
   So for each partition I will do take on 6 elements and it will output an 
array of 6 elements per partition.
   
   And every item will be a cache miss if the data is evenly distributed and 
you don't take advantage of the data already in the cache
   
   And if you have 30 columns (or even 500 like I saw sometimes) then you will 
have 1500 * 30 cache misses per batch 
   
   And if want the partition to output batches with the configured batch size 
8192 I will to concat again so another copy.
   
   And if you say, ok, so if you had the feature for take to builders (like 
some open issue about that)
   
   Then I will say that you still have the cache misses.
   
   But you will say, you have cache misses between elements as well
   
   yes but I have 1 source and 1 dest so it is more predictable and one of the 
optimization is to encode multiple columns at a time.
   
   But then you say, ok so instead don't use take and go over that array and 
for each index copy it to the destination builder for that partition.
   
   And this was my current approach before this. But you are doing small data 
copies for each row. And with row format you doing larger data copies (entire 
row)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [I] New row format that does not guarantee ordering of columns [arrow-rs]

Reply via email to