alamb commented on issue #8850:
URL: https://github.com/apache/arrow-rs/issues/8850#issuecomment-3543352085

   > I am creating new record batches from a stream of record batches. I need 
to ensure that identical values in a sorted column are always located within 
the same record batch.
   
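   I am not sure what your use case is, but without additional constraints this 
could result in unbounded memory use, since a single value may repeat across 
arbitrarily many input rows and all of them would have to fit in one batch.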
   
   Another approach I have seen used is to store the data as a 
`Vec<RecordBatch>` and call `slice()` to split batches where the sort key 
changes, so that batch boundaries line up with key boundaries.
   
   You can use the 
[partition](https://docs.rs/arrow/latest/arrow/compute/kernels/partition/index.html)
 kernel to find where the sorted values change.

