[I] Add `RecordBatch::take` [arrow-rs]

via GitHub Thu, 19 Oct 2023 06:57:20 -0700


alamb opened a new issue, #4958:
URL: https://github.com/apache/arrow-rs/issues/4958


   **Is your feature request related to a problem or challenge? Please describe 
what you are trying to do.**
   When dividing the data in `RecordBatch`es into subsets, such as when 
writing partitioned output in DataFusion or in lancedb, we end up creating a 
new `RecordBatch` that is a subset of the input, eventually using the underlying 
arrow 
[take](https://docs.rs/arrow-select/latest/arrow_select/take/fn.take.html) 
kernel.
   
   See slack discussion in 
https://the-asf.slack.com/archives/C01QUFS30TD/p1696014226063749
   
   **Describe the solution you'd like**
   Similarly to 
[`RecordBatch::slice`](https://docs.rs/arrow/latest/arrow/array/struct.RecordBatch.html#method.slice),
 I would like an ergonomic `RecordBatch::take` method that can be used 
like this:
   
   ```rust
   let batch: RecordBatch = ...;
   let indices = vec![1,3,5];
   // new_batch contains rows 1, 3, and 5 of `batch`
   let new_batch = batch.take(&indices)?;
   ```
   
   We can probably port the code from DataFusion here (thanks @devinjdangelo 
for the link):
   
   
https://github.com/apache/arrow-datafusion/blob/85f3578f5fb47d28a8bc3a7b9be0284b3ced0fcd/datafusion/physical-plan/src/repartition/mod.rs#L193-L212
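
To make the intended semantics concrete, here is a minimal std-only sketch of what "take on a batch" means: the same index list is applied to every column so that rows stay aligned. It deliberately does not use arrow types. `Batch` and `take_rows` are hypothetical names for illustration only, the columns are assumed to be plain `Vec<i64>` of equal length, and returning an error on an out-of-bounds index is an assumption of this sketch rather than a statement about how the arrow kernel behaves.

```rust
// Simplified model of a record batch: a list of equally long columns.
// `Batch` and `take_rows` are hypothetical, not arrow-rs APIs.
#[derive(Debug, Clone, PartialEq)]
struct Batch {
    columns: Vec<Vec<i64>>,
}

impl Batch {
    /// Returns a new batch holding the rows at `indices`, in the given order.
    /// Every column is gathered with the same index list, so rows stay aligned.
    fn take_rows(&self, indices: &[usize]) -> Result<Batch, String> {
        // All columns are assumed to have the same length as the first one.
        let num_rows = self.columns.first().map_or(0, |c| c.len());

        // Reject out-of-bounds indices up front instead of panicking mid-gather.
        if let Some(bad) = indices.iter().find(|&&i| i >= num_rows) {
            return Err(format!("index {bad} out of bounds for {num_rows} rows"));
        }

        // Gather the selected rows column by column.
        let columns: Vec<Vec<i64>> = self
            .columns
            .iter()
            .map(|col| indices.iter().map(|&i| col[i]).collect::<Vec<i64>>())
            .collect();

        Ok(Batch { columns })
    }
}
```

With two columns of six rows each, `batch.take_rows(&[1, 3, 5])` would return a batch whose columns each contain only rows 1, 3, and 5 of the input. A real implementation would presumably do the same per-column loop with the arrow `take` kernel and reuse the input schema.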
   
   **Describe alternatives you've considered**
   If we worry about `RecordBatch` getting too complicated, we might want to 
consider adding a `RecordBatchExt` trait that contains these methods, but that 
might be overly complicated.
   
   **Additional context**
   cc @devinjdangelo  @westonpace 


