alamb opened a new issue, #4958: URL: https://github.com/apache/arrow-rs/issues/4958
**Is your feature request related to a problem or challenge? Please describe what you are trying to do.**

When dividing the data in a `RecordBatch` into subsets, such as when writing partitioned output in DataFusion or in lancedb, we end up creating a new `RecordBatch` that is a subset of the input, eventually using the underlying arrow [take](https://docs.rs/arrow-select/latest/arrow_select/take/fn.take.html) kernel.

See the Slack discussion in https://the-asf.slack.com/archives/C01QUFS30TD/p1696014226063749

**Describe the solution you'd like**

Similarly to [`RecordBatch::slice`](https://docs.rs/arrow/latest/arrow/array/struct.RecordBatch.html#method.slice), I would like an ergonomic version of `RecordBatch::take` that can be used like

```rust
let batch: RecordBatch = ...;
let indices = vec![1,3,5];
// new_batch contains rows 1, 3, and 5 of `batch`
let new_batch = batch.take(&indices)?;
```

We can probably port the code from DataFusion here (thanks @devinjdangelo for the link): https://github.com/apache/arrow-datafusion/blob/85f3578f5fb47d28a8bc3a7b9be0284b3ced0fcd/datafusion/physical-plan/src/repartition/mod.rs#L193-L212

**Describe alternatives you've considered**

If we worry about `RecordBatch` getting too complicated, we might want to consider adding a `RecordBatchExt` trait that contains these methods, but that might be overly complicated.

**Additional context**

cc @devinjdangelo @westonpace
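
To make the requested semantics concrete, here is a minimal std-only sketch of a row-wise `take` over a columnar batch. `ToyBatch` and its fields are hypothetical stand-ins invented for illustration, not arrow types. In arrow itself, the per-column gather would presumably be done by the `take` kernel and the results reassembled into a new `RecordBatch` with the same schema.

```rust
/// Toy stand-in for a record batch: named `i64` columns of equal length.
/// Hypothetical type for illustration only; this is not arrow's `RecordBatch`.
#[derive(Debug, Clone, PartialEq)]
struct ToyBatch {
    names: Vec<String>,
    columns: Vec<Vec<i64>>,
}

impl ToyBatch {
    /// Gather the rows at `indices` from every column, preserving index order.
    /// Returns an error if any index is out of bounds.
    fn take(&self, indices: &[usize]) -> Result<ToyBatch, String> {
        let num_rows = self.columns.first().map_or(0, |c| c.len());
        if let Some(bad) = indices.iter().find(|&&i| i >= num_rows) {
            return Err(format!("index {} out of bounds for {} rows", bad, num_rows));
        }
        // Apply the same index list to each column independently.
        let columns = self
            .columns
            .iter()
            .map(|col| indices.iter().map(|&i| col[i]).collect())
            .collect();
        Ok(ToyBatch {
            names: self.names.clone(),
            columns,
        })
    }
}

fn main() {
    let batch = ToyBatch {
        names: vec!["id".to_string(), "value".to_string()],
        columns: vec![vec![0, 1, 2, 3, 4, 5], vec![10, 11, 12, 13, 14, 15]],
    };
    // new_batch contains rows 1, 3, and 5 of `batch`
    let new_batch = batch.take(&[1, 3, 5]).unwrap();
    println!("{:?}", new_batch.columns); // prints [[1, 3, 5], [11, 13, 15]]
}
```

The key property is that one index list is applied to every column, so the rows stay aligned across columns and the column names (the schema, in arrow terms) carry over unchanged.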
