alamb commented on code in PR #5333:
URL: https://github.com/apache/arrow-rs/pull/5333#discussion_r1468434720


##########
arrow-select/src/take.rs:
##########
@@ -2054,4 +2105,48 @@ mod tests {
         let expected = vec![Some("a"), None, None, Some("a"), Some("c"), 
Some("d")];
         assert_eq!(expected, actual);
     }
+
+    #[test]
+    fn test_take_record_batch() {

Review Comment:
   Given there is now a doc example (which is tested as part of CI), I think 
this unit test is redundant so could be removed from the PR



##########
arrow-select/src/take.rs:
##########
@@ -769,11 +769,62 @@ to_indices_reinterpret!(Int32Type, UInt32Type);
 to_indices_identity!(UInt64Type);
 to_indices_reinterpret!(Int64Type, UInt64Type);
 
+/// Take rows by index from [`RecordBatch`] and returns a new [`RecordBatch`] 
from those indexes.
+///
+/// This function will call [`take`] on each array of the [`RecordBatch`] and 
assemble a new [`RecordBatch`].
+///
+/// # Example
+/// ```
+/// # use std::sync::Arc;
+/// # use arrow_array::{StringArray, Int32Array, UInt32Array, RecordBatch};
+/// # use arrow_schema::{DataType, Field, Schema};
+/// # use arrow_select::take::take_record_batch;
+///
+/// let schema = Arc::new(Schema::new(vec![
+///     Field::new("a", DataType::Int32, true),
+///     Field::new("b", DataType::Utf8, true),
+/// ]));
+/// let batch = RecordBatch::try_new(
+///     schema.clone(),
+///     vec![
+///         Arc::new(Int32Array::from_iter_values(0..20)),
+///         Arc::new(StringArray::from_iter_values(
+///             (0..20).map(|i| format!("str-{}", i)),
+///         )),
+///     ],
+/// )
+/// .unwrap();
+///
+/// let indices = UInt32Array::from(vec![1, 5, 10]);
+/// let taken = take_record_batch(&batch, &indices).unwrap();
+///
+/// let expected = RecordBatch::try_new(
+///     schema,
+///     vec![
+///         Arc::new(Int32Array::from(vec![1, 5, 10])),
+///         Arc::new(StringArray::from(vec!["str-1", "str-5", "str-10"])),
+///     ],
+/// )
+/// .unwrap();
+/// assert_eq!(taken, expected);
+/// ```
+pub fn take_record_batch<T: ArrowPrimitiveType>(
+    record_batch: &RecordBatch,
+    indices: &PrimitiveArray<T>,

Review Comment:
   I recommend changing this to take a `&dyn Array` to mirror the `take` kernel 
itself, https://docs.rs/arrow/latest/arrow/compute/kernels/take/fn.take.html
   
   I think all your examples should still work, but the code would be slightly 
more general
   
   We could also potential do this as a follow on PR
   
   Thank you @RinChanNOWWW 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to