alamb commented on a change in pull request #1033:
URL: https://github.com/apache/arrow-rs/pull/1033#discussion_r768132120



##########
File path: arrow/src/datatypes/schema.rs
##########
@@ -87,6 +87,18 @@ impl Schema {
         Self { fields, metadata }
     }
 
+
+    /// Returns a new schema with only the specified columns in the new schema
+    /// This carries metadata from the parent schema over as well
+    pub fn project(&self, indices: impl IntoIterator<Item=usize>) -> Result<Schema> {
+        let mut new_fields = vec![];
+        for i in indices {
+            let f = self.fields[i].clone();
+            new_fields.push(f);
+        }

Review comment:
       I think that, as written, this:
   1.  Will `panic!` if an index is not in bounds.
   2.  Is not "idiomatic Rust style" (which to me means avoiding `mut`). This is 
far less important.
   
   How about something such as (untested):
   
   ```suggestion
           let new_fields = indices
             .into_iter()
             .map(|i| {
               self.fields.get(i).cloned().ok_or_else(|| {
                 ArrowError::SchemaError(format!(
                   "project index {} out of bounds, max field {}",
                   i,
                   self.fields().len()
                 ))
               })
             })
             .collect::<Result<Vec<_>>>()?;
   ```
   
   Note the use of https://doc.rust-lang.org/std/vec/struct.Vec.html#method.get 
to avoid `fields[i]`, and the somewhat confusing use of turbofish 
`.collect::<Result<Vec<_>>>()`, which stops at the first `Err` and returns it. 
It took me quite a while to get used to that pattern.
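
To make the `get` plus `collect` pattern concrete, here is a minimal standalone sketch. It does not use the arrow crate: `Field`, `SchemaError` and `project_fields` are hypothetical stand-ins invented for illustration, not the real arrow-rs types.

```rust
/// Hypothetical stand-in for arrow's `Field`; only the name is modelled.
#[derive(Debug, Clone, PartialEq)]
pub struct Field {
    pub name: String,
}

/// Hypothetical stand-in for `ArrowError::SchemaError`.
#[derive(Debug, PartialEq)]
pub struct SchemaError(pub String);

/// Clones the fields at `indices`, returning an error instead of
/// panicking when an index is out of bounds.
pub fn project_fields(
    fields: &[Field],
    indices: impl IntoIterator<Item = usize>,
) -> Result<Vec<Field>, SchemaError> {
    indices
        .into_iter()
        .map(|i| {
            // `get` returns `None` for a bad index, where `fields[i]` would panic.
            fields.get(i).cloned().ok_or_else(|| {
                SchemaError(format!(
                    "project index {} out of bounds, max field {}",
                    i,
                    fields.len()
                ))
            })
        })
        // Collecting into `Result<Vec<_>, _>` short-circuits on the first `Err`.
        .collect::<Result<Vec<_>, _>>()
}
```

With three fields, `project_fields(&fields, vec![0, 2])` should yield the first and third field, while `vec![2, 3]` should yield a `SchemaError` for index 3.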
   
   

##########
File path: arrow/src/datatypes/schema.rs
##########
@@ -369,4 +381,23 @@ mod tests {
 
         assert_eq!(schema, de_schema);
     }
+
+    #[test]
+    fn test_project() {
+        let mut metadata = HashMap::new();
+        metadata.insert("meta".to_string(), "data".to_string());
+
+        let schema = Schema::new_with_metadata(vec![
+            Field::new("name", DataType::Utf8, false),
+            Field::new("address", DataType::Utf8, false),
+            Field::new("priority", DataType::UInt8, false),
+        ], metadata);
+
+        let projected: Schema = schema.project(vec![0, 2]).unwrap();
+
+        assert_eq!(projected.fields().len(), 2);
+        assert_eq!(projected.fields()[0].name(), "name");
+        assert_eq!(projected.fields()[1].name(), "priority");
+        assert_eq!(projected.metadata.get("meta").unwrap(), "data")
+    }

Review comment:
       Related to the above: I recommend a test for the case where an index is 
out of bounds, such as `schema.project([2, 3])`.
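
A rough sketch of what such a test could look like follows. To keep it standalone, `Schema` here is a hypothetical, simplified stand-in (fields are plain names and the error type is `String`); the real test would use the arrow types and check for the error variant the PR ends up returning.

```rust
/// Hypothetical, simplified stand-in for `Schema`: fields are just names.
#[derive(Debug, Clone, PartialEq)]
pub struct Schema {
    pub fields: Vec<String>,
}

impl Schema {
    /// Bounds-checked projection; the `String` error type is an assumption.
    pub fn project(
        &self,
        indices: impl IntoIterator<Item = usize>,
    ) -> Result<Schema, String> {
        let fields = indices
            .into_iter()
            .map(|i| {
                self.fields.get(i).cloned().ok_or_else(|| {
                    format!(
                        "project index {} out of bounds, max field {}",
                        i,
                        self.fields.len()
                    )
                })
            })
            .collect::<Result<Vec<_>, _>>()?;
        Ok(Schema { fields })
    }
}

#[test]
fn test_project_index_out_of_bounds() {
    let schema = Schema {
        fields: vec!["name".into(), "address".into(), "priority".into()],
    };
    // Index 2 is valid; index 3 is one past the last field.
    let result = schema.project(vec![2, 3]);
    assert!(result.is_err());
}
```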

##########
File path: arrow/src/record_batch.rs
##########
@@ -175,6 +175,12 @@ impl RecordBatch {
         self.schema.clone()
     }
 
+
+    /// Projects the schema onto the specified columns
+    pub fn project(&self, indices: impl IntoIterator<Item=usize>) -> Result<Schema> {

Review comment:
       The intent of this method was to project the `RecordBatch` itself 
rather than just the schema.
   
   A signature like this:
   ```suggestion
       pub fn project(&self, indices: impl IntoIterator<Item=usize>) -> Result<RecordBatch> {
   ```
   
   (so we would have to project the columns as well as the schema)
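
As a rough illustration of projecting columns and schema together, here is a standalone sketch. `Batch` and `Column` are hypothetical stand-ins (the schema is reduced to field names and each column to an `Arc<Vec<i64>>`); the real implementation would operate on the arrow schema and array types and return an `ArrowError`.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a reference-counted arrow array.
type Column = Arc<Vec<i64>>;

/// Hypothetical, simplified stand-in for `RecordBatch`.
#[derive(Debug, Clone, PartialEq)]
pub struct Batch {
    pub field_names: Vec<String>,
    pub columns: Vec<Column>,
}

impl Batch {
    /// Projects field names and columns with the same indices, so the
    /// result stays consistent. The `String` error type is an assumption.
    pub fn project(
        &self,
        indices: impl IntoIterator<Item = usize>,
    ) -> Result<Batch, String> {
        let projected = indices
            .into_iter()
            .map(|i| match (self.field_names.get(i), self.columns.get(i)) {
                // Cloning an `Arc` only bumps a reference count; data is shared.
                (Some(name), Some(col)) => Ok((name.clone(), Arc::clone(col))),
                _ => Err(format!(
                    "project index {} out of bounds, max field {}",
                    i,
                    self.columns.len()
                )),
            })
            .collect::<Result<Vec<_>, String>>()?;
        let (field_names, columns): (Vec<String>, Vec<Column>) =
            projected.into_iter().unzip();
        Ok(Batch { field_names, columns })
    }
}
```

Because the columns are reference counted, a projection in this sketch shares the underlying data with the original batch rather than copying it.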




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
