comphead commented on code in PR #15840:
URL: https://github.com/apache/datafusion/pull/15840#discussion_r2066843997


##########
datafusion/datasource-avro/src/avro_to_arrow/reader.rs:
##########
@@ -133,19 +133,37 @@ impl<R: Read> Reader<'_, R> {
     ///
     /// If reading a `File`, you can customise the Reader, such as to enable schema
     /// inference, use `ReaderBuilder`.
+    ///
+    /// If projection is provided, it uses a schema with only the fields in the projection, respecting their order.
+    /// Only the first level of projection is handled. No further projection currently occurs, but would be
+    /// useful if plucking values from a struct, e.g. getting `a.b.c.e` from `a.b.c.{d, e}`.
     pub fn try_new(
         reader: R,
         schema: SchemaRef,
         batch_size: usize,
         projection: Option<Vec<String>>,
     ) -> Result<Self> {
+        let projected_schema = if let Some(proj) = &projection {
+            if !proj.is_empty() {
+                let projected_fields: Vec<arrow::datatypes::Field> = proj
+                    .iter()
+                    .filter_map(|name| schema.column_with_name(name))
+                    .map(|(_, field)| field.clone())
+                    .collect();
+                Arc::new(arrow::datatypes::Schema::new(projected_fields))
+            } else {
+                Arc::clone(&schema)
+            }
+        } else {
+            Arc::clone(&schema)
+        };

Review Comment:
   ```suggestion
           let projected_schema = projection.as_ref().filter(|p| !p.is_empty()).map_or_else(
               || Arc::clone(&schema),
               |proj| {
                   Arc::new(arrow::datatypes::Schema::new(
                       proj.iter()
                           .filter_map(|name| schema.column_with_name(name).map(|(_, f)| f.clone()))
                           // `Schema::new` takes `impl Into<Fields>`, so name the collection type explicitly
                           .collect::<Vec<_>>(),
                   ))
               },
           );
   ```
   
   Would that be more concise? I didn't check the snippet; it's just the idea.
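
To make the idea easy to check in isolation, here is a minimal sketch of the same `filter` + `map_or_else` pattern using only the standard library. The `Field` and `Schema` types and the `project_schema` helper are hypothetical stand-ins invented for illustration, not the Arrow types or anything in this PR. Only the control flow is meant to carry over.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for an Arrow field: a name and a type label.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: &'static str,
}

/// Hypothetical stand-in for an Arrow schema.
#[derive(Debug, PartialEq)]
struct Schema {
    fields: Vec<Field>,
}

impl Schema {
    /// Looks up a field by name, returning its index and a reference to it.
    fn column_with_name(&self, name: &str) -> Option<(usize, &Field)> {
        self.fields.iter().enumerate().find(|(_, f)| f.name == name)
    }
}

/// Builds the projected schema: a missing or empty projection reuses the
/// original schema; otherwise the fields are picked in projection order.
fn project_schema(schema: &Arc<Schema>, projection: Option<&Vec<String>>) -> Arc<Schema> {
    projection
        // Treat `Some(vec![])` the same as `None`.
        .filter(|p| !p.is_empty())
        .map_or_else(
            // No usable projection: share the original schema.
            || Arc::clone(schema),
            // Usable projection: keep only the named fields, in the order given.
            |proj| {
                Arc::new(Schema {
                    fields: proj
                        .iter()
                        .filter_map(|name| schema.column_with_name(name).map(|(_, f)| f.clone()))
                        .collect(),
                })
            },
        )
}
```

In this sketch, as in the original `if let` version, `filter_map` silently drops projection names that are not in the schema, so the condensed form keeps that behaviour rather than changing it.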



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

