comphead commented on code in PR #15840:
URL: https://github.com/apache/datafusion/pull/15840#discussion_r2066843997
##########
datafusion/datasource-avro/src/avro_to_arrow/reader.rs:
##########
@@ -133,19 +133,37 @@ impl<R: Read> Reader<'_, R> {
///
/// If reading a `File`, you can customise the Reader, such as to enable schema
/// inference, use `ReaderBuilder`.
+ ///
+ /// If projection is provided, it uses a schema with only the fields in the projection, respecting their order.
+ /// Only the first level of projection is handled. No further projection currently occurs, but would be
+ /// useful if plucking values from a struct, e.g. getting `a.b.c.e` from `a.b.c.{d, e}`.
pub fn try_new(
reader: R,
schema: SchemaRef,
batch_size: usize,
projection: Option<Vec<String>>,
) -> Result<Self> {
+ let projected_schema = if let Some(proj) = &projection {
+ if !proj.is_empty() {
+ let projected_fields: Vec<arrow::datatypes::Field> = proj
+ .iter()
+ .filter_map(|name| schema.column_with_name(name))
+ .map(|(_, field)| field.clone())
+ .collect();
+ Arc::new(arrow::datatypes::Schema::new(projected_fields))
+ } else {
+ Arc::clone(&schema)
+ }
+ } else {
+ Arc::clone(&schema)
+ };
Review Comment:
```suggestion
let projected_schema = projection.as_ref().filter(|p| !p.is_empty()).map_or_else(
    || Arc::clone(&schema),
    |proj| {
        Arc::new(arrow::datatypes::Schema::new(
            proj.iter()
                .filter_map(|name| schema.column_with_name(name).map(|(_, f)| f.clone()))
                .collect(),
        ))
    },
);
```
Would that be more concise? I didn't check the snippet; it's just the idea.
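
For reference, here is a minimal, self-contained sketch comparing the two shapes (nested `if let` versus `Option::filter` + `map_or_else`). It is an illustration only: `Field`, `Schema`, `schema_of`, `project_if_let` and `project_combinators` are hypothetical stand-ins written for this example, not the arrow types or anything in the PR. The stand-in `Schema::new` takes a `Vec<Field>`, so the bare `collect()` infers its target here; I have not verified that it does against the real `arrow::datatypes::Schema::new`, which may need an explicit type.

```rust
use std::sync::Arc;

// Hypothetical stand-ins for arrow's Field and Schema, just enough to run the example.
#[derive(Clone, Debug, PartialEq)]
struct Field {
    name: String,
}

#[derive(Debug, PartialEq)]
struct Schema {
    fields: Vec<Field>,
}

impl Schema {
    fn new(fields: Vec<Field>) -> Self {
        Schema { fields }
    }

    // Same shape as a lookup by name: (index, field) if present, None otherwise.
    fn column_with_name(&self, name: &str) -> Option<(usize, &Field)> {
        self.fields
            .iter()
            .enumerate()
            .find(|(_, f)| f.name.as_str() == name)
    }
}

// Hypothetical helper to build a schema from column names.
fn schema_of(names: &[&str]) -> Arc<Schema> {
    let fields = names
        .iter()
        .map(|n| Field { name: n.to_string() })
        .collect();
    Arc::new(Schema::new(fields))
}

// Shape of the code in the diff: nested `if let` / `if`.
fn project_if_let(schema: &Arc<Schema>, projection: &Option<Vec<String>>) -> Arc<Schema> {
    if let Some(proj) = projection {
        if !proj.is_empty() {
            let fields: Vec<Field> = proj
                .iter()
                .filter_map(|name| schema.column_with_name(name))
                .map(|(_, field)| field.clone())
                .collect();
            Arc::new(Schema::new(fields))
        } else {
            Arc::clone(schema)
        }
    } else {
        Arc::clone(schema)
    }
}

// Shape of the suggestion: treat `None` and an empty projection the same way.
fn project_combinators(schema: &Arc<Schema>, projection: &Option<Vec<String>>) -> Arc<Schema> {
    projection
        .as_ref()
        .filter(|p| !p.is_empty())
        .map_or_else(
            || Arc::clone(schema),
            |proj| {
                Arc::new(Schema::new(
                    proj.iter()
                        .filter_map(|name| schema.column_with_name(name).map(|(_, f)| f.clone()))
                        .collect(),
                ))
            },
        )
}
```

In this sketch both functions return the original `Arc` for `None` and for an empty projection, and both keep the projection's order. Both also silently skip names that are not in the schema, because `filter_map` drops the `None` results.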
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]