vbarua commented on code in PR #12800:
URL: https://github.com/apache/datafusion/pull/12800#discussion_r1792548576
##########
datafusion/substrait/src/logical_plan/consumer.rs:
##########
@@ -327,11 +325,10 @@ pub async fn from_substrait_extended_expr(
})
}
-/// parse projection
-pub fn extract_projection(
- t: LogicalPlan,
+pub fn apply_projection(
Review Comment:
minor: if we're going to rename this, I might would `apply_masking` as that
better corresponds to the fact that this is applying MaskExpressions
specifically, and also makes it clearer that this isn't related to creating
`Projects` / `Projections`.
##########
datafusion/substrait/src/logical_plan/consumer.rs:
##########
@@ -781,10 +757,11 @@ pub async fn from_substrait_rel(
from_substrait_named_struct(named_struct, extensions)?
.replace_qualifier(table_reference.clone());
+ let substrait_schema =
+ apply_projection(substrait_schema, &read.projection)?;
Review Comment:
Looking to future Substrait compatabilty. Substrait ReadRels have a
[filter](https://github.com/substrait-io/substrait/blob/a0388ff69a83aa4addf51236b77bf275b3647590/proto/substrait/algebra.proto#L69)
field that represents filter expressions that are applied _after_ the read but
_before_ the masking (i.e. the filter expression can reference fields that are
later masked out).
Is the solution you're implementing here going to be compatible with that?
By applying the masking here, we might potentially mask out fields that we're
going to need.
##########
datafusion/substrait/src/logical_plan/consumer.rs:
##########
@@ -340,41 +337,20 @@ pub fn extract_projection(
.iter()
.map(|item| item.field as usize)
.collect();
- match t {
- LogicalPlan::TableScan(mut scan) => {
- let fields = column_indices
- .iter()
- .map(|i| scan.projected_schema.qualified_field(*i))
- .map(|(qualifier, field)| {
- (qualifier.cloned(), Arc::new(field.clone()))
- })
- .collect();
- scan.projection = Some(column_indices);
- scan.projected_schema = DFSchemaRef::new(
- DFSchema::new_with_metadata(fields,
HashMap::new())?,
- );
- Ok(LogicalPlan::TableScan(scan))
- }
- LogicalPlan::Projection(projection) => {
- // create another Projection around the Projection to
handle the field masking
- let fields: Vec<Expr> = column_indices
- .into_iter()
- .map(|i| {
- let (qualifier, field) =
- projection.schema.qualified_field(i);
- let column =
- Column::new(qualifier.cloned(),
field.name());
- Expr::Column(column)
- })
- .collect();
- project(LogicalPlan::Projection(projection), fields)
- }
- _ => plan_err!("unexpected plan for table"),
- }
+
+ let fields = column_indices
+ .iter()
+ .map(|i| schema.qualified_field(*i))
+ .map(|(qualifier, field)| {
+ (qualifier.cloned(), Arc::new(field.clone()))
+ })
+ .collect();
+
+ Ok(DFSchema::new_with_metadata(fields, HashMap::new())?)
Review Comment:
+1 to retaining metadata if it's possible, though I realize the existing
code doesn't do this now either.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]