vbarua commented on code in PR #12800:
URL: https://github.com/apache/datafusion/pull/12800#discussion_r1792548576


##########
datafusion/substrait/src/logical_plan/consumer.rs:
##########
@@ -327,11 +325,10 @@ pub async fn from_substrait_extended_expr(
     })
 }
 
-/// parse projection
-pub fn extract_projection(
-    t: LogicalPlan,
+pub fn apply_projection(

Review Comment:
   minor: if we're going to rename this, I might would `apply_masking` as that 
better corresponds to the fact that this is applying MaskExpressions 
specifically, and also makes it clearer that this isn't related to creating 
`Projects` / `Projections`.



##########
datafusion/substrait/src/logical_plan/consumer.rs:
##########
@@ -781,10 +757,11 @@ pub async fn from_substrait_rel(
                     from_substrait_named_struct(named_struct, extensions)?
                         .replace_qualifier(table_reference.clone());
 
+                let substrait_schema =
+                    apply_projection(substrait_schema, &read.projection)?;

Review Comment:
   Looking to future Substrait compatabilty. Substrait ReadRels have a 
[filter](https://github.com/substrait-io/substrait/blob/a0388ff69a83aa4addf51236b77bf275b3647590/proto/substrait/algebra.proto#L69)
 field that represents filter expressions that are applied _after_ the read but 
_before_ the masking (i.e. the filter expression can reference fields that are 
later masked out). 
   
   Is the solution you're implementing here going to be compatible with that? 
By applying the masking here, we might potentially mask out fields that we're 
going to need.



##########
datafusion/substrait/src/logical_plan/consumer.rs:
##########
@@ -340,41 +337,20 @@ pub fn extract_projection(
                     .iter()
                     .map(|item| item.field as usize)
                     .collect();
-                match t {
-                    LogicalPlan::TableScan(mut scan) => {
-                        let fields = column_indices
-                            .iter()
-                            .map(|i| scan.projected_schema.qualified_field(*i))
-                            .map(|(qualifier, field)| {
-                                (qualifier.cloned(), Arc::new(field.clone()))
-                            })
-                            .collect();
-                        scan.projection = Some(column_indices);
-                        scan.projected_schema = DFSchemaRef::new(
-                            DFSchema::new_with_metadata(fields, 
HashMap::new())?,
-                        );
-                        Ok(LogicalPlan::TableScan(scan))
-                    }
-                    LogicalPlan::Projection(projection) => {
-                        // create another Projection around the Projection to 
handle the field masking
-                        let fields: Vec<Expr> = column_indices
-                            .into_iter()
-                            .map(|i| {
-                                let (qualifier, field) =
-                                    projection.schema.qualified_field(i);
-                                let column =
-                                    Column::new(qualifier.cloned(), 
field.name());
-                                Expr::Column(column)
-                            })
-                            .collect();
-                        project(LogicalPlan::Projection(projection), fields)
-                    }
-                    _ => plan_err!("unexpected plan for table"),
-                }
+
+                let fields = column_indices
+                    .iter()
+                    .map(|i| schema.qualified_field(*i))
+                    .map(|(qualifier, field)| {
+                        (qualifier.cloned(), Arc::new(field.clone()))
+                    })
+                    .collect();
+
+                Ok(DFSchema::new_with_metadata(fields, HashMap::new())?)

Review Comment:
   +1 to retaining metadata if it's possible, though I realize the existing 
code doesn't do this now either.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to