tokoko commented on code in PR #13127: URL: https://github.com/apache/datafusion/pull/13127#discussion_r1819686883
########## datafusion/substrait/src/logical_plan/consumer.rs: ########## @@ -566,36 +568,57 @@ pub async fn from_substrait_rel( let mut input = LogicalPlanBuilder::from( from_substrait_rel(ctx, input, extensions).await?, ); - let mut names: HashSet<String> = HashSet::new(); - let mut exprs: Vec<Expr> = vec![]; - for e in &p.expressions { - let x = - from_substrait_rex(ctx, e, input.clone().schema(), extensions) + + let mut default_exprs: Vec<Expr> = vec![]; + + // By default, a Substrait Project emits all inputs fields + let input_schema = input.schema(); + for index in 0..(input_schema.fields().len()) { + let e = + Expr::Column(Column::from(input_schema.qualified_field(index))); + default_exprs.push(e); + } + // followed by all expressions + for expr in &p.expressions { + let e = + from_substrait_rex(ctx, expr, input.clone().schema(), extensions) .await?; // if the expression is WindowFunction, wrap in a Window relation - if let Expr::WindowFunction(_) = &x { + if let Expr::WindowFunction(_) = &e { // Adding the same expression here and in the project below // works because the project's builder uses columnize_expr(..) // to transform it into a column reference - input = input.window(vec![x.clone()])? + input = input.window(vec![e.clone()])? } - // Ensure the expression has a unique display name, so that project's - // validate_unique_names doesn't fail - let name = x.schema_name().to_string(); - let mut new_name = name.clone(); - let mut i = 0; - while names.contains(&new_name) { - new_name = format!("{}__temp__{}", name, i); - i += 1; + default_exprs.push(e); + } + + // Ensure that all expressions have a unique display name, so that + // validate_unique_names does not fail when constructing the project. + let mut name_tracker = NameTracker::new(); + + let mut final_exprs: Vec<Expr> = vec![]; + match retrieve_emit_kind(p.common.as_ref()) { Review Comment: Got it, I suppose double projections everywhere can get ugly pretty quickly. > I agree as well, but we also don't want to call the optimizer as part of consuming Substrait sure, I meant an optimizer pass after consumer spits out the plan, not as part of a consumer. Another thing that came to mind just now.. might be a stretch, but bear with me here :laughing: Can't this lead to correctness issues if emit maps the same nondeterministic projection expression to multiple output fields? take random() function for example.. DF would have no way of knowing the original substrait plan asked for a single random value emitted twice rather than two different random values. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org