alamb commented on code in PR #5132: URL: https://github.com/apache/arrow-datafusion/pull/5132#discussion_r1100745375
########## datafusion/expr/src/logical_plan/builder.rs: ########## @@ -403,6 +403,28 @@ impl LogicalPlanBuilder { Ok(()) })?; + // if current plan is distinct or current plan is repartition and its child plan is distinct, + // then this plan is a select distinct plan + let is_select_distinct = match self.plan { + LogicalPlan::Distinct(_) => true, + LogicalPlan::Repartition(Repartition { ref input, .. }) => { + matches!(input.as_ref(), &LogicalPlan::Distinct(_)) + } + _ => false, Review Comment: 🤔 that is a good point. If the plan is being created from SQL I think this is the pattern. However, if it is built directly (via a `DataFrame` or someone using `LogicalPlanBuilder` directly) it is probably incorrect to throw an error here (as they may have meant to to run the distinct before the sort for some reason) Maybe we should move the check to the SQL planner -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org