wiedld commented on code in PR #13651: URL: https://github.com/apache/datafusion/pull/13651#discussion_r1886199978
########## datafusion/optimizer/src/analyzer/mod.rs: ########## @@ -164,18 +172,38 @@ impl Analyzer { log_plan(rule.name(), &new_plan); observer(&new_plan, rule.as_ref()); } - // for easier display in explain output - check_plan(&new_plan).map_err(|e| { + + // verify at the end, after the last LP analyzer pass. + assert_valid_semantic_plan(&new_plan).map_err(|e| { DataFusionError::Context("check_analyzed_plan".to_string(), Box::new(e)) })?; + log_plan("Final analyzed plan", &new_plan); debug!("Analyzer took {} ms", start_time.elapsed().as_millis()); Ok(new_plan) } } -/// Do necessary check and fail the invalid plan -fn check_plan(plan: &LogicalPlan) -> Result<()> { +/// These are invariants which should hold true before and after each analyzer. +/// +/// This differs from [`LogicalPlan::assert_invariants`], which addresses if a singular +/// LogicalPlan is valid. Instead this address if the analyzer (before and after) +/// is valid based upon permitted changes. +/// +/// Does not check elements which are mutated by the analyzers (e.g. the schema). +fn assert_valid_semantic_plan(plan: &LogicalPlan) -> Result<()> { + plan.assert_invariants()?; + + // TODO: should this be moved to LogicalPlan::assert_invariants? + assert_subqueries_are_valid(plan)?; Review Comment: The `assert_subqueries_are_valid()` runs [these checks](https://github.com/apache/datafusion/blob/bd2c975df88c9e95f2f6b5f2abed5638904e2ac0/datafusion/optimizer/src/analyzer/subquery.rs#L28-L34). The existing (on main) code only performs these checks during after all of the analyzer passes have completed. If I run these checks before the first analyzer pass => we get test failures. My next steps will be to figure out which of these test failures are due to code bugs. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org