wiedld commented on code in PR #13651:
URL: https://github.com/apache/datafusion/pull/13651#discussion_r1886199978


##########
datafusion/optimizer/src/analyzer/mod.rs:
##########
@@ -164,18 +172,38 @@ impl Analyzer {
             log_plan(rule.name(), &new_plan);
             observer(&new_plan, rule.as_ref());
         }
-        // for easier display in explain output
-        check_plan(&new_plan).map_err(|e| {
+
+        // verify at the end, after the last LP analyzer pass.
+        assert_valid_semantic_plan(&new_plan).map_err(|e| {
             DataFusionError::Context("check_analyzed_plan".to_string(), 
Box::new(e))
         })?;
+
         log_plan("Final analyzed plan", &new_plan);
         debug!("Analyzer took {} ms", start_time.elapsed().as_millis());
         Ok(new_plan)
     }
 }
 
-/// Do necessary check and fail the invalid plan
-fn check_plan(plan: &LogicalPlan) -> Result<()> {
+/// These are invariants which should hold true before and after each analyzer.
+///
+/// This differs from [`LogicalPlan::assert_invariants`], which addresses if a 
singular
+/// LogicalPlan is valid. Instead this address if the analyzer (before and 
after)
+/// is valid based upon permitted changes.
+///
+/// Does not check elements which are mutated by the analyzers (e.g. the 
schema).
+fn assert_valid_semantic_plan(plan: &LogicalPlan) -> Result<()> {
+    plan.assert_invariants()?;
+
+    // TODO: should this be moved to LogicalPlan::assert_invariants?
+    assert_subqueries_are_valid(plan)?;

Review Comment:
   The `assert_subqueries_are_valid()` runs [these 
checks](https://github.com/apache/datafusion/blob/bd2c975df88c9e95f2f6b5f2abed5638904e2ac0/datafusion/optimizer/src/analyzer/subquery.rs#L28-L34).
 The existing (on main) code only performs these checks during after all of the 
analyzer passes have completed. If I run these checks before the first analyzer 
pass => we get test failures.
   
   My next steps will be to figure out which of these test failures are due to 
code bugs.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to