alamb commented on code in PR #13986:
URL: https://github.com/apache/datafusion/pull/13986#discussion_r1901305754


##########
datafusion/core/src/physical_planner.rs:
##########
@@ -2006,6 +2001,45 @@ fn tuple_err<T, R>(value: (Result<T>, Result<R>)) -> Result<(T, R)> {
     }
 }
 
+#[derive(Default)]
+struct InvariantChecker;
+
+impl InvariantChecker {
+    /// Checks that the plan change is permitted, returning an Error if not.
+    ///
+    /// In debug mode, this recursively walks the entire physical plan and
+    /// performs additional checks using [`ExecutionPlan::check_node_invariants`].
+    pub fn check(
+        &mut self,
+        plan: &Arc<dyn ExecutionPlan>,
+        rule: &Arc<dyn PhysicalOptimizerRule + Send + Sync>,

Review Comment:
   For API design, I think we should not pass the rule to the invariant checker, 
as the checker shouldn't logically depend on the rule. Perhaps just the rule 
name could be passed in to help with debug messages.
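   
   A minimal sketch of what that could look like. The `ExecutionPlan` trait and `Node` type below are simplified, hypothetical stand-ins (not the real DataFusion types), and errors are plain `String`s to keep the example self-contained:

   ```rust
   use std::sync::Arc;

   /// Hypothetical, simplified stand-in for DataFusion's `ExecutionPlan`.
   trait ExecutionPlan {
       fn name(&self) -> &str;
       fn children(&self) -> Vec<&Arc<dyn ExecutionPlan>>;
       /// Per-node check; the default accepts every node.
       fn check_node_invariants(&self) -> Result<(), String> {
           Ok(())
       }
   }

   /// Hypothetical plan node used only to exercise the checker.
   struct Node {
       name: &'static str,
       valid: bool,
       children: Vec<Arc<dyn ExecutionPlan>>,
   }

   impl ExecutionPlan for Node {
       fn name(&self) -> &str {
           self.name
       }
       fn children(&self) -> Vec<&Arc<dyn ExecutionPlan>> {
           self.children.iter().collect()
       }
       fn check_node_invariants(&self) -> Result<(), String> {
           if self.valid {
               Ok(())
           } else {
               Err("node is invalid".to_string())
           }
       }
   }

   #[derive(Default)]
   struct InvariantChecker;

   impl InvariantChecker {
       /// Recursively checks every node. `rule_name` is used only to build
       /// the error message, so the checker does not depend on the rule type.
       fn check(&self, plan: &Arc<dyn ExecutionPlan>, rule_name: &str) -> Result<(), String> {
           plan.check_node_invariants().map_err(|e| {
               format!(
                   "rule '{rule_name}' produced an invalid plan at node '{}': {e}",
                   plan.name()
               )
           })?;
           for child in plan.children() {
               self.check(child, rule_name)?;
           }
           Ok(())
       }
   }
   ```

   The caller would then pass the rule's name as a string (e.g. whatever the rule reports as its name) rather than the rule itself.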



##########
datafusion/physical-plan/src/execution_plan.rs:
##########
@@ -110,6 +110,16 @@ pub trait ExecutionPlan: Debug + DisplayAs + Send + Sync {
     /// trait, which is implemented for all `ExecutionPlan`s.
     fn properties(&self) -> &PlanProperties;
 
+    /// Returns an error if this individual node does not conform to its invariants.

Review Comment:
   To account for the different kinds of "executableness", perhaps we can use 
an enum similar to the one we use for LogicalPlans: 
https://github.com/apache/datafusion/blob/264f4c51fc97981435f1a1827de934472d60edf8/datafusion/expr/src/logical_plan/invariants.rs#L31
   
   
   Then the signature might look like
   ```rust
       fn check_node_invariants(&self, invariant_level: InvariantLevel) -> Result<()> {
         Ok(())
       }
   ```
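   
   To illustrate how an implementation might use the level, here is a self-contained sketch. The variant names `Always` and `Executable` are my assumption about the shape of the linked logical-plan enum, and `ExecutionPlan`, `DefaultExec`, and `PlaceholderExec` are hypothetical stand-ins rather than real DataFusion types:

   ```rust
   /// Assumed shape of the level enum (variant names are an assumption).
   #[derive(Debug, Clone, Copy, PartialEq, Eq)]
   enum InvariantLevel {
       /// Invariants that must hold at every stage of planning.
       Always,
       /// Invariants that must hold before the plan can be executed.
       Executable,
   }

   /// Hypothetical, simplified stand-in for the `ExecutionPlan` trait.
   trait ExecutionPlan {
       /// Default implementation: no node-specific invariants.
       fn check_node_invariants(&self, _invariant_level: InvariantLevel) -> Result<(), String> {
           Ok(())
       }
   }

   /// Hypothetical node that relies on the default implementation.
   struct DefaultExec;

   impl ExecutionPlan for DefaultExec {}

   /// Hypothetical node that is structurally valid but cannot be run yet.
   struct PlaceholderExec;

   impl ExecutionPlan for PlaceholderExec {
       fn check_node_invariants(&self, invariant_level: InvariantLevel) -> Result<(), String> {
           match invariant_level {
               InvariantLevel::Always => Ok(()),
               InvariantLevel::Executable => {
                   Err("PlaceholderExec must be replaced before execution".to_string())
               }
           }
       }
   }
   ```

   With this shape, intermediate optimizer passes could check at the weaker level and only the final plan would need to pass the stricter one.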



##########
datafusion/physical-plan/src/execution_plan.rs:
##########
@@ -110,6 +110,16 @@ pub trait ExecutionPlan: Debug + DisplayAs + Send + Sync {
     /// trait, which is implemented for all `ExecutionPlan`s.
     fn properties(&self) -> &PlanProperties;
 
+    /// Returns an error if this individual node does not conform to its invariants.
+    /// These invariants are typically only checked in debug mode.
+    ///
+    /// A default set of invariants is provided in the default implementation.
+    /// Extension nodes can provide their own invariants.
+    fn check_node_invariants(&self) -> Result<()> {
+        // TODO

Review Comment:
   > Conceptually, sanity checking is a "more general" process -- it verifies 
that any two operators that exchange data (i.e. one's output feeds the other's 
input) are compatible. So I don't think we can "change" it to be an invariant 
checker, but we can extend it to also check "invariants" of each individual 
operator (however they are defined by an ExecutionPlan) as it traverses the 
plan tree.
   
   I agree with this sentiment. It seems to me that the "SanityChecker" is 
verifying invariants that should be true for all nodes, regardless of what they 
do (for example, that the declared required input sort is the same as the 
produced output sort).
   
   Thus, focusing on ExecutionPlan-specific invariants might be a good first 
step.
   
   Some simple invariants I could imagine starting with are:
   1. Number of inputs (e.g. that unions have more than zero inputs)
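   
   As a rough sketch of such an input-count check: `UnionLikeExec` below is a hypothetical stand-in (not the real `UnionExec`), with inputs modelled as plain names and errors as `String`s:

   ```rust
   /// Hypothetical union-like node; inputs are modelled as names only.
   struct UnionLikeExec {
       inputs: Vec<String>,
   }

   impl UnionLikeExec {
       /// Node-specific invariant: a union must have at least one input.
       fn check_node_invariants(&self) -> Result<(), String> {
           if self.inputs.is_empty() {
               Err("UnionLikeExec requires at least one input, found 0".to_string())
           } else {
               Ok(())
           }
       }
   }
   ```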
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

