avantgardnerio commented on code in PR #5386:
URL: https://github.com/apache/arrow-datafusion/pull/5386#discussion_r1117622609


##########
datafusion/core/src/physical_optimizer/pruning.rs:
##########
@@ -721,23 +730,20 @@ fn build_predicate_expression(
     let (left, op, right) = match expr {
         Expr::BinaryExpr(BinaryExpr { left, op, right }) => (left, *op, right),
         Expr::IsNull(expr) => {
-            let expr = build_is_null_column_expr(expr, schema, required_columns)
+            return build_is_null_column_expr(expr, schema, required_columns)
                 .unwrap_or(unhandled);
-            return Ok(expr);

Review Comment:
   Oh, so this was _always_ returning Ok?
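
A minimal sketch of why the removed lines could never yield an error, assuming the helper returns an `Option` (the `.unwrap_or(unhandled)` call suggests this, but the real signature is not shown in the hunk). `PredicateExpr`, `build_is_null_expr`, `old_shape`, and `new_shape` are hypothetical stand-ins, not DataFusion APIs.

```rust
/// Stand-in for the expression type (hypothetical, not the DataFusion type).
#[derive(Debug, Clone, PartialEq)]
enum PredicateExpr {
    IsNullCount(String),
    Unhandled,
}

/// Hypothetical helper: yields `None` when the column cannot be handled.
fn build_is_null_expr(column: Option<&str>) -> Option<PredicateExpr> {
    column.map(|name| PredicateExpr::IsNullCount(name.to_string()))
}

/// Old shape: the fallback absorbs the failure, then the value is wrapped in `Ok`.
fn old_shape(column: Option<&str>) -> Result<PredicateExpr, String> {
    let expr = build_is_null_expr(column).unwrap_or(PredicateExpr::Unhandled);
    Ok(expr) // no path through this function produces `Err`
}

/// New shape: return the value directly, with the same fallback.
fn new_shape(column: Option<&str>) -> PredicateExpr {
    build_is_null_expr(column).unwrap_or(PredicateExpr::Unhandled)
}
```

Under these assumptions, `old_shape(None)` is `Ok(PredicateExpr::Unhandled)`, so the `Result` wrapper carried no extra information on this branch.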



##########
datafusion/core/src/physical_plan/file_format/parquet/page_filter.rs:
##########
@@ -110,14 +110,16 @@ impl PagePruningPredicate {
     pub fn try_new(expr: &Expr, schema: SchemaRef) -> Result<Self> {
         let predicates = split_conjunction(expr)
             .into_iter()
-            .filter_map(|predicate| match predicate.to_columns() {
-                Ok(columns) if columns.len() == 1 => {
-                    match PruningPredicate::try_new(predicate.clone(), schema.clone()) {
-                        Ok(p) if !p.allways_true() => Some(Ok(p)),
-                        _ => None,
+            .filter_map(|predicate| {
+                match PruningPredicate::try_new(predicate.clone(), schema.clone()) {
+                    Ok(p)
+                        if (!p.allways_true())
+                            && (p.required_columns().n_columns() < 2) =>

Review Comment:
   This is a behavior change for `n_columns() == 0`: the old code required exactly one column (`columns.len() == 1`), while `< 2` also admits zero. Based on:
   
   ```
       pub fn allways_true(&self) -> bool {
           self.predicate_expr
               .as_any()
               .downcast_ref::<Literal>()
               .map(|l| matches!(l.value(), ScalarValue::Boolean(Some(true))))
               .unwrap_or_default()
       }
   ```
   
   I ran the test suite with a panic on `n_columns() == 0` and couldn't get it to trigger, so I guess it LGTM.
   I assume `allways_true()` would default to false in that case, so I think we'd want to return a `None` here?
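
A minimal sketch of the two points above, under simplifying assumptions: the downcast-and-read step is modelled as an `Option<Option<bool>>`, and both guards take a plain column count (the old code counted columns on the logical expression, the new code on the required statistics columns). `always_true`, `old_guard`, and `new_guard` are hypothetical names used only for illustration.

```rust
/// Models `allways_true()`: the outer `None` means the predicate is not a
/// literal, and the inner `Option<bool>` is the (possibly null) boolean value.
fn always_true(literal: Option<Option<bool>>) -> bool {
    literal
        .map(|value| matches!(value, Some(true)))
        .unwrap_or_default() // `bool::default()` is `false`
}

/// Old guard: keep the predicate only for exactly one column.
fn old_guard(always_true: bool, n_columns: usize) -> bool {
    !always_true && n_columns == 1
}

/// New guard: keep the predicate for fewer than two columns, which includes zero.
fn new_guard(always_true: bool, n_columns: usize) -> bool {
    !always_true && n_columns < 2
}
```

In this model the guards agree for one or more columns and differ only at zero: `old_guard(false, 0)` rejects the predicate, while `new_guard(false, 0)` keeps it.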



##########
datafusion/core/src/physical_optimizer/pruning.rs:
##########
@@ -258,6 +259,14 @@ impl RequiredStatColumns {
         Self::default()
     }
 
+    /// Returns number of unique columns.
+    pub(crate) fn n_columns(&self) -> usize {
+        self.iter()
+            .map(|(c, _s, _f)| c)

Review Comment:
   More descriptive variable names would help readability here.
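
A sketch of what more descriptive names could look like. The hunk is cut off after the `.map(...)` call, so the rest of the chain is an assumption here (collecting into a `HashSet`, in line with the "number of unique columns" doc comment). `StatisticsKind`, the tuple layout, and `n_unique_columns` are hypothetical stand-ins, not the types used in `pruning.rs`.

```rust
use std::collections::HashSet;

/// Stand-in for the statistics type attached to each entry (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum StatisticsKind {
    Min,
    Max,
    NullCount,
}

/// Counts distinct column names among (column, statistics kind, stat field) entries.
fn n_unique_columns(entries: &[(String, StatisticsKind, String)]) -> usize {
    entries
        .iter()
        // Named bindings document what each tuple slot holds.
        .map(|(column, _statistics_kind, _stat_field)| column)
        .collect::<HashSet<_>>()
        .len()
}
```

With entries for `a` (min), `a` (max), and `b` (null count), this returns 2, since `a` is counted once.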


