jonahgao commented on code in PR #6799:
URL: https://github.com/apache/arrow-datafusion/pull/6799#discussion_r1247692275


##########
datafusion/expr/src/expr_schema.rs:
##########
@@ -173,8 +173,30 @@ impl ExprSchemable for Expr {
             Expr::Alias(expr, _)
             | Expr::Not(expr)
             | Expr::Negative(expr)
-            | Expr::Sort(Sort { expr, .. })
-            | Expr::InList(InList { expr, .. }) => expr.nullable(input_schema),
+            | Expr::Sort(Sort { expr, .. }) => expr.nullable(input_schema),
+
+            Expr::InList(InList { expr, list, .. }) => {
+                // Avoid inspecting too many expressions.
+                const MAX_INSPECT_LIMIT: usize = 6;

Review Comment:
   > I want to know why this number was used? Is it a practice from other systems?
   
   @jackwener No. The `nullable` function may be called multiple times during 
   the optimization phase, so I think adding a limit would be preferable to 
   prevent it from becoming excessively slow.
   But I'm not quite sure what an appropriate number would be.
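   To illustrate, here is a minimal standalone sketch of the bounded-inspection 
   idea. The `Expr` model below is a simplified stand-in, not DataFusion's 
   actual `Expr`/`ExprSchemable` API (in particular, the real `nullable` takes 
   an input schema, which is elided here):
   
   ```rust
   // Simplified stand-in types; not DataFusion's actual Expr/ExprSchemable API.
   #[derive(Debug, Clone)]
   enum Expr {
       Column { nullable: bool },
       InList { expr: Box<Expr>, list: Vec<Expr> },
   }
   
   impl Expr {
       fn nullable(&self) -> bool {
           match self {
               Expr::Column { nullable } => *nullable,
               Expr::InList { expr, list } => {
                   // Avoid inspecting too many expressions.
                   const MAX_INSPECT_LIMIT: usize = 6;
                   // +1 accounts for the left-hand side expression itself.
                   if list.len() + 1 > MAX_INSPECT_LIMIT {
                       // Over the limit: give up and conservatively report nullable.
                       true
                   } else {
                       // Nullable if the needle or any list element is nullable.
                       expr.nullable() || list.iter().any(|e| e.nullable())
                   }
               }
           }
       }
   }
   
   fn main() {
       let short = Expr::InList {
           expr: Box::new(Expr::Column { nullable: false }),
           list: vec![Expr::Column { nullable: false }; 2],
       };
       // Few operands and none nullable: not nullable.
       assert!(!short.nullable());
   
       let long = Expr::InList {
           expr: Box::new(Expr::Column { nullable: false }),
           list: vec![Expr::Column { nullable: false }; 10],
       };
       // Over the inspection limit: conservatively treated as nullable.
       assert!(long.nullable());
   }
   ```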
   
      
   
   > Spark handles it in a simplified way.
   > 
   > ```scala
   > override def nullable: Boolean = children.exists(_.nullable)
   > ```
   This is essentially a caching approach.
   We could implement it by precomputing nullability in the `InList::new()` 
   function (a sketch follows below).
   But the disadvantages are:
   - We would need a new field on the `InList` struct
   - The precomputed nullable may never be used
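   
   For illustration, a hypothetical sketch of that alternative, using the same 
   simplified stand-in types as above (DataFusion's actual `InList` has no such 
   field, and the real `nullable` takes an input schema, which is elided here):
   
   ```rust
   // Simplified stand-ins; not DataFusion's actual Expr/InList definitions.
   #[derive(Debug, Clone)]
   enum Expr {
       Column { nullable: bool },
   }
   
   impl Expr {
       fn nullable(&self) -> bool {
           match self {
               Expr::Column { nullable } => *nullable,
           }
       }
   }
   
   #[derive(Debug, Clone)]
   struct InList {
       expr: Box<Expr>,
       list: Vec<Expr>,
       negated: bool,
       // Hypothetical new field: nullability computed once at construction.
       nullable: bool,
   }
   
   impl InList {
       fn new(expr: Box<Expr>, list: Vec<Expr>, negated: bool) -> Self {
           // Spark-style: nullable if the needle or any list element is nullable.
           let nullable = expr.nullable() || list.iter().any(|e| e.nullable());
           Self { expr, list, negated, nullable }
       }
   }
   
   fn main() {
       let in_list = InList::new(
           Box::new(Expr::Column { nullable: true }),
           vec![Expr::Column { nullable: false }],
           false,
       );
       // Later nullability queries are O(1), but the precomputation is paid
       // for even if nothing ever asks.
       assert!(in_list.nullable);
   }
   ```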
   
   @jackwener  Which solution do you prefer?


