clintropolis commented on a change in pull request #8047: optimize single input 
column multi-value expressions
URL: https://github.com/apache/incubator-druid/pull/8047#discussion_r309502912
 
 

 ##########
 File path: core/src/main/java/org/apache/druid/math/expr/Expr.java
 ##########
 @@ -248,6 +256,24 @@ public BindingDetails(
       return 
arrayVariables.stream().map(IdentifierExpr::getIdentifier).collect(Collectors.toSet());
     }
 
+    /**
+     * Returns true if the expression has any array inputs. Note that in some 
cases, this can be true and
+     * {@link BindingDetails#getArrayVariables} can be empty, since the latter 
can only shallowly collect identifiers
+     * that are explicitly used as arrays.
 
 Review comment:
   The distinction is that `getArrayVariables` returns the identifiers we were 
able to explicitly detect as arrays, where as `hasInputArrays` means that an 
expression expects some number of array inputs. `Expr.analyzeInputs` 
implementations can currently only correctly categorize arguments that are 
_directly_ identifiers. 
   
   Now, the reason we need to be able to distinguish is an issue because I also 
accept _slop_ expressions. For example:
   
   ```
   array_to_string(concat('foo', tags), ', ')
   ``` 
   
   where `tags` is a multi-value dimension, the expression bindings will list 
`tags` as a scalar input, but `array_to_string` _does_ in fact take an array 
argument. By distinguishing these I can still accept this expression and 
translate it correctly to:
   
   ```
   array_to_string(map((tags) -> concat('foo', tags), tags), ',')
   ```
   and not _incorrectly_ use this optimized single input column introduced in 
this PR. This was not an issue prior to this PR because being multi-valued in 
the capabilities was reason enough to exclude an expression from the single 
column optimization, but after the restriction only applies to things that 
expects array inputs or produce array outputs. 
   
   Does that make sense? I will try to clear up the javadocs for this.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to