clintropolis commented on a change in pull request #8047: optimize single input
column multi-value expressions
URL: https://github.com/apache/incubator-druid/pull/8047#discussion_r309502912
##########
File path: core/src/main/java/org/apache/druid/math/expr/Expr.java
##########
@@ -248,6 +256,24 @@ public BindingDetails(
return
arrayVariables.stream().map(IdentifierExpr::getIdentifier).collect(Collectors.toSet());
}
+ /**
+ * Returns true if the expression has any array inputs. Note that in some
cases, this can be true and
+ * {@link BindingDetails#getArrayVariables} can be empty, since the latter
can only shallowly collect identifiers
+ * that are explicitly used as arrays.
Review comment:
The distinction is that `getArrayVariables` returns the identifiers we were
able to explicitly detect as arrays, where as `hasInputArrays` means that an
expression expects some number of array inputs. `Expr.analyzeInputs`
implementations can currently only correctly categorize arguments that are
_directly_ identifiers.
Now, the reason we need to be able to distinguish is an issue because I also
accept _slop_ expressions. For example:
```
array_to_string(concat('foo', tags), ', ')
```
where `tags` is a multi-value dimension, the expression bindings will list
`tags` as a scalar input, but `array_to_string` _does_ in fact take an array
argument. By distinguishing these I can still accept this expression and
translate it correctly to:
```
array_to_string(map((tags) -> concat('foo', tags), tags), ',')
```
and not _incorrectly_ use this optimized single input column introduced in
this PR. This was not an issue prior to this PR because being multi-valued in
the capabilities was reason enough to exclude an expression from the single
column optimization, but after the restriction only applies to things that
expects array inputs or produce array outputs.
Does that make sense? I will try to clear up the javadocs for this.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]