clintropolis opened a new pull request #11184:
URL: https://github.com/apache/druid/pull/11184
### Description
This PR adds vectorization support for the Druid native expression logical
operators, `!`, `&&`, `||`, as well as boolean functions `isnull`, `notnull`,
and `nvl`.
The `&&`, `||`, and `nvl` implementations are not quite as optimal as they
could be, since they will evaluate all arguments up front, but I will fix this
up in another branch I have as a follow-up (split from this one because it was
starting to get big). The follow-up will add vectorized conditional
expressions, and introduces a filtered vector binding that uses vector matches
to allow processing only a subset of input rows. This filtered binding will
allow slightly modifying these implementations so that `||` only needs to
evaluate the rhs rows for which the lhs is false or null, `&&` where lhs is
true or null, and `nvl` only when lhs is null.
In the process of doing this PR and writing tests, I came to the conclusion
that our logical operators behave very strangely all the time, and I think
quite wrong in SQL compatible null handling mode since null was always treated
as "false" instead of "unknown", so I've proposed changing the behavior in this
PR.
The first change is around output. Previously the logical operators would
pass values through. For Druid numeric values, any value greater than 0 is
`true`, so passing through values would result in some strange but technically
correct outputs. The new behavior homogenizes output to a type appropriate
boolean value, e.g. `1` or `0` for longs, `1.0` or `0.0` for doubles, and
`true` or `false` for strings.
Previous behavior:
* `100 && 11` -> `11`
* `0.7 || 0.3` -> `0.7`
New behavior:
* `100 && 11` -> `1`
* `0.7 || 0.3` -> `1.0`
etc.
The second change is that now the logical operators in SQL compatible mode
will treat `null` as "unknown" when `druid.generic.useDefaultValueForNull` is
set as `false` (SQL compatible null handling mode).
For the "or" operator:
* `true || null`, `null || true` -> `true`
* `false || null`, `null || false`, `null || null`-> `null`
For the "and" operator:
* `true && null`, `null && true`, `null && null` -> `null`
* `false && null`, `null && false` -> `false`
Since this new behavior changes query results, subtly in the case of default
mode since the results will be different (but equivalent in terms of true or
false result), and fairly significantly in SQL compatible mode since it will
now respect `null` values and treat them differently than always false, this PR
also adds a new configuration, `druid.generic.useLegacyLogicalOperators`, which
defaults to false to use the new behavior in this PR, but allows reverting to
the existing behavior if desired. I don't really love this flag existing, but
don't see a way around it unless we are ok with changing query results to use
the new behavior only. `&&` and `||` will only be vectorized when
`druid.generic.useLegacyLogicalOperators=false`, but if there is demand for it,
implementations could probably be added to provide the legacy behavior as well.
<hr>
This PR has:
- [ ] been self-reviewed.
- [ ] using the [concurrency
checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md)
(Remove this item if the PR doesn't have any relation to concurrency.)
- [ ] added documentation for new or modified features or behaviors.
- [ ] added Javadocs for most classes and all non-trivial methods. Linked
related entities via Javadoc links.
- [ ] added or updated version, license, or notice information in
[licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
- [ ] added comments explaining the "why" and the intent of the code
wherever would not be obvious for an unfamiliar reader.
- [ ] added unit tests or modified existing tests to cover new code paths,
ensuring the threshold for [code
coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md)
is met.
- [ ] added integration tests.
- [ ] been tested in a test Druid cluster.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]