clintropolis opened a new pull request #11184:
URL: https://github.com/apache/druid/pull/11184


   ### Description
   This PR adds vectorization support for the Druid native expression logical 
operators, `!`, `&&`, `||`, as well as boolean functions `isnull`, `notnull`, 
and `nvl`.
   
   The `&&`, `||`, and `nvl` implementations are not quite as optimal as they 
could be, since they will evaluate all arguments up front, but I will fix this 
up in another branch I have as a follow-up (split from this one because it was 
starting to get big). The follow-up will add vectorized conditional 
expressions, and introduces a filtered vector binding that uses vector matches 
to allow processing only a subset of input rows. This filtered binding will 
allow slightly modifying these implementations so that `||` only needs to 
evaluate the rhs rows for which the lhs is false or null, `&&` where lhs is 
true or null, and `nvl` only when lhs is null.
   
   In the process of doing this PR and writing tests, I came to the conclusion 
that our logical operators behave very strangely all the time, and I think 
quite wrong in SQL compatible null handling mode since null was always treated 
as "false" instead of "unknown", so I've proposed changing the behavior in this 
PR.
   
   The first change is around output. Previously the logical operators would 
pass values through. For Druid numeric values, any value greater than 0 is 
`true`, so passing through values would result in some strange but technically 
correct outputs. The new behavior homogenizes output to a type appropriate 
boolean value, e.g. `1` or `0` for longs, `1.0` or `0.0` for doubles, and 
`true` or `false` for strings.
   
   Previous behavior:
   * `100 && 11` -> `11` 
   * `0.7 || 0.3` -> `0.7`
   
   New behavior:
   * `100 && 11` -> `1`
   * `0.7 || 0.3` -> `1.0`
   
   etc.
   
   The second change is that now the logical operators in SQL compatible mode 
will treat `null` as "unknown" when `druid.generic.useDefaultValueForNull` is 
set as `false` (SQL compatible null handling mode).
   
   For the "or" operator:
   * `true || null`, `null || true` -> `true`
   * `false || null`, `null || false`, `null || null`-> `null`
   
   For the "and" operator:
   * `true && null`, `null && true`, `null && null` -> `null`
   * `false && null`, `null && false` -> `false`
   
   
   Since this new behavior changes query results, subtly in the case of default 
mode since the results will be different (but equivalent in terms of true or 
false result), and fairly significantly in SQL compatible mode since it will 
now respect `null` values and treat them differently than always false, this PR 
also adds a new configuration, `druid.generic.useLegacyLogicalOperators`, which 
defaults to false to use the new behavior in this PR, but allows reverting to 
the existing behavior if desired. I don't really love this flag existing, but 
don't see a way around it unless we are ok with changing query results to use 
the new behavior only. `&&` and `||` will only be vectorized when 
`druid.generic.useLegacyLogicalOperators=false`, but if there is demand for it, 
implementations could probably be added to provide the legacy behavior as well.
   
   
   <hr>
   
   
   This PR has:
   - [ ] been self-reviewed.
      - [ ] using the [concurrency 
checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md)
 (Remove this item if the PR doesn't have any relation to concurrency.)
   - [ ] added documentation for new or modified features or behaviors.
   - [ ] added Javadocs for most classes and all non-trivial methods. Linked 
related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in 
[licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
   - [ ] added comments explaining the "why" and the intent of the code 
wherever would not be obvious for an unfamiliar reader.
   - [ ] added unit tests or modified existing tests to cover new code paths, 
ensuring the threshold for [code 
coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md)
 is met.
   - [ ] added integration tests.
   - [ ] been tested in a test Druid cluster.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to