clintropolis opened a new pull request, #12627:
URL: https://github.com/apache/druid/pull/12627
### Description
This PR splits `NULL` value indexes out into their own construct for filter
processing, for two reasons. The first is that the new index structure
introduced in #12388 means that we can now provide basically any index we can
imagine, and I imagined that by having `ColumnIndexSupplier` provide a separate
`NullValueIndex` to `Filter`, we can offer limited indexes for existing Druid
numeric columns (if `druid.generic.useDefaultValueForNull=false`). This improves
performance quite nicely when using numeric columns in `IS NULL`/`IS NOT NULL`
style queries.
```
// 42,43: filter numeric nulls
"SELECT SUM(long5) FROM foo WHERE long5 IS NOT NULL",
"SELECT string2, SUM(long5) FROM foo WHERE long5 IS NOT NULL GROUP BY 1"
```
before:
```
Benchmark                        (query)  (rowsPerSegment)  (vectorize)  Mode  Cnt    Score    Error  Units
SqlExpressionBenchmark.querySql       42           5000000        false  avgt    5  107.308 ±  3.975  ms/op
SqlExpressionBenchmark.querySql       42           5000000        force  avgt    5  118.790 ±  1.792  ms/op
SqlExpressionBenchmark.querySql       43           5000000        false  avgt    5  259.829 ± 12.465  ms/op
SqlExpressionBenchmark.querySql       43           5000000        force  avgt    5  241.353 ± 10.005  ms/op
```
after:
```
Benchmark                        (query)  (rowsPerSegment)  (vectorize)  Mode  Cnt    Score    Error  Units
SqlExpressionBenchmark.querySql       42           5000000        false  avgt    5   71.755 ±  3.011  ms/op
SqlExpressionBenchmark.querySql       42           5000000        force  avgt    5   55.674 ±  1.288  ms/op
SqlExpressionBenchmark.querySql       43           5000000        false  avgt    5  241.125 ±  5.544  ms/op
SqlExpressionBenchmark.querySql       43           5000000        force  avgt    5  176.183 ±  7.409  ms/op
```
The second reason to split null value indexes into their own thing is to
set the stage for making our filter behavior SQL compliant. With this explicit
null value index in place, if we modify the other index providers to never
match null, it should become a lot harder for filters to accidentally match
nulls using indexes, and it could allow the cursor builder to use implicit
'is not null' indexes for columns which are not explicitly being matched for
'is null'.
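
To make the idea concrete, here is a minimal, self-contained sketch of how a dedicated null bitmap could sit next to a value index that never matches null. It is not the code in this PR: `NullIndex`, `ValueIndex`, `LongColumn`, and the helper methods are hypothetical names invented for illustration, and `java.util.BitSet` stands in for Druid's bitmap types.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

public class NullIndexSketch {
    /** Hypothetical stand-in for a dedicated null value index. */
    interface NullIndex {
        BitSet forNull();
    }

    /** Hypothetical stand-in for a value index that never matches null rows. */
    interface ValueIndex {
        BitSet forValue(long value);
    }

    /** Toy nullable long column that builds both indexes up front. */
    static final class LongColumn implements NullIndex, ValueIndex {
        private final int numRows;
        private final BitSet nulls = new BitSet();
        private final Map<Long, BitSet> values = new HashMap<>();

        LongColumn(Long[] rows) {
            numRows = rows.length;
            for (int i = 0; i < rows.length; i++) {
                if (rows[i] == null) {
                    nulls.set(i); // null rows live only in the null bitmap
                } else {
                    values.computeIfAbsent(rows[i], k -> new BitSet()).set(i);
                }
            }
        }

        @Override
        public BitSet forNull() {
            return (BitSet) nulls.clone();
        }

        @Override
        public BitSet forValue(long value) {
            BitSet rows = values.get(value);
            return rows == null ? new BitSet() : (BitSet) rows.clone();
        }

        int numRows() {
            return numRows;
        }
    }

    /** IS NULL is answered directly from the null bitmap. */
    static BitSet isNull(LongColumn column) {
        return column.forNull();
    }

    /** IS NOT NULL is the complement of the null bitmap over all rows. */
    static BitSet isNotNull(LongColumn column) {
        BitSet rows = column.forNull();
        rows.flip(0, column.numRows());
        return rows;
    }

    /** col <> value under SQL semantics: invert the match, then drop null rows. */
    static BitSet notEquals(LongColumn column, long value) {
        BitSet rows = column.forValue(value);
        rows.flip(0, column.numRows());
        rows.andNot(column.forNull()); // the implicit 'is not null'
        return rows;
    }

    public static void main(String[] args) {
        LongColumn long5 = new LongColumn(new Long[]{1L, null, 3L, null, 1L});
        System.out.println(isNull(long5));        // {1, 3}
        System.out.println(isNotNull(long5));     // {0, 2, 4}
        System.out.println(notEquals(long5, 1L)); // {2}
    }
}
```

Because the value index never contains null rows, the only way for a filter in this sketch to match nulls is to ask the null index explicitly, which is the property the paragraph above is aiming for.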
<hr>
This PR has:
- [x] been self-reviewed.
- [x] added Javadocs for most classes and all non-trivial methods. Linked
related entities via Javadoc links.
- [x] been tested in a test Druid cluster.