clintropolis opened a new pull request #10441:
URL: https://github.com/apache/druid/pull/10441
### Description
This PR adds support for the vectorized group by engine to group by numeric
columns with null values, using a similar approach to the non-vectorized engine
of storing an extra byte in the key to hold information about whether the value
is null or not. I modified the signature of the methods on
`VectorColumnProcessorFactory` to include `ColumnCapabilities` for the
selector, so that `GroupByVectorColumnSelectorFactory` methods could check
`capabilities.hasNulls().isFalse()` to continue to use the existing processors,
and only use the newly added null handling processors when there are actually
null values to deal with.
In the process of adding tests for this, I also uncovered another issue with
the `SpillingGrouper` when grouping on double values, where sometimes when
making the round trip through serde they end up as floats, resulting in a cast
exception when run through the comparator. To fix this issue I modified the
comparators in `RowBasedGrouperHelper` to check for double typed keys and
coerce both values to be doubles. A more proper fix would be to rework the
SpillingGrouper completely so that the key serde just reads/writes directly
instead of going through json generator, but that was too much for this PR, so
I shall leave that for follow-up work.
<hr>
This PR has:
- [ ] been self-reviewed.
- [ ] using the [concurrency
checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md)
(Remove this item if the PR doesn't have any relation to concurrency.)
- [ ] added documentation for new or modified features or behaviors.
- [ ] added Javadocs for most classes and all non-trivial methods. Linked
related entities via Javadoc links.
- [ ] added or updated version, license, or notice information in
[licenses.yaml](https://github.com/apache/druid/blob/master/licenses.yaml)
- [ ] added comments explaining the "why" and the intent of the code
wherever would not be obvious for an unfamiliar reader.
- [ ] added unit tests or modified existing tests to cover new code paths,
ensuring the threshold for [code
coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md)
is met.
- [ ] added integration tests.
- [ ] been tested in a test Druid cluster.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]