clintropolis opened a new pull request #10441:
URL: https://github.com/apache/druid/pull/10441


   ### Description
   This PR adds support for the vectorized group by engine to group by numeric 
columns with null values, using a similar approach to the non-vectorized engine 
of storing an extra byte in the key to hold information about whether the value 
is null or not. I modified the signature of the methods on 
`VectorColumnProcessorFactory` to include `ColumnCapabilities` for the 
selector, so that `GroupByVectorColumnSelectorFactory` methods could check 
`capabilities.hasNulls().isFalse()` to continue to use the existing processors, 
and only use the newly added null handling processors when there are actually 
null values to deal with.
   
   In the process of adding tests for this, I also uncovered another issue with 
the `SpillingGrouper` when grouping on double values, where sometimes when 
making the round trip through serde they end up as floats, resulting in a cast 
exception when run through the comparator. To fix this issue I modified the 
comparators in `RowBasedGrouperHelper` to check for double typed keys and 
coerce both values to be doubles. A more proper fix would be to rework the 
SpillingGrouper completely so that the key serde just reads/writes directly 
instead of going through json generator, but that was too much for this PR, so 
I shall leave that for follow-up work.
   
   <hr>
   
   This PR has:
   - [ ] been self-reviewed.
      - [ ] using the [concurrency 
checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md)
 (Remove this item if the PR doesn't have any relation to concurrency.)
   - [ ] added documentation for new or modified features or behaviors.
   - [ ] added Javadocs for most classes and all non-trivial methods. Linked 
related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in 
[licenses.yaml](https://github.com/apache/druid/blob/master/licenses.yaml)
   - [ ] added comments explaining the "why" and the intent of the code 
wherever would not be obvious for an unfamiliar reader.
   - [ ] added unit tests or modified existing tests to cover new code paths, 
ensuring the threshold for [code 
coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md)
 is met.
   - [ ] added integration tests.
   - [ ] been tested in a test Druid cluster.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to