clintropolis opened a new pull request #10614:
URL: https://github.com/apache/druid/pull/10614


   ### Description
   This PR improves `ExpressionVectorValueSelector` and 
`ExpressionVectorObjectSelector` to re-use evaluation results for the same 
selector. While I haven't measured this at all, it makes a relatively big 
difference when `druid.generic.useDefaultValueForNull=false` just because the 
previous behavior meant that the expression was evaluated twice for numeric 
primitive expressions: once to get the value vector and again to get the null 
vector. I was aware of this issue but didn't get to it in the first round of 
additions I did for the vectorized expression stuffs and sort of forgot about 
it until now 😅 .
   
   It works by expanding `Expr.VectorInputBinding` to further mimic 
`ReadableVectorOffset`:
   ```java
       /**
        * Returns an integer that uniquely identifies the current position of 
the underlying vector offset, if this
        * binding is backed by a segment. This is useful for caching: it is 
safe to assume nothing has changed in the
        * offset so long as the id remains the same. See also: 
ReadableVectorOffset (in druid-processing)
        */
       int getCurrentVectorId();
   ```
   
   which allows slightly modifying the backing of the 
`ExpressionVectorInputBinding` implementation to be based on 
`ReadableVectorOffset` which has a `getId` method, instead of 
`VectorSizeInspector` which only offers current and max vector size. The size 
inspector provided by `VectorColumnSelectorFactory.getVectorSizeInspector` was 
already a `ReadableVectorOffset`, so I have reworked the method into 
`getReadableVectorOffset`. Alternatively, we could consider pushing `getId` 
down to `VectorSizeInspector`, but that seemed to be a bit more involved of a 
change.
   
   With an identifier in place, the selectors can cache evaluation results and 
re-use them as long as the underlying offset does not change.
   
   
   <hr>
   
   This PR has:
   - [ ] been self-reviewed.
   - [ ] added documentation for new or modified features or behaviors.
   - [ ] added Javadocs for most classes and all non-trivial methods. Linked 
related entities via Javadoc links.
   - [ ] added comments explaining the "why" and the intent of the code 
wherever would not be obvious for an unfamiliar reader.
   - [ ] added unit tests or modified existing tests to cover new code paths, 
ensuring the threshold for [code 
coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md)
 is met.
   - [ ] added integration tests.
   - [ ] been tested in a test Druid cluster.
   
   <hr>
   
   ##### Key changed/added classes in this PR
    * `ExpressionVectorValueSelector`
    * `ExpressionVectorObjectSelector`
    * `ExpressionVectorInputBinding`
    * `VectorColumnSelectorFactory`
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to