clintropolis opened a new pull request #10614:
URL: https://github.com/apache/druid/pull/10614
### Description
This PR improves `ExpressionVectorValueSelector` and
`ExpressionVectorObjectSelector` to re-use evaluation results for the same
selector. While I haven't measured this at all, it makes a relatively big
difference when `druid.generic.useDefaultValueForNull=false` just because the
previous behavior meant that the expression was evaluated twice for numeric
primitive expressions: once to get the value vector and again to get the null
vector. I was aware of this issue but didn't get to it in the first round of
additions I did for the vectorized expression stuffs and sort of forgot about
it until now 😅 .
It works by expanding `Expr.VectorInputBinding` to further mimic
`ReadableVectorOffset`:
```java
/**
* Returns an integer that uniquely identifies the current position of
the underlying vector offset, if this
* binding is backed by a segment. This is useful for caching: it is
safe to assume nothing has changed in the
* offset so long as the id remains the same. See also:
ReadableVectorOffset (in druid-processing)
*/
int getCurrentVectorId();
```
which allows slightly modifying the backing of the
`ExpressionVectorInputBinding` implementation to be based on
`ReadableVectorOffset` which has a `getId` method, instead of
`VectorSizeInspector` which only offers current and max vector size. The size
inspector provided by `VectorColumnSelectorFactory.getVectorSizeInspector` was
already a `ReadableVectorOffset`, so I have reworked the method into
`getReadableVectorOffset`. Alternatively, we could consider pushing `getId`
down to `VectorSizeInspector`, but that seemed to be a bit more involved of a
change.
With an identifier in place, the selectors can cache evaluation results and
re-use them as long as the underlying offset does not change.
<hr>
This PR has:
- [ ] been self-reviewed.
- [ ] added documentation for new or modified features or behaviors.
- [ ] added Javadocs for most classes and all non-trivial methods. Linked
related entities via Javadoc links.
- [ ] added comments explaining the "why" and the intent of the code
wherever would not be obvious for an unfamiliar reader.
- [ ] added unit tests or modified existing tests to cover new code paths,
ensuring the threshold for [code
coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md)
is met.
- [ ] added integration tests.
- [ ] been tested in a test Druid cluster.
<hr>
##### Key changed/added classes in this PR
* `ExpressionVectorValueSelector`
* `ExpressionVectorObjectSelector`
* `ExpressionVectorInputBinding`
* `VectorColumnSelectorFactory`
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]