clintropolis opened a new pull request, #12576:
URL: https://github.com/apache/druid/pull/12576
### Description
This PR fixes 2 bugs, and made another one apparent, but I'm not quite sure
how to fix it yet so might have to save it for a follow-up.
The first bug is around overzealous cycle detection in virtual columns,
which would prevent some expressions which did not contain cycles, because it
did not discern between virtual column inputs from segment columns. Given an
expression like
```
case_searched(notnull(1 + x), v1, v2)
```
where `v1` and `v2` are virtual columns, if either virtual column referenced
`x`, prior to this PR it would be classified incorrectly as a cycle and throw
an exception.
The query that uncovered this issue was actually a query similar to
```
COALESCE(MV_FILTER_ONLY(dim3, ARRAY['b']), MV_FILTER_ONLY(dim3, ARRAY['c']))
```
and while investigating I noticed planned to the form
```
case_searched(notnull(filter((x) -> array_contains(array('b'), x),
\"dim3\")),\"v1\",\"v2\")
```
which given the virtual column specialization code I would have expected to
actually plan to
```
case_searched(notnull(\"v1\"),\"v1\",\"v2\")
```
which is the 2nd bug fixed, a missing call to `DruidExpression.visit` in the
`visitAll` code, meaning the specialization code isn't actually traversing the
whole tree.
While writing a test to fix this 2nd issue, I encountered a 3rd issue
involving the way grouping works whe the `ListFilteredVirtualColumn` is used as
an expression input, but I think could happen with any dimension selector that
returns `IndexedInts` as 0 length rows, and is using the "deferred" evaluation
expression selector (which saves expression evaluation until after the
grouping). However, the 0 length row is grouped as the "missing" value, which
means that there is no dictionary id to lookup - which is what evaluates the
expression in the deferred expression dimension selectors, and so this null
value is not actually used as an input to an expression and so remains null.
I'm still trying to determine the best way to fix this problem. As far as I can
tell, it should only affect the "deferred" evaluation expression dimension
selectors when used with a grouping engine and delegating to any selector that
can return an `IndexedInts` of size 0 for the row.
<hr>
This PR has:
- [x] been self-reviewed.
- [x] added comments explaining the "why" and the intent of the code
wherever would not be obvious for an unfamiliar reader.
- [x] added unit tests or modified existing tests to cover new code paths,
ensuring the threshold for [code
coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md)
is met.
- [x] been tested in a test Druid cluster.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]