clintropolis opened a new pull request, #12576:
URL: https://github.com/apache/druid/pull/12576

   ### Description
   This PR fixes 2 bugs, and made another one apparent, but I'm not quite sure 
how to fix it yet so might have to save it for a follow-up.
   
   The first bug is around overzealous cycle detection in virtual columns, 
which would prevent some expressions which did not contain cycles, because it 
did not discern between virtual column inputs from segment columns. Given an 
expression like
   
   ```
   case_searched(notnull(1 + x), v1, v2)
   ```
   
   where `v1` and `v2` are virtual columns, if either virtual column referenced 
`x`, prior to this PR it would be classified incorrectly as a cycle and throw 
an exception.
   
   The query that uncovered this issue was actually a query similar to 
   ```
   COALESCE(MV_FILTER_ONLY(dim3, ARRAY['b']), MV_FILTER_ONLY(dim3, ARRAY['c']))
   ```
   
   and while investigating I noticed planned to the form
   
   ```
   case_searched(notnull(filter((x) -> array_contains(array('b'), x), 
\"dim3\")),\"v1\",\"v2\")
   ```
   which given the virtual column specialization code I would have expected to 
actually plan to 
   ```
   case_searched(notnull(\"v1\"),\"v1\",\"v2\")
   ```
   which is the 2nd bug fixed, a missing call to `DruidExpression.visit` in the 
`visitAll` code, meaning the specialization code isn't actually traversing the 
whole tree.
   
    While writing a test to fix this 2nd issue, I encountered a 3rd issue 
involving the way grouping works whe the `ListFilteredVirtualColumn` is used as 
an expression input, but I think could happen with any dimension selector that 
returns `IndexedInts` as 0 length rows, and is using the "deferred" evaluation 
expression selector (which saves expression evaluation until after the 
grouping). However, the 0 length row is grouped as the "missing" value, which 
means that there is no dictionary id to lookup - which is what evaluates the 
expression in the deferred expression dimension selectors, and so this null 
value is not actually used as an input to an expression and so remains null. 
I'm still trying to determine the best way to fix this problem. As far as I can 
tell, it should only affect the "deferred" evaluation expression dimension 
selectors when used with a grouping engine and delegating to any selector that 
can return an `IndexedInts` of size 0 for the row.
   
   
   <hr>
   
   This PR has:
   - [x] been self-reviewed.
   - [x] added comments explaining the "why" and the intent of the code 
wherever would not be obvious for an unfamiliar reader.
   - [x] added unit tests or modified existing tests to cover new code paths, 
ensuring the threshold for [code 
coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md)
 is met.
   - [x] been tested in a test Druid cluster.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to