jihoonson opened a new issue #9866:
URL: https://github.com/apache/druid/issues/9866


   ### Affected Version
   
   0.18.x
   
   ### Description
   
   As reported in https://github.com/apache/druid/issues/9792, a nested groupBy 
query can return incorrect results when the following conditions are met:
   
   - The nested groupBy runs on top of a join of subqueries
   - The inner and outer groupBys have different filters
   
   In this case, the join execution engine uses the filter of the outer 
groupBy query when it processes the inner groupBy query. For example, consider 
the following query:
   
   ```sql
   WITH abc AS (
     SELECT dim1, m2
     FROM druid.foo 
     WHERE "__time" >= '2001-01-02'
   ),
   def AS (
     SELECT t1.dim1, SUM(t2.m2) AS "metricSum" 
     FROM abc AS t1 INNER JOIN abc AS t2 ON t1.dim1 = t2.dim1
     WHERE t1.dim1='def'
     GROUP BY 1
   )
   SELECT count(*) FROM def
   ```
   
   Druid plans this query as follows:
   
   ```
    groupBy (outer)
       |
    groupBy (inner)
       |
      join
     /    \
   scan  scan
    |      |
   foo    foo
   ```
   
   For this query plan, the broker executes the two scan queries at the 
leaves, materializes their results in memory, and then executes the join and 
the groupBys. The join is converted into a joinSegment and executed together 
with the inner groupBy. Due to this bug, the broker ignores the filter 
`t1.dim1 = 'def'` on the inner groupBy query, because it applies the outer 
groupBy's filter instead and the outer groupBy has no filter.




