[GitHub] [druid] a2l007 opened a new issue, #13204: Incorrect result format for queries with single value filters and GROUPING SET on the same dimension

GitBox Mon, 10 Oct 2022 15:58:17 -0700


a2l007 opened a new issue, #13204:
URL: https://github.com/apache/druid/issues/13204


   ### Affected Version
   
   Tested on 0.22, 0.23 and 24.0
   
   ### Description
   Consider the following SQL query:
    ```
   select sum(added) AS added_sum, countryName from wikipedia 
   where cityName = 'Cilacap'
   and countryName IN ('Indonesia')
   GROUP BY GROUPING SETS ( (FLOOR(__time TO MONTH), countryName), 
(FLOOR(__time TO MONTH)) )
   
   ```
   
   The grouping set dimension `countryName` is also included in the WHERE 
clause filtering on a single value. The result for this query ends up being:
   
   ```
   added_sum | countryName
   8         | Indonesia
   8         | Indonesia
   ```
   
   instead of:
   ```
   added_sum | countryName
   8         | Indonesia
   8         | null
   ```
   
   Since the second subtotal group doesn't include `countryName` , the second 
result row should have this value as null. The aggregate results are correct 
either ways, but formatting the query results can be confusing since 
`COALESCE(countryName, 'TOTAL')` as countryName does not work here.
   
   ### Analysis
   
   The native query plan for this query is:
   ```
   [
       {
           "query": {
               "queryType": "groupBy",
               "dataSource": {
                   "type": "table",
                   "name": "wikipedia"
               },
               "intervals": {
                   "type": "intervals",
                   "intervals": [
                       
"-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z"
                   ]
               },
               "virtualColumns": [
                   {
                       "type": "expression",
                       "name": "v0",
                       "expression": 
"timestamp_floor(\"__time\",'P1M',null,'UTC')",
                       "outputType": "LONG"
                   }
               ],
               "filter": {
                   "type": "and",
                   "fields": [
                       {
                           "type": "selector",
                           "dimension": "cityName",
                           "value": "Cilacap"
                       },
                       {
                           "type": "selector",
                           "dimension": "countryName",
                           "value": "Indonesia"
                       }
                   ]
               },
               "granularity": {
                   "type": "all"
               },
               "dimensions": [
                   {
                       "type": "default",
                       "dimension": "v0",
                       "outputName": "d0",
                       "outputType": "LONG"
                   }
               ],
               "aggregations": [
                   {
                       "type": "longSum",
                       "name": "a0",
                       "fieldName": "added"
                   }
               ],
               "postAggregations": [
                   {
                       "type": "expression",
                       "name": "p0",
                       "expression": "'Indonesia'"
                   }
               ],
               "limitSpec": {
                   "type": "default",
                   "columns": [],
                   "limit": 1001
               },
               "subtotalsSpec": [
                   [
                       "d0"
                   ],
                   [
                       "d0"
                   ]
               ],
               "context": {
                   "sqlOuterLimit": 1001,
                   "sqlQueryId": "52fce1a4-7d39-4c02-aad1-29cec2c2e3c3",
                   "timestampResultField": "d0",
                   "timestampResultFieldGranularity": "MONTH",
                   "timestampResultFieldInOriginalDimensions": 0
               }
           },
           "signature": [
               {
                   "name": "a0",
                   "type": "LONG"
               },
               {
                   "name": "p0",
                   "type": "STRING"
               }
           ]
       }
   ]
   
   ```
   It can be seen that during the planning phase, the `countryName` dimension 
was converted into a post aggregator and removed from the dimensions list. As a 
result, the subtotal grouping ends up grouping on `__time` in both cases.
   This issue does not arise if there are more than one values in the filter 
list for `countryName` since this dimension can no longer be converted to a 
post agg and thus remains in the dimension list.
   
   One possible fix that I'm currently evaluating is to skip dropping the 
dimension 
[here](https://github.com/apache/druid/blob/master/sql/src/main/java/org/apache/druid/sql/calcite/rel/Grouping.java#L240)
 if there is a subtotal spec existing for that dimension and modify the row 
signature accordingly. Unless there are any better ways to fix this issue, I 
can send out the PR after adequate testing.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [druid] a2l007 opened a new issue, #13204: Incorrect result format for queries with single value filters and GROUPING SET on the same dimension

Reply via email to