benkrug opened a new issue #11197:
URL: https://github.com/apache/druid/issues/11197


   Some druid SQL queries including GROUP BY and COUNT return incorrect 
results.  I'm including two examples.
   
   ### Affected Version
   
   Seen in 0.20.0 and 0.21.0.  The second example below apparently doesn't 
reproduce in 0.18.0.  I haven't tested the first on 0.18.0.
   
   ### Description
   
   Particular GROUP BY queries are returning 0 counts when they should return 
non-zero numbers.  Providing two examples, one with wikipedia, and one with a 
custom data set (attached).
   
   With wikipedia, we can see the following, to zero in on the issue:
   
   This query returns various non-zero numbers for the counts:
   
   ```
   select comment, count(*) FROM wikipedia
   WHERE channel = '#ar.wikipedia'
   GROUP BY comment
   ```
   
   This query will return 1's, which makes sense:
   
   ```
   select comment, count(distinct(channel)) FROM wikipedia
   WHERE channel = '#ar.wikipedia'
   GROUP BY comment
   ```
   
   So, we have data for the channel field (which makes sense, since we filter 
for a channel), but the following query returns 0's for all the counts, which 
is not only incorrect, but logically impossible given the query:
   
   ```
   select comment, count(channel) FROM wikipedia
   WHERE channel = '#ar.wikipedia'
   GROUP BY comment
   ```
   
   The second example adds a twist, I'm not sure of the reason.  With the given 
dataset (data file and spec attached), the following query also incorrectly 
returns all 0's:
   
   ```
   SELECT ctnsappname, count(ind_id) FROM testData 
   WHERE ind_id = 3006
   AND __time >= '2000-01-01'
   AND __time < '3000-01-01'
   GROUP BY ctnsappname
   ```
   
   Note that all the __time values are from 2021-01-31, so all data is included 
in the time filters.  However (the twist), if you remove the time filters, you 
seem to get correct results.  I didn't test with 0.18.0, but Jie Zang reported 
that for this query, results were correct in 0.18.0.  He also noted that in 
0.18.0, the explain uses a virtual column, but in 0.21.0, the virtual column 
entry (for v0) is empty.  I saw the same in 0.20.0.
   
   
   
   
[ingestdata.txt](https://github.com/apache/druid/files/6422782/ingestdata.txt)
   
[ingestspec.txt](https://github.com/apache/druid/files/6422783/ingestspec.txt)
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to