Igor created HIVE-21930: --------------------------- Summary: WINDOW COUNT DISTINCT return wrong value with PARTITION BY Key: HIVE-21930 URL: https://issues.apache.org/jira/browse/HIVE-21930 Project: Hive Issue Type: Bug Affects Versions: 3.1.0 Environment: Beeline version 3.1.0.3.0.1.0-187 by Apache Hive Reporter: Igor
count(distinct a) over (partiton by b) return wring result. For example: {code:java} select p, day, ts , row_number() OVER (PARTITION BY phone ORDER BY ts ASC) as line_number , count(1) OVER (PARTITION BY phone ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as lines , count(distinct day) OVER (PARTITION BY phone ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as days FROM T{code} count(1) and count(distinct day) return the same result. Count distinct is wrong. I've add size(collect_set(day) OVER (PARTITION BY phone)) as days2 and count(distinct return correct result. Following query return non-empty result: {code:java} select A.*, B.days, B. from ( select p, day, ts , row_number() OVER (PARTITION BY phone ORDER BY ts ASC) as line_number , count(1) OVER (PARTITION BY p ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as lines , count(distinct day) OVER (PARTITION BY phone ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as days , size(collect_set(day) OVER (PARTITION BY phone)) as days2 , dense_rank() over (partition by phone order by day) + dense_rank() over (partition by phone order by day desc) - 1 as days3 FROM T ) as A join ( select p, day, ts , row_number() OVER (PARTITION BY phone ORDER BY ts ASC) as line_number , count(1) OVER (PARTITION BY phone ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as lines , count(distinct day) OVER (PARTITION BY phone ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as days FROM T ) as B on A.p=B.p and A.line_number=B.line_number where A.days!=B.days order by A.p, A.line_number {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)