Igor created HIVE-21930:
---------------------------

             Summary: WINDOW COUNT DISTINCT return wrong value with PARTITION BY
                 Key: HIVE-21930
                 URL: https://issues.apache.org/jira/browse/HIVE-21930
             Project: Hive
          Issue Type: Bug
    Affects Versions: 3.1.0
         Environment: Beeline version 3.1.0.3.0.1.0-187 by Apache Hive
            Reporter: Igor


count(distinct a) over (partiton by b) return wring result. For example:
{code:java}
select p, day, ts
, row_number() OVER (PARTITION BY phone ORDER BY ts ASC) as line_number
, count(1) OVER (PARTITION BY phone ROWS BETWEEN UNBOUNDED PRECEDING AND 
UNBOUNDED FOLLOWING) as lines
, count(distinct day) OVER (PARTITION BY phone ROWS BETWEEN UNBOUNDED PRECEDING 
AND UNBOUNDED FOLLOWING) as days
FROM T{code}
 

count(1) and count(distinct day) return the same result. Count distinct is 
wrong.

 

I've add size(collect_set(day) OVER (PARTITION BY phone)) as days2 and 
count(distinct return correct result.

Following query return non-empty result:
{code:java}
select A.*, B.days, B. from (
select p, day, ts
, row_number() OVER (PARTITION BY phone ORDER BY ts ASC) as line_number
, count(1) OVER (PARTITION BY p ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED 
FOLLOWING) as lines
, count(distinct day) OVER (PARTITION BY phone ROWS BETWEEN UNBOUNDED PRECEDING 
AND UNBOUNDED FOLLOWING) as days
, size(collect_set(day) OVER (PARTITION BY phone)) as days2
, dense_rank() over (partition by phone order by day) + dense_rank() over 
(partition by phone order by day desc) - 1 as days3
FROM T ) as A 
join (
select p, day, ts
, row_number() OVER (PARTITION BY phone ORDER BY ts ASC) as line_number
, count(1) OVER (PARTITION BY phone ROWS BETWEEN UNBOUNDED PRECEDING AND 
UNBOUNDED FOLLOWING) as lines
, count(distinct day) OVER (PARTITION BY phone ROWS BETWEEN UNBOUNDED PRECEDING 
AND UNBOUNDED FOLLOWING) as days
FROM T
) as B on A.p=B.p and A.line_number=B.line_number
where A.days!=B.days
order by A.p, A.line_number
{code}
 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to