[
https://issues.apache.org/jira/browse/HIVE-21930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Igor updated HIVE-21930:
------------------------
Description:
count(distinct a) over (partiton by b) return wring result. For example:
{code:java}
select p, day, ts
, row_number() OVER (PARTITION BY phone ORDER BY ts ASC) as line_number
, count(1) OVER (PARTITION BY phone ROWS BETWEEN UNBOUNDED PRECEDING AND
UNBOUNDED FOLLOWING) as lines
, count(distinct day) OVER (PARTITION BY phone ROWS BETWEEN UNBOUNDED PRECEDING
AND UNBOUNDED FOLLOWING) as days
FROM T{code}
WINDOW specification doesn't affect on results: same wrong with and without
window.
count(1) and count(distinct day) return the same result. Count distinct is
wrong.
I've add size(collect_set(day) OVER (PARTITION BY phone)) as days2 and
count(distinct return correct result.
Following query return non-empty result:
{code:java}
select A.*, B.days, B. from (
select p, day, ts
, row_number() OVER (PARTITION BY phone ORDER BY ts ASC) as line_number
, count(1) OVER (PARTITION BY p ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED
FOLLOWING) as lines
, count(distinct day) OVER (PARTITION BY phone ROWS BETWEEN UNBOUNDED PRECEDING
AND UNBOUNDED FOLLOWING) as days
, size(collect_set(day) OVER (PARTITION BY phone)) as days2
, dense_rank() over (partition by phone order by day) + dense_rank() over
(partition by phone order by day desc) - 1 as days3
FROM T ) as A
join (
select p, day, ts
, row_number() OVER (PARTITION BY phone ORDER BY ts ASC) as line_number
, count(1) OVER (PARTITION BY phone ROWS BETWEEN UNBOUNDED PRECEDING AND
UNBOUNDED FOLLOWING) as lines
, count(distinct day) OVER (PARTITION BY phone ROWS BETWEEN UNBOUNDED PRECEDING
AND UNBOUNDED FOLLOWING) as days
FROM T
) as B on A.p=B.p and A.line_number=B.line_number
where A.days!=B.days
order by A.p, A.line_number
{code}
was:
count(distinct a) over (partiton by b) return wring result. For example:
{code:java}
select p, day, ts
, row_number() OVER (PARTITION BY phone ORDER BY ts ASC) as line_number
, count(1) OVER (PARTITION BY phone ROWS BETWEEN UNBOUNDED PRECEDING AND
UNBOUNDED FOLLOWING) as lines
, count(distinct day) OVER (PARTITION BY phone ROWS BETWEEN UNBOUNDED PRECEDING
AND UNBOUNDED FOLLOWING) as days
FROM T{code}
count(1) and count(distinct day) return the same result. Count distinct is
wrong.
I've add size(collect_set(day) OVER (PARTITION BY phone)) as days2 and
count(distinct return correct result.
Following query return non-empty result:
{code:java}
select A.*, B.days, B. from (
select p, day, ts
, row_number() OVER (PARTITION BY phone ORDER BY ts ASC) as line_number
, count(1) OVER (PARTITION BY p ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED
FOLLOWING) as lines
, count(distinct day) OVER (PARTITION BY phone ROWS BETWEEN UNBOUNDED PRECEDING
AND UNBOUNDED FOLLOWING) as days
, size(collect_set(day) OVER (PARTITION BY phone)) as days2
, dense_rank() over (partition by phone order by day) + dense_rank() over
(partition by phone order by day desc) - 1 as days3
FROM T ) as A
join (
select p, day, ts
, row_number() OVER (PARTITION BY phone ORDER BY ts ASC) as line_number
, count(1) OVER (PARTITION BY phone ROWS BETWEEN UNBOUNDED PRECEDING AND
UNBOUNDED FOLLOWING) as lines
, count(distinct day) OVER (PARTITION BY phone ROWS BETWEEN UNBOUNDED PRECEDING
AND UNBOUNDED FOLLOWING) as days
FROM T
) as B on A.p=B.p and A.line_number=B.line_number
where A.days!=B.days
order by A.p, A.line_number
{code}
> WINDOW COUNT DISTINCT return wrong value with PARTITION BY
> ----------------------------------------------------------
>
> Key: HIVE-21930
> URL: https://issues.apache.org/jira/browse/HIVE-21930
> Project: Hive
> Issue Type: Bug
> Affects Versions: 3.1.0
> Environment: Beeline version 3.1.0.3.0.1.0-187 by Apache Hive
> Reporter: Igor
> Priority: Major
> Labels: distinct, window_funcion
>
> count(distinct a) over (partiton by b) return wring result. For example:
> {code:java}
> select p, day, ts
> , row_number() OVER (PARTITION BY phone ORDER BY ts ASC) as line_number
> , count(1) OVER (PARTITION BY phone ROWS BETWEEN UNBOUNDED PRECEDING AND
> UNBOUNDED FOLLOWING) as lines
> , count(distinct day) OVER (PARTITION BY phone ROWS BETWEEN UNBOUNDED
> PRECEDING AND UNBOUNDED FOLLOWING) as days
> FROM T{code}
> WINDOW specification doesn't affect on results: same wrong with and without
> window.
> count(1) and count(distinct day) return the same result. Count distinct is
> wrong.
>
> I've add size(collect_set(day) OVER (PARTITION BY phone)) as days2 and
> count(distinct return correct result.
> Following query return non-empty result:
> {code:java}
> select A.*, B.days, B. from (
> select p, day, ts
> , row_number() OVER (PARTITION BY phone ORDER BY ts ASC) as line_number
> , count(1) OVER (PARTITION BY p ROWS BETWEEN UNBOUNDED PRECEDING AND
> UNBOUNDED FOLLOWING) as lines
> , count(distinct day) OVER (PARTITION BY phone ROWS BETWEEN UNBOUNDED
> PRECEDING AND UNBOUNDED FOLLOWING) as days
> , size(collect_set(day) OVER (PARTITION BY phone)) as days2
> , dense_rank() over (partition by phone order by day) + dense_rank() over
> (partition by phone order by day desc) - 1 as days3
> FROM T ) as A
> join (
> select p, day, ts
> , row_number() OVER (PARTITION BY phone ORDER BY ts ASC) as line_number
> , count(1) OVER (PARTITION BY phone ROWS BETWEEN UNBOUNDED PRECEDING AND
> UNBOUNDED FOLLOWING) as lines
> , count(distinct day) OVER (PARTITION BY phone ROWS BETWEEN UNBOUNDED
> PRECEDING AND UNBOUNDED FOLLOWING) as days
> FROM T
> ) as B on A.p=B.p and A.line_number=B.line_number
> where A.days!=B.days
> order by A.p, A.line_number
> {code}
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)