Chiran Ravani created HIVE-24245: ------------------------------------ Summary: Vectorized PTF with count and distinct over partition producing incorrect results. Key: HIVE-24245 URL: https://issues.apache.org/jira/browse/HIVE-24245 Project: Hive Issue Type: Bug Components: Hive, PTF-Windowing, Vectorization Affects Versions: 3.1.2, 3.1.0 Reporter: Chiran Ravani
Vectorized PTF for count and distinct over partition is broken. It produces incorrect results. Below is the test case. {code} CREATE TABLE bigd781b_new ( id int, txt1 string, txt2 string, cda_date int, cda_job_name varchar(12)); INSERT INTO bigd781b_new VALUES (1,'2010005759','7164335675012038',20200528,'load1'), (2,'2010005759','7164335675012038',20200528,'load2'); {code} Running below query produces incorrect results {code} SELECT txt1, txt2, count(distinct txt1) over(partition by txt1) as n, count(distinct txt2) over(partition by txt2) as m FROM bigd781b_new WHERE cda_date = 20200528 and ( txt2 = '7164335675012038'); {code} as below. {code} +-------------+-------------------+----+----+ | txt1 | txt2 | n | m | +-------------+-------------------+----+----+ | 2010005759 | 7164335675012038 | 2 | 2 | | 2010005759 | 7164335675012038 | 2 | 2 | +-------------+-------------------+----+----+ {code} While the correct output would be {code} +-------------+-------------------+----+----+ | txt1 | txt2 | n | m | +-------------+-------------------+----+----+ | 2010005759 | 7164335675012038 | 1 | 1 | | 2010005759 | 7164335675012038 | 1 | 1 | +-------------+-------------------+----+----+ {code} The problem does not appear after setting below property set hive.vectorized.execution.ptf.enabled=false; -- This message was sent by Atlassian Jira (v8.3.4#803005)