[ 
https://issues.apache.org/jira/browse/HIVE-16241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Satoshi Iijima updated HIVE-16241:
----------------------------------
    Description: 
Example is the bellow. Each subquery 'key' column is unique. But when they are 
joined on 'key' column, a result is duplicated.
{code}
CREATE TABLE test (
  key   STRING,
  type  STRING,
  value INT
) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

LOAD DATA LOCAL INPATH '/tmp/test.gz' OVERWRITE INTO TABLE test;

SELECT * FROM test;

A       type1   30000
B       type2   20000
C       type2   5000

SELECT l.*
FROM (
  SELECT * FROM test LATERAL VIEW explode(ARRAY(key)) e AS dammy
) l JOIN (
    SELECT key, rank() OVER (PARTITION BY type ORDER BY value DESC) rank 
    FROM test
) r ON l.key = r.key

A       type1   30000   A
A       type1   30000   A
B       type2   20000   B
B       type2   20000   B
C       type2   5000    C
C       type2   5000    C
{code}

  was:
Example:
{code}
CREATE TABLE test (
  key   STRING,
  type  STRING,
  value INT
) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

LOAD DATA LOCAL INPATH '/tmp/test.gz' OVERWRITE INTO TABLE test;

SELECT * FROM test;

A       type1   30000
B       type2   20000
C       type2   5000

SELECT l.*
FROM (
  SELECT * FROM test LATERAL VIEW explode(ARRAY(key)) e AS dammy
) l JOIN (
    SELECT key, rank() OVER (PARTITION BY type ORDER BY value DESC) rank 
    FROM test
) r ON l.key = r.key

A       type1   30000   A
A       type1   30000   A
B       type2   20000   B
B       type2   20000   B
C       type2   5000    C
C       type2   5000    C
{code}


> When PTF, explode AND join are used together, result is duplicated 
> -------------------------------------------------------------------
>
>                 Key: HIVE-16241
>                 URL: https://issues.apache.org/jira/browse/HIVE-16241
>             Project: Hive
>          Issue Type: Bug
>          Components: PTF-Windowing, Query Processor
>    Affects Versions: 2.1.0
>            Reporter: Satoshi Iijima
>
> Example is the bellow. Each subquery 'key' column is unique. But when they 
> are joined on 'key' column, a result is duplicated.
> {code}
> CREATE TABLE test (
>   key   STRING,
>   type  STRING,
>   value INT
> ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
> LOAD DATA LOCAL INPATH '/tmp/test.gz' OVERWRITE INTO TABLE test;
> SELECT * FROM test;
> A     type1   30000
> B     type2   20000
> C     type2   5000
> SELECT l.*
> FROM (
>   SELECT * FROM test LATERAL VIEW explode(ARRAY(key)) e AS dammy
> ) l JOIN (
>     SELECT key, rank() OVER (PARTITION BY type ORDER BY value DESC) rank 
>     FROM test
> ) r ON l.key = r.key
> A     type1   30000   A
> A     type1   30000   A
> B     type2   20000   B
> B     type2   20000   B
> C     type2   5000    C
> C     type2   5000    C
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Reply via email to