[ https://issues.apache.org/jira/browse/HIVE-10062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14376639#comment-14376639 ]
Pengcheng Xiong commented on HIVE-10062:
----------------------------------------
The EXPLAIN output shows the following plan:
{code}
                          Map 6 (TS s2)
                                \
Map 1 (TS s1) -> Reduce 2 -> Union 3 -> Reduce 4 (dest 1)
                     \
                      -> Reduce 5 (dest 2)
{code}
As you can see, Reduce 5 branches off before Union 3 and therefore misses all
the rows from Map 6.
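One possible way to cross-check this, offered only as a sketch: inspect the plan of the DEST2 branch run as a standalone query, then run it and compare its row count with the 310 rows expected in the description. This assumes the same q.test src table. Switching engines through hive.execution.engine is a suggested comparison, not something that was run for this report.
{code}
-- Plan for the DEST2 branch on its own, without the multi-insert
EXPLAIN
SELECT unionsrc.key, unionsrc.value, COUNT(DISTINCT SUBSTR(unionsrc.value,5))
FROM (select 'tst1' as key, cast(count(1) as string) as value from src s1
      UNION all
      select s2.key as key, s2.value as value from src s2) unionsrc
GROUP BY unionsrc.key, unionsrc.value;

-- Optional comparison: rerun the same query on the MapReduce engine
SET hive.execution.engine=mr;
SELECT unionsrc.key, unionsrc.value, COUNT(DISTINCT SUBSTR(unionsrc.value,5))
FROM (select 'tst1' as key, cast(count(1) as string) as value from src s1
      UNION all
      select s2.key as key, s2.value as value from src s2) unionsrc
GROUP BY unionsrc.key, unionsrc.value;
{code}
If the standalone query returns the expected rows while the multi-insert form does not, that would point at the multi-insert planning rather than at the union or the group-by themselves.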
cc'ing [~jpullokkaran] and [~hagleitn]
> HiveOnTez: Union followed by Multi-GB followed by Multi-insert loses data
> -------------------------------------------------------------------------
>
> Key: HIVE-10062
> URL: https://issues.apache.org/jira/browse/HIVE-10062
> Project: Hive
> Issue Type: Bug
> Reporter: Pengcheng Xiong
> Priority: Critical
>
> In the q.test environment with the src table, execute the following queries:
> {code}
> CREATE TABLE DEST1(key STRING, value STRING) STORED AS TEXTFILE;
> CREATE TABLE DEST2(key STRING, val1 STRING, val2 STRING) STORED AS TEXTFILE;
> FROM (select 'tst1' as key, cast(count(1) as string) as value from src s1
> UNION all
> select s2.key as key, s2.value as value from src s2) unionsrc
> INSERT OVERWRITE TABLE DEST1 SELECT unionsrc.key, COUNT(DISTINCT
> SUBSTR(unionsrc.value,5)) GROUP BY unionsrc.key
> INSERT OVERWRITE TABLE DEST2 SELECT unionsrc.key, unionsrc.value,
> COUNT(DISTINCT SUBSTR(unionsrc.value,5))
> GROUP BY unionsrc.key, unionsrc.value;
> select * from DEST1;
> select * from DEST2;
> {code}
> DEST1 and DEST2 should both have 310 rows. However, DEST2 has only 1 row:
> "tst1 500 1"