[jira] [Commented] (HIVE-10062) HiveOnTez: Union followed by Multi-GB followed by Multi-insert loses data

Pengcheng Xiong (JIRA) Mon, 23 Mar 2015 14:10:03 -0700

    [ 
https://issues.apache.org/jira/browse/HIVE-10062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14376639#comment-14376639
 ]


Pengcheng Xiong commented on HIVE-10062:
----------------------------------------

The EXPLAIN output shows the following plan:
{code}
              Map 6 (TS s2)
                            \
Map 1 (TS s1) -> Reduce 2 -> Union 3 -> Reduce 4 (dest 1)
                         \
                          -> Reduce 5 (dest 2)
{code} 

As the plan shows, Reduce 5 branches off before Union 3, so it misses all the 
rows coming from Map 6.
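
To make the routing problem concrete, here is a minimal sketch (plain Python, not Hive or Tez code) that encodes the plan above as a list of edges and reports which Map vertices feed a given reducer. The edge list is transcribed by hand from the diagram, and the name `upstream_maps` is hypothetical, chosen only for illustration.

```python
from collections import defaultdict

# Edges transcribed by hand from the plan sketched above (an assumption,
# not output produced by Hive).
EDGES = [
    ("Map 1", "Reduce 2"),
    ("Reduce 2", "Union 3"),
    ("Map 6", "Union 3"),
    ("Union 3", "Reduce 4"),
    ("Reduce 2", "Reduce 5"),
]


def upstream_maps(edges, target):
    """Return the source vertices (those with no parents) that can reach target."""
    parents = defaultdict(set)
    for src, dst in edges:
        parents[dst].add(src)

    # Walk backwards from the target, collecting every ancestor.
    seen, stack = set(), [target]
    while stack:
        node = stack.pop()
        for parent in parents[node]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)

    # Keep only ancestors that have no parents themselves.
    return {vertex for vertex in seen if not parents[vertex]}


if __name__ == "__main__":
    for reducer in ("Reduce 4", "Reduce 5"):
        print(reducer, "<-", sorted(upstream_maps(EDGES, reducer)))
```

In this model Reduce 4 is reachable from both Map 1 and Map 6, while Reduce 5 is reachable only from Map 1, which matches DEST2 receiving only the 'tst1' row.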

cc'ing [~jpullokkaran] and [~hagleitn]

> HiveOnTez: Union followed by Multi-GB followed by Multi-insert loses data
> -------------------------------------------------------------------------
>
>                 Key: HIVE-10062
>                 URL: https://issues.apache.org/jira/browse/HIVE-10062
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Pengcheng Xiong
>            Priority: Critical
>
> In the q.test environment with the src table, execute the following queries: 
> {code}
> CREATE TABLE DEST1(key STRING, value STRING) STORED AS TEXTFILE;
> CREATE TABLE DEST2(key STRING, val1 STRING, val2 STRING) STORED AS TEXTFILE;
> FROM (select 'tst1' as key, cast(count(1) as string) as value from src s1
>                          UNION all 
>       select s2.key as key, s2.value as value from src s2) unionsrc
> INSERT OVERWRITE TABLE DEST1 SELECT unionsrc.key, COUNT(DISTINCT 
> SUBSTR(unionsrc.value,5)) GROUP BY unionsrc.key
> INSERT OVERWRITE TABLE DEST2 SELECT unionsrc.key, unionsrc.value, 
> COUNT(DISTINCT SUBSTR(unionsrc.value,5)) 
> GROUP BY unionsrc.key, unionsrc.value;
> select * from DEST1;
> select * from DEST2;
> {code}
> DEST1 and DEST2 should both have 310 rows. However, DEST2 has only 1 row: 
> "tst1    500     1"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
