[ 
https://issues.apache.org/jira/browse/HIVE-25836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yao Guangdong updated HIVE-25836:
---------------------------------
    Summary: Tez union all operation may cause duplicate data  (was: Tez union operation may cause duplicate data)

> Tez union all operation may cause duplicate data
> ------------------------------------------------
>
>                 Key: HIVE-25836
>                 URL: https://issues.apache.org/jira/browse/HIVE-25836
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 2.3.0, 2.3.8
>            Reporter: Yao Guangdong
>            Priority: Critical
>
> When a query uses the Tez UNION ALL operation, it may produce duplicate data
> in some cases. This happens because the Tez UNION ALL operation can generate
> sub-directories inside the table or partition directory. Each sub-directory
> is named with a number, and the result data files are stored in it. If a
> speculative task attempt runs, its result file is also written to that
> sub-directory. When the job finishes, the Hive client deletes duplicate task
> output files, but it only checks a single directory level for them. Because
> the files live in sub-directories, the duplicate task files there are not
> deleted, and the data ends up duplicated.
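To make the one-level check concrete, here is a minimal Python sketch (not Hive's actual code, which is Java) that models the table directory as a list of relative paths. It assumes task output files are named `<taskId>_<attemptId>` (for example `000000_0`, plus `000000_1` from a speculative attempt), that union sub-directories are named with numbers as described in the report, and that the lowest-numbered attempt is the one kept. The function `find_duplicates` and that keep rule are hypothetical simplifications.

```python
import re
from collections import defaultdict
from typing import Iterable, Set

# Assumed (hypothetical) naming convention: "<taskId>_<attemptId>", e.g. "000000_1".
TASK_FILE = re.compile(r"^(?P<task>\d+)_(?P<attempt>\d+)$")


def find_duplicates(paths: Iterable[str], recursive: bool) -> Set[str]:
    """Return the task-attempt files that should be deleted as duplicates.

    Files are grouped per (directory, task id); within a group the lowest
    attempt number is kept (a simplifying assumption) and the rest are
    reported as duplicates.
    """
    groups = defaultdict(list)  # (directory, task id) -> [(attempt, path)]
    for path in paths:
        directory, _, name = path.rpartition("/")
        if not recursive and directory:
            # One-level scan: files inside union sub-directories are never examined.
            continue
        match = TASK_FILE.match(name)
        if match:
            groups[(directory, match["task"])].append((int(match["attempt"]), path))

    duplicates = set()
    for attempts in groups.values():
        attempts.sort()
        duplicates.update(path for _, path in attempts[1:])
    return duplicates


files = ["000000_0", "000000_1", "1/000000_0", "1/000000_1", "2/000001_0"]
print(sorted(find_duplicates(files, recursive=False)))
print(sorted(find_duplicates(files, recursive=True)))
```

With `recursive=False` only the top-level `000000_1` is flagged, so `1/000000_1` survives and its rows are read alongside those of `1/000000_0`. Scanning every directory level, while still grouping per directory, flags both speculative outputs.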



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
