[ 
https://issues.apache.org/jira/browse/HIVE-27494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17744986#comment-17744986
 ] 

Stamatis Zampetakis commented on HIVE-27494:
--------------------------------------------

The description talks directly about some internal directory structure, but it's 
difficult to follow for people (like myself) who are not familiar with the 
area. [~dengzh] Can you add some more high-level information to the 
description (such as DDL, query, plan, etc.) that leads to this kind of 
problematic situation? I also had a look at the PR, but it has low-level details 
about the code and is hard to follow.

Before diving into a review, I would first like to understand what the problem 
is.

> Deduplicate the task result that generated by more branches in union all
> ------------------------------------------------------------------------
>
>                 Key: HIVE-27494
>                 URL: https://issues.apache.org/jira/browse/HIVE-27494
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Zhihua Deng
>            Assignee: Zhihua Deng
>            Priority: Major
>              Labels: pull-request-available
>
> HIVE-23891 adds the ability to deduplicate the task results under the 
> directory
> <table-dir>/<staging-dir>/_tmp.-ext-10000/<dynamic-partition-dir>/HIVE_UNION_SUBDIR_1,
> but it does not take the same action for the other directory of the same 
> query:
> <table-dir>/<staging-dir>/_tmp.-ext-10000/<dynamic-partition-dir>/HIVE_UNION_SUBDIR_2.
> So users may still hit the same data duplication problem with multiple Tez 
> task attempts.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
