[jira] [Commented] (HIVE-28681) Default to disable hive.optimize.reducededuplication

yongzhi.shao (Jira) Sat, 28 Dec 2024 03:21:19 -0800


    [ https://issues.apache.org/jira/browse/HIVE-28681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17908583#comment-17908583 ]


yongzhi.shao commented on HIVE-28681:
-------------------------------------

[~zhangbutao] [~okumin] [~dkuzmenko] 

Hello. What do you think?

> Default to disable hive.optimize.reducededuplication
> ----------------------------------------------------
>
>                 Key: HIVE-28681
>                 URL: https://issues.apache.org/jira/browse/HIVE-28681
>             Project: Hive
>          Issue Type: Wish
>            Reporter: yongzhi.shao
>            Priority: Minor
>
> We have found that, in many cases, Tez tasks handle data skew poorly because
> Hive enables hive.optimize.reducededuplication by default. The problem tends
> to get worse when users employ the DISTINCT keyword. Should we consider
> disabling this option by default so that users are better protected in
> data-skew scenarios?
> Additionally, this issue appears to have already been discussed on the
> community mailing list: [Re: Blog article 'Performance Tuning for 
> Single-table Queries'-Apache Mail 
> Archives|https://lists.apache.org/thread/93836do7hsrod5f3go3pxtqo8h82cvns]
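A rough way to picture the skew concern raised above: reduce deduplication can merge two shuffles into one, so a query such as SELECT a, COUNT(DISTINCT b) FROM t GROUP BY a may end up hash-partitioning every raw row on a alone instead of first on (a, b). The Python sketch is a hypothetical illustration only, not Hive's planner or its hash function. The table, the column names a and b, the reducer count, and the use of crc32 are all assumptions. It simply compares per-reducer row counts under the two partitioning keys. For an affected query, the optimization can already be turned off per session with SET hive.optimize.reducededuplication=false; independently of any change to the default.

```python
import zlib


def reducer_loads(rows, key_cols, num_reducers):
    """Count how many rows each reducer receives when hash-partitioning on key_cols."""
    loads = [0] * num_reducers
    for row in rows:
        key = "|".join(str(row[c]) for c in key_cols)
        # crc32 is a deterministic stand-in hash; Hive's real partitioner differs.
        loads[zlib.crc32(key.encode("utf-8")) % num_reducers] += 1
    return loads


# Hypothetical skewed table: 9,000 of 10,000 rows share the grouping value a = "hot".
rows = [{"a": "hot", "b": i} for i in range(9000)]
rows += [{"a": f"k{i % 100}", "b": i} for i in range(1000)]

NUM_REDUCERS = 10  # assumed reducer count

# Assumed first stage of an un-merged plan: shuffle keyed on (a, b).
two_stage = reducer_loads(rows, ["a", "b"], NUM_REDUCERS)
# Assumed merged plan: every raw row shuffled on a alone.
merged = reducer_loads(rows, ["a"], NUM_REDUCERS)

print("max rows on one reducer, keyed on (a, b):", max(two_stage))
print("max rows on one reducer, keyed on a only:", max(merged))
```

In this toy data, keying on a alone sends all 9,000 "hot" rows to a single reducer, while keying on (a, b) spreads them across all ten. That lost parallelism is the kind of effect the issue attributes to the merged plan.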



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
