[jira] [Comment Edited] (HIVE-28681) Default to disable hive.optimize.reducededuplication

yongzhi.shao (Jira) Sun, 29 Dec 2024 22:16:03 -0800


    [ https://issues.apache.org/jira/browse/HIVE-28681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17908583#comment-17908583 ]


yongzhi.shao edited comment on HIVE-28681 at 12/30/24 6:14 AM:
---------------------------------------------------------------

[~zhangbutao] [~okumin] [~dkuzmenko] [~seonggon] [~glapark] [~ayushtkn] 

Hello. What do you think?


was (Author: lisoda):
[~zhangbutao] [~okumin] [~dkuzmenko] [~seonggon] [~glapark]  [~ayushsaxena] 

Hello. what do you think?

> Default to disable hive.optimize.reducededuplication
> ----------------------------------------------------
>
>                 Key: HIVE-28681
>                 URL: https://issues.apache.org/jira/browse/HIVE-28681
>             Project: Hive
>          Issue Type: Wish
>            Reporter: yongzhi.shao
>            Priority: Minor
>
> Currently, we have found that in many cases, because Hive enables
> hive.optimize.reducededuplication by default, Tez tasks do not handle data
> skew well. The situation tends to get worse when users employ the DISTINCT
> keyword. Should we consider disabling this option by default so that data
> skew is handled better for users? Or is there a better way to tune this
> configuration item?
> Additionally, we have noticed that this issue seems to have been discussed on
> the community mailing list: [Re: Blog article 'Performance Tuning for
> Single-table Queries'-Apache Mail
> Archives|https://lists.apache.org/thread/93836do7hsrod5f3go3pxtqo8h82cvns]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
