[
https://issues.apache.org/jira/browse/HIVE-28681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17908583#comment-17908583
]
yongzhi.shao commented on HIVE-28681:
-------------------------------------
[~zhangbutao] [~okumin] [~dkuzmenko]
Hello, what do you think?
> Default to disable hive.optimize.reducededuplication
> ----------------------------------------------------
>
> Key: HIVE-28681
> URL: https://issues.apache.org/jira/browse/HIVE-28681
> Project: Hive
> Issue Type: Wish
> Reporter: yongzhi.shao
> Priority: Minor
>
> We have found that in many cases, because Hive enables
> hive.optimize.reducededuplication by default, Tez tasks do not handle data
> skew well. The situation tends to get worse when users employ the DISTINCT
> keyword. Should we consider disabling this option by default to better
> handle data skew scenarios for users?
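>
> For reference, a minimal sketch of turning the option off for a single
> session rather than changing the shipped default. The table and column names
> are hypothetical and used only for illustration; whether this helps a given
> skewed query depends on the data and the plan.
>
> ```sql
> -- Session-level override; the option is enabled (true) by default.
> SET hive.optimize.reducededuplication=false;
>
> -- Hypothetical table and columns, for illustration only.
> SELECT DISTINCT user_id, event_type
> FROM events;
> ```
>
> Comparing the EXPLAIN output of the same query with the option on and off
> is one way to see whether reduce stages were merged.
>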
> Additionally, we have noticed that this issue seems to have been discussed on
> the community mailing list: [Re: Blog article 'Performance Tuning for
> Single-table Queries' - Apache Mail
> Archives|https://lists.apache.org/thread/93836do7hsrod5f3go3pxtqo8h82cvns]
--
This message was sent by Atlassian Jira
(v8.20.10#820010)