[
https://issues.apache.org/jira/browse/HIVE-28681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17908583#comment-17908583
]
yongzhi.shao edited comment on HIVE-28681 at 12/30/24 6:14 AM:
---------------------------------------------------------------
[~zhangbutao] [~okumin] [~dkuzmenko] [~seonggon] [~glapark] [~ayushtkn]
Hello, what do you think?
was (Author: lisoda):
[~zhangbutao] [~okumin] [~dkuzmenko] [~seonggon] [~glapark] [~ayushsaxena]
Hello, what do you think?
> Default to disable hive.optimize.reducededuplication
> ----------------------------------------------------
>
> Key: HIVE-28681
> URL: https://issues.apache.org/jira/browse/HIVE-28681
> Project: Hive
> Issue Type: Wish
> Reporter: yongzhi.shao
> Priority: Minor
>
> We have found that in many cases, because Hive enables
> hive.optimize.reducededuplication by default, Tez tasks do not handle data
> skew well. The situation tends to get worse when users employ the DISTINCT
> keyword. Should we consider disabling this option by default to better handle
> data skew scenarios for users? Or is there a better way of tuning this
> configuration item?
> Additionally, we have noticed that this issue seems to have been discussed on
> the community mailing list: [Re: Blog article 'Performance Tuning for
> Single-table Queries'-Apache Mail
> Archives|https://lists.apache.org/thread/93836do7hsrod5f3go3pxtqo8h82cvns]