yongzhi.shao created HIVE-28681:
-----------------------------------
Summary: Default to disable hive.optimize.reducededuplication
Key: HIVE-28681
URL: https://issues.apache.org/jira/browse/HIVE-28681
Project: Hive
Issue Type: Wish
Reporter: yongzhi.shao
We have found that in many cases, because Hive enables
hive.optimize.reducededuplication by default, Tez tasks do not handle data
skew well. The situation tends to get worse when users employ the DISTINCT
keyword. Should we consider disabling this option by default to better handle
data skew scenarios for users?
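For reference, a minimal sketch of how a user can work around this today by turning the optimization off for a single session. The table and column names (sales, customer_id) are hypothetical, and whether this helps depends on the query plan and the data distribution:

```sql
-- Disable reduce deduplication for the current session only.
SET hive.optimize.reducededuplication=false;

-- Hypothetical DISTINCT query on a skewed key (sales and customer_id are
-- made-up names). Compare the plan with the setting on and off.
EXPLAIN
SELECT DISTINCT customer_id
FROM sales;
```

Running the same EXPLAIN with the property set to true should show whether the reduce stages were merged, which is the behavior this issue is concerned with.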
Additionally, we have noticed that this issue seems to have been discussed on
the community mailing list: [Re: Blog article 'Performance Tuning for
Single-table Queries' - Apache Mail
Archives|https://lists.apache.org/thread/93836do7hsrod5f3go3pxtqo8h82cvns]
--
This message was sent by Atlassian Jira
(v8.20.10#820010)