yongzhi.shao created HIVE-28681:
-----------------------------------

             Summary: Default to disable hive.optimize.reducededuplication
                 Key: HIVE-28681
                 URL: https://issues.apache.org/jira/browse/HIVE-28681
             Project: Hive
          Issue Type: Wish
            Reporter: yongzhi.shao


We have found that in many cases Tez tasks do not handle data skew well 
because Hive enables hive.optimize.reducededuplication by default. The 
situation tends to get worse when queries use the DISTINCT keyword. Should 
we consider disabling this option by default so that data skew is handled 
better for users?
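
Until the default changes, the option can be turned off per session. The 
following is only a minimal sketch: the table and column names are 
hypothetical, and whether the plan actually changes depends on the query, so 
it is worth comparing the EXPLAIN output with the option on and off.

```sql
-- Disable reduce deduplication for the current session only.
SET hive.optimize.reducededuplication=false;

-- Hypothetical table and columns: a DISTINCT aggregation over a skewed key.
-- Run EXPLAIN with the option enabled and disabled and compare the plans.
EXPLAIN
SELECT user_id, COUNT(DISTINCT item_id) AS distinct_items
FROM events
GROUP BY user_id;
```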

Additionally, we have noticed that this issue seems to have been discussed on 
the community mailing list: [Re: Blog article 'Performance Tuning for 
Single-table Queries' - Apache Mail 
Archives|https://lists.apache.org/thread/93836do7hsrod5f3go3pxtqo8h82cvns]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
