[jira] [Updated] (HIVE-28681) Default to disable hive.optimize.reducededuplication

yongzhi.shao (Jira) Sat, 28 Dec 2024 03:28:25 -0800


     [ 
https://issues.apache.org/jira/browse/HIVE-28681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


yongzhi.shao updated HIVE-28681:
--------------------------------
    Description: 
Currently, we have found that in many cases Tez tasks do not handle data skew 
well because Hive enables hive.optimize.reducededuplication by default. The 
problem tends to get worse when users employ the DISTINCT keyword. Should we 
consider disabling this option by default to better handle data skew for 
users? Or is there a better way to tune this configuration item?

Additionally, this issue seems to have been discussed on the community 
mailing list: [Re: Blog article 'Performance Tuning for Single-table 
Queries'-Apache Mail 
Archives|https://lists.apache.org/thread/93836do7hsrod5f3go3pxtqo8h82cvns]

  was:
Currently, we have found that in many cases Tez tasks do not handle data skew 
well because Hive enables hive.optimize.reducededuplication by default. The 
problem tends to get worse when users employ the DISTINCT keyword. Should we 
consider disabling this option by default to better handle data skew for 
users?

Additionally, this issue seems to have been discussed on the community 
mailing list: [Re: Blog article 'Performance Tuning for Single-table 
Queries'-Apache Mail 
Archives|https://lists.apache.org/thread/93836do7hsrod5f3go3pxtqo8h82cvns]


> Default to disable hive.optimize.reducededuplication
> ----------------------------------------------------
>
>                 Key: HIVE-28681
>                 URL: https://issues.apache.org/jira/browse/HIVE-28681
>             Project: Hive
>          Issue Type: Wish
>            Reporter: yongzhi.shao
>            Priority: Minor



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
