[
https://issues.apache.org/jira/browse/HIVE-28681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
yongzhi.shao updated HIVE-28681:
--------------------------------
Description:
Currently, we have found that in many cases, because Hive enables
hive.optimize.reducededuplication by default, Tez tasks do not handle data
skew well. The situation tends to get worse when users employ the DISTINCT
keyword. Should we consider disabling this option by default so that data
skew is handled better for users? Or is there a better way to tune this
configuration item?
Additionally, we have noticed that this issue seems to have been discussed on
the community mailing list: [Re: Blog article 'Performance Tuning for
Single-table Queries' - Apache Mail
Archives|https://lists.apache.org/thread/93836do7hsrod5f3go3pxtqo8h82cvns]
was:
Currently, we have found that in many cases, because Hive enables
hive.optimize.reducededuplication by default, Tez tasks do not handle data
skew well. The situation tends to get worse when users employ the DISTINCT
keyword. Should we consider disabling this option by default so that data
skew is handled better for users?
Additionally, we have noticed that this issue seems to have been discussed on
the community mailing list: [Re: Blog article 'Performance Tuning for
Single-table Queries' - Apache Mail
Archives|https://lists.apache.org/thread/93836do7hsrod5f3go3pxtqo8h82cvns]
> Default to disable hive.optimize.reducededuplication
> ----------------------------------------------------
>
> Key: HIVE-28681
> URL: https://issues.apache.org/jira/browse/HIVE-28681
> Project: Hive
> Issue Type: Wish
> Reporter: yongzhi.shao
> Priority: Minor
>
> Currently, we have found that in many cases, because Hive enables
> hive.optimize.reducededuplication by default, Tez tasks do not handle data
> skew well. The situation tends to get worse when users employ the DISTINCT
> keyword. Should we consider disabling this option by default so that data
> skew is handled better for users? Or is there a better way to tune this
> configuration item?
> Additionally, we have noticed that this issue seems to have been discussed on
> the community mailing list: [Re: Blog article 'Performance Tuning for
> Single-table Queries' - Apache Mail
> Archives|https://lists.apache.org/thread/93836do7hsrod5f3go3pxtqo8h82cvns]
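
For anyone who wants to compare behaviour before any change to the default, a
minimal sketch of a session-level override is shown here. It relies only on
the property named in this issue. The table and column names (events,
user_id) are hypothetical, and whether the resulting plan actually spreads
skewed keys better depends on the query and the data.

```sql
-- Session-level override only; this does not change the cluster default.
SET hive.optimize.reducededuplication=false;

-- Hypothetical table and column, for illustration only.
-- Compare this plan with the one produced when the option is left enabled
-- to see whether the reduce stages are still merged.
EXPLAIN
SELECT DISTINCT user_id
FROM events;

-- Restore the shipped default for the rest of the session.
SET hive.optimize.reducededuplication=true;
```

A related property, hive.optimize.reducededuplication.min.reducer, may also be
worth checking as a tuning option; its exact semantics should be confirmed
against HiveConf for the Hive version in use.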
--
This message was sent by Atlassian Jira
(v8.20.10#820010)