[ https://issues.apache.org/jira/browse/HIVE-17783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16225678#comment-16225678 ]
Wei Zheng commented on HIVE-17783:
----------------------------------
[~Ferd] Sorry for the late reply. Yes, spilling is the bottleneck, and there is
no easy way around it. In your case, with n-way joins, the optimizer's stats
estimates may be inaccurate, which makes the situation worse.
The ultimate fix is a reliable memory manager that can report memory usage and
quota at any moment. Right now we take a conservative approach and use a soft
(possibly inaccurate) memory limit. That way we avoid unnecessary spilling when
there is enough memory to load the hashtable.
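
To illustrate the soft-limit idea, here is a minimal sketch in Python. It is a
simplified model, not Hive's actual implementation: the function name
plan_partitions, the per-partition size estimates, and the "spill the largest
in-memory partition" policy are all assumptions made for the example.

```python
def plan_partitions(partition_sizes, soft_limit):
    """Decide which hash partitions stay in memory and which get spilled.

    partition_sizes: estimated size of each partition (hypothetical units).
    soft_limit: soft memory budget; estimates may be inaccurate, so this
                is a guideline rather than a hard guarantee.
    Returns (in_memory_indices, spilled_indices), both sorted.
    """
    in_memory = {}   # partition index -> estimated size
    spilled = []

    for idx, size in enumerate(partition_sizes):
        in_memory[idx] = size
        # Spill only when the estimated footprint exceeds the soft limit.
        # If everything fits, nothing is spilled at all.
        while in_memory and sum(in_memory.values()) > soft_limit:
            victim = max(in_memory, key=in_memory.get)  # largest partition
            spilled.append(victim)
            del in_memory[victim]

    return sorted(in_memory), sorted(spilled)


# Everything fits under the limit: no spilling.
fits = plan_partitions([10, 20, 30], soft_limit=100)

# The third partition pushes the estimate over the limit: spill the largest.
overflows = plan_partitions([40, 50, 30], soft_limit=100)
```

In this model an inaccurate size estimate moves the spill decision in either
direction, which is why a memory manager reporting real usage would be more
reliable than the soft limit.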
> Hybrid Grace Hash Join has performance degradation for N-way join using Hive
> on Tez
> -----------------------------------------------------------------------------------
>
> Key: HIVE-17783
> URL: https://issues.apache.org/jira/browse/HIVE-17783
> Project: Hive
> Issue Type: Bug
> Affects Versions: 2.2.0
> Environment: 8*Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
> 1 master + 7 workers
> TPC-DS at 3TB data scales
> Hive version : 2.2.0
> Reporter: Ferdinand Xu
> Attachments: Hybrid_Grace_Hash_Join.xlsx, screenshot-1.png
>
>
> Most configurations use their default values. The benchmark compares
> enabling against disabling hybrid grace hash join on TPC-DS queries at the
> 3TB data scale. Many queries involving N-way joins show performance
> degradation over three test runs. Detailed results are attached.