[ https://issues.apache.org/jira/browse/HIVE-9277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14344229#comment-14344229 ]
Wei Zheng commented on HIVE-9277:
---------------------------------

Right now I'm using HIVECONVERTJOINNOCONDITIONALTASK as the threshold for the memory estimate. Once the memory management part is ready, I can rely on that to provide an exact number.

> Hybrid Hybrid Grace Hash Join
> -----------------------------
>
>                 Key: HIVE-9277
>                 URL: https://issues.apache.org/jira/browse/HIVE-9277
>             Project: Hive
>          Issue Type: New Feature
>          Components: Physical Optimizer
>            Reporter: Wei Zheng
>            Assignee: Wei Zheng
>              Labels: join
>         Attachments: HIVE-9277.01.patch, HIVE-9277.02.patch,
> HIVE-9277.03.patch, HIVE-9277.04.patch, HIVE-9277.05.patch,
> HIVE-9277.06.patch, High-leveldesignforHybridHybridGraceHashJoinv1.0.pdf
>
>
> We are proposing an enhanced hash join algorithm called _“hybrid hybrid grace
> hash join”_.
> We can benefit from this feature as illustrated below:
> * The query will not fail even if the estimated memory requirement is
> slightly wrong
> * Expensive garbage collection overhead can be avoided when the hash table grows
> * The join can still run as a map join even when the small table
> doesn't fit in memory, because spilling some data from the build and probe sides
> is still cheaper than shuffling the large fact table
> The design is based on Hadoop’s parallel processing capability and
> the significant amount of memory available.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
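
For readers unfamiliar with the idea of spilling from both the build and probe sides, here is a minimal sketch of a partitioned hash join with spilling. It is not taken from the HIVE-9277 patches. The class `HybridGraceJoinSketch`, its methods, the row-count memory budget `maxRowsInMemory`, and the "spill the largest resident partition" policy are all assumptions made for illustration, and the "spilled" data lives in plain lists rather than on disk.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: hypothetical names, not Hive's implementation. */
public class HybridGraceJoinSketch {
    /** A row is a join key plus a payload. */
    public static final class Row {
        final int key; final String value;
        public Row(int key, String value) { this.key = key; this.value = value; }
    }

    static int partitionOf(int key, int numPartitions) {
        return Math.floorMod(Integer.hashCode(key), numPartitions);
    }

    /** Picks the non-spilled partition currently holding the most rows. */
    static int largestResident(int[] rows, boolean[] spilled) {
        int victim = -1;
        for (int p = 0; p < rows.length; p++) {
            if (!spilled[p] && (victim < 0 || rows[p] > rows[victim])) victim = p;
        }
        return victim;
    }

    /** Inner join on key; emits "key:buildValue:probeValue". Assumes maxRowsInMemory >= 0. */
    public static List<String> join(List<Row> build, List<Row> probe,
                                    int numPartitions, int maxRowsInMemory) {
        List<Map<Integer, List<String>>> inMemory = new ArrayList<>();
        List<List<Row>> spilledBuild = new ArrayList<>();
        List<List<Row>> spilledProbe = new ArrayList<>();
        for (int p = 0; p < numPartitions; p++) {
            inMemory.add(new HashMap<>());
            spilledBuild.add(new ArrayList<>());
            spilledProbe.add(new ArrayList<>());
        }
        boolean[] spilled = new boolean[numPartitions];
        int[] rowsIn = new int[numPartitions];
        int resident = 0;

        // Build phase: hash the small table into partitions; spill one when over budget.
        for (Row r : build) {
            int p = partitionOf(r.key, numPartitions);
            if (spilled[p]) { spilledBuild.get(p).add(r); continue; }
            inMemory.get(p).computeIfAbsent(r.key, k -> new ArrayList<>()).add(r.value);
            rowsIn[p]++;
            resident++;
            if (resident > maxRowsInMemory) {
                int victim = largestResident(rowsIn, spilled);
                for (Map.Entry<Integer, List<String>> e : inMemory.get(victim).entrySet()) {
                    for (String v : e.getValue()) spilledBuild.get(victim).add(new Row(e.getKey(), v));
                }
                inMemory.get(victim).clear();
                resident -= rowsIn[victim];
                rowsIn[victim] = 0;
                spilled[victim] = true;
            }
        }

        // Probe phase: join against resident partitions now, defer rows for spilled ones.
        List<String> out = new ArrayList<>();
        for (Row r : probe) {
            int p = partitionOf(r.key, numPartitions);
            if (spilled[p]) { spilledProbe.get(p).add(r); continue; }
            for (String v : inMemory.get(p).getOrDefault(r.key, Collections.<String>emptyList())) {
                out.add(r.key + ":" + v + ":" + r.value);
            }
        }

        // Second pass: rebuild each spilled partition's hash table and join its deferred rows.
        for (int p = 0; p < numPartitions; p++) {
            if (!spilled[p]) continue;
            Map<Integer, List<String>> table = new HashMap<>();
            for (Row b : spilledBuild.get(p)) {
                table.computeIfAbsent(b.key, k -> new ArrayList<>()).add(b.value);
            }
            for (Row r : spilledProbe.get(p)) {
                for (String v : table.getOrDefault(r.key, Collections.<String>emptyList())) {
                    out.add(r.key + ":" + v + ":" + r.value);
                }
            }
        }
        return out;
    }
}
```

In this sketch a memory estimate that is too low only causes more partitions to spill; the join result is the same, just produced in a different order.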