[
https://issues.apache.org/jira/browse/TAJO-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13636148#comment-13636148
]
Hyunsik Choi commented on TAJO-35:
----------------------------------
The current Tajo has three join physical operators, such as BNLJoinExec,
HashJoinExec, and MergeJoinExec. HashJoinExec builds a in-memory hash table for
the smaller relation, and for each tuple in the larger relation it probes the
hash table. It works very well and fast in many cases.
However, it doesn't work if the smaller relation does not fit in memory. In
such a case, now we use ExternalSortExec and then MergeJoinExec. It also works
well, but external sort is very costly.
Hybrid hash join (http://en.wikipedia.org/wiki/Hash_join#Hybrid_hash_join) is a
well-known join algorithm. It is a variation of Grace hash join algorithm.
Grace hash join algorithm works as follows:
Given two relations R and S to be joined,
1. Grace hash join algorithm partitions R into N buckets by hashing join keys
in order to fit each bucket in memory
2. it also partitions S into N buckets by hashing join keys
3. for each pair of [i] buckets, 1 <= i <= N, Grace hash joins builds a
in-memory hash table for bucket [i] of S relation, and probes the hash table
for each tuple in bucket [i] of R relation.
Hybrid hash join just keeps the first bucket of S in memory and probes the
first bucket of R directly in order to avoid the write and read of the first
bucket pair.
You can find many other materials (e.g., PPTs, book chapters, and papers) for
hybrid hash join algorithm via google.
Thanks!
> Hybrid Hash Join Operator
> -------------------------
>
> Key: TAJO-35
> URL: https://issues.apache.org/jira/browse/TAJO-35
> Project: Tajo
> Issue Type: New Feature
> Components: physical operator
> Reporter: Hyunsik Choi
> Labels: gsoc, gsoc2013, mentor
>
> The hybrid hash join algorithm is well known as the best join algorithm which
> takes advantage of more available memory. The goal of this proposal is to
> implement HybridHashJoinExec class subclassed from PhysicalExec. Its original
> literature is "Join processing in database systems with large main memories"
> (http://www.cs.ucr.edu/~tsotras/cs236/W13/join.pdf). In addition, you could
> find many other materials available in the web.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira