[
https://issues.apache.org/jira/browse/ASTERIXDB-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15412200#comment-15412200
]
Taewoo Kim commented on ASTERIXDB-1556:
---------------------------------------
One more thing regarding the hash table size (the number of unique h() values):
[~dtabass] suggested that we can use BigInteger.nextProbablePrime(). Since each
hash pointer in the hash table header takes 8 bytes (two ints: frame index and
offset), the number of h() values in a frame is frameSize / 8. So, the range N
is frameSize / 8 * (the maximum number of frames). I would like to suggest that
we pick a prime number x with 0.8N < x < 0.9N, since if x is closer to N, the
hash table itself can eventually occupy all of the frames and there will not be
enough space left for the actual tuples. A weakness here is that we can't
assume that 0.8 and 0.9 form a good range. They only make sure that the hash
table never reaches 100% occupancy of the budget.
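
To make the proposal concrete, here is a rough sketch of how the size could be
picked. It is only an illustration, not the actual operator code: the class
name, method name, and the -1 "nothing found" return value are made up for this
example, and the 0.8 / 0.9 bounds are the tentative ones from above.

```java
import java.math.BigInteger;

// Hypothetical helper, not part of AsterixDB/Hyracks.
class HashTableSizeSketch {
    // Each header entry is two ints (frame index, offset) = 8 bytes.
    static final int HEADER_ENTRY_BYTES = 8;

    // Returns a probable prime x with 0.8N < x < 0.9N, where
    // N = frameSize / 8 * maxFrames, or -1 if no such x is found.
    static long candidateTableSize(int frameSize, int maxFrames) {
        long n = (long) (frameSize / HEADER_ENTRY_BYTES) * maxFrames;
        long lower = n * 8 / 10; // floor(0.8N)
        // nextProbablePrime() returns the first probable prime strictly
        // greater than its receiver, so the result is always > 0.8N.
        long x = BigInteger.valueOf(lower).nextProbablePrime().longValueExact();
        // Check x < 0.9N in integer arithmetic to avoid rounding issues.
        return (x * 10 < n * 9) ? x : -1;
    }

    public static void main(String[] args) {
        // Example values only: 32KB frames and a 10-frame budget.
        System.out.println(candidateTableSize(32768, 10));
    }
}
```

For a very small budget there may be no prime in the range at all, so the
caller would still need some fallback in that case.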
> Hash Table used by External hash group-by doesn't conform to the budget.
> ------------------------------------------------------------------------
>
> Key: ASTERIXDB-1556
> URL: https://issues.apache.org/jira/browse/ASTERIXDB-1556
> Project: Apache AsterixDB
> Issue Type: Bug
> Reporter: Taewoo Kim
> Assignee: Taewoo Kim
> Attachments: 2wayjoin.pdf, 2wayjoin.rtf, 2wayjoinplan.rtf,
> 3wayjoin.pdf, 3wayjoin.rtf, 3wayjoinplan.rtf
>
>
> When we enable prefix-based fuzzy-join and apply a multi-way fuzzy-join (more
> than 2-way), the system generates an out-of-memory exception.
> Since a fuzzy-join is written in 30-40 lines of AQL code and this AQL is
> translated into a massive number of operators (more than 200 operators in the
> plan for a 3-way fuzzy join), it can generate an out-of-memory exception.
> /// Update: as the discussion went on, we found that the hash table in the
> external hash group-by doesn't conform to the frame limit. So, an out-of-memory
> exception happens during the execution of an external hash group-by operator.