[
https://issues.apache.org/jira/browse/ASTERIXDB-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15420138#comment-15420138
]
Taewoo Kim commented on ASTERIXDB-1556:
---------------------------------------
Regarding garbage collection of the hash table, [~dtabass] suggested the
following nice idea.
{quote}
Using one extra frame, make a sequential pass thru the hash table to create a
new hash table without any holes.
{quote}
The issue is that a hash slot in a content frame doesn't record its original
header location in the header frames. For instance, the first slot in content
frame #0 is not necessarily the slot for h(0), since we allocate content frame
space gradually, as we see each new hash value.
So, using an extra frame, it's possible to coalesce the slots in one content
frame, but doing so will ruin the structure because we can't update the
corresponding header location. For example, if the first slot in content
frame #0 is for h(92), then we need to reflect the location change of that
slot (new frame, new offset) in the h(92) entry in the header frame. But by
checking the content frame alone, we don't know that the slot is for h(92).
Another way is to go through each header frame, follow its entries into the
various content frames (the slots for the h() values in one header frame can be
scattered all over the content frames, depending on when each h() value was
inserted), and construct new content frames. But this requires a lot of random
access, and one extra frame might not be enough.
So, the structure of the hash slot may be changed to also contain the original
header location (2 ints: header frame index and offset). Then each header
location has a pointer to its hash slot, and each hash slot has a pointer back
to its header location in the header frame.
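To make the idea concrete, here is a minimal sketch of how a sequential
compaction pass could work once each slot carries that back-pointer. Everything
in it is an assumption for illustration: the class BackPointerHashTable, the
Slot fields, and the in-memory arrays standing in for frames are hypothetical
and are not the actual Hyracks classes. It also assumes one slot per h() value.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

/** Hypothetical model of the header/content layout; not the Hyracks implementation. */
final class BackPointerHashTable {
    /** A content slot carrying the proposed back-pointer (two ints) to its header entry. */
    static final class Slot {
        final int headerFrame, headerOffset, payload;
        Slot(int headerFrame, int headerOffset, int payload) {
            this.headerFrame = headerFrame;
            this.headerOffset = headerOffset;
            this.payload = payload;
        }
    }

    private final int frameSize;      // entries per header frame and slots per content frame
    private final int[][][] header;   // header[frame][offset] = {contentFrame, contentOffset}, or null
    private List<Slot[]> content = new ArrayList<>();
    private int nextOffset;           // next free offset in the last content frame

    BackPointerHashTable(int headerFrames, int frameSize) {
        this.frameSize = frameSize;
        this.header = new int[headerFrames][frameSize][];
        this.nextOffset = frameSize;  // forces allocation of the first content frame
    }

    /** Slots are handed out in order of first appearance of h, not in order of h. */
    void insert(int h, int payload) {
        remove(h);                    // an existing entry for h becomes a hole
        int hf = h / frameSize, ho = h % frameSize;
        if (nextOffset == frameSize) {
            content.add(new Slot[frameSize]);
            nextOffset = 0;
        }
        int cf = content.size() - 1;
        content.get(cf)[nextOffset] = new Slot(hf, ho, payload);
        header[hf][ho] = new int[] {cf, nextOffset++};
    }

    /** Removing an entry leaves a hole in its content frame. */
    void remove(int h) {
        int[] loc = header[h / frameSize][h % frameSize];
        if (loc == null) return;
        content.get(loc[0])[loc[1]] = null;
        header[h / frameSize][h % frameSize] = null;
    }

    Integer get(int h) {
        int[] loc = header[h / frameSize][h % frameSize];
        return loc == null ? null : content.get(loc[0])[loc[1]].payload;
    }

    int contentFrameCount() { return content.size(); }

    /** One sequential pass over the content frames, starting with one extra frame. */
    void compact() {
        List<Slot[]> out = new ArrayList<>();
        Deque<Slot[]> free = new ArrayDeque<>();
        free.push(new Slot[frameSize]);           // the one extra frame
        int outOffset = frameSize;
        for (Slot[] frame : content) {
            for (Slot s : frame) {
                if (s == null) continue;          // skip holes
                if (outOffset == frameSize) {
                    out.add(free.pop());
                    outOffset = 0;
                }
                out.get(out.size() - 1)[outOffset] = s;
                // The back-pointer tells us which header entry to fix up.
                header[s.headerFrame][s.headerOffset] = new int[] {out.size() - 1, outOffset++};
            }
            Arrays.fill(frame, null);             // a fully scanned frame can be reused
            free.push(frame);
        }
        content = out;
        nextOffset = outOffset;
    }
}
```

In this sketch the output never gets ahead of the input, so a fully scanned
frame is always available for reuse and the single extra frame is enough. The
header updates are still random writes, but the content frames are read
strictly sequentially.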
Again, the structure of the SpillableTable used in external hash group-by:
https://docs.google.com/presentation/d/1AExoTqQlx9va-AaiZ6OSPxBuQ3NJqz-cG5NGrjdk5FU/edit?usp=sharing
> Hash Table used by External hash group-by doesn't conform to the budget.
> ------------------------------------------------------------------------
>
> Key: ASTERIXDB-1556
> URL: https://issues.apache.org/jira/browse/ASTERIXDB-1556
> Project: Apache AsterixDB
> Issue Type: Bug
> Reporter: Taewoo Kim
> Assignee: Taewoo Kim
> Attachments: 2wayjoin.pdf, 2wayjoin.rtf, 2wayjoinplan.rtf,
> 3wayjoin.pdf, 3wayjoin.rtf, 3wayjoinplan.rtf
>
>
> When we enable prefix-based fuzzy-join and apply a multi-way fuzzy-join
> (> 2), the system generates an out-of-memory exception.
> Since a fuzzy-join is created using 30-40 lines of AQL code and this AQL is
> translated into a massive number of operators (more than 200 operators in the
> plan for a 3-way fuzzy join), it could generate an out-of-memory exception.
> /// Update: as the discussion progressed, we found that the hash table in the
> external hash group-by doesn't conform to the frame limit. So, an
> out-of-memory exception happens during the execution of an external hash
> group-by operator.