[ https://issues.apache.org/jira/browse/ASTERIXDB-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15799738#comment-15799738 ]
ASF subversion and git services commented on ASTERIXDB-1556:
------------------------------------------------------------
Commit 8b2aceeb97c8f89f2898c0b35f38cc36d3cdda63 in asterixdb's branch
refs/heads/master from [~wangsaeu]
[ https://git-wip-us.apache.org/repos/asf?p=asterixdb.git;h=8b2acee ]
ASTERIXDB-1556, ASTERIXDB-1733: Hash Group By and Hash Join conform to the
memory budget
- External Hash Group By and Hash Join now conform to the memory budget
(compiler.groupmemory and compiler.joinmemory)
- For Optimized Hybrid Hash Join, we calculate the expected hash table size
when the build phase is done and try to spill one or more partitions if the
free space cannot accommodate a hash table of that size.
- For External Hash Group By, the number of hash entries (hash table size) is
calculated based on an estimation of the aggregated tuple size and possible
hash values for the given field size in that tuple.
- A garbage collection (GC) feature has been added to SerializableHashTable.
For external hash group-by, whenever we spill a data partition to disk, we
also check the ratio of garbage in the hash table. If it is greater than the
given threshold, we run a GC on the hash table.
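
The two budget checks described above (spilling join partitions until the
expected hash table fits, and triggering a GC once the garbage ratio passes a
threshold) can be sketched roughly as follows. This is a minimal illustration,
not the code from this commit: the class, method, and parameter names are
hypothetical, the per-entry size is a plain input rather than something read
from SerializableHashTable, and picking the largest in-memory partition as the
spill victim is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; none of these names come from the AsterixDB code base. */
public class MemoryBudgetSketch {

    /**
     * After the build phase, returns the indices of in-memory partitions to
     * spill so that the expected hash table fits into the remaining free space.
     * Victim choice (largest partition first) is an assumption of this sketch.
     */
    static List<Integer> choosePartitionsToSpill(long[] partitionBytes, long[] partitionTuples,
            long budgetBytes, long bytesPerHashEntry) {
        int n = partitionBytes.length;
        boolean[] spilled = new boolean[n];
        long residentBytes = 0;
        long residentTuples = 0;
        for (int i = 0; i < n; i++) {
            residentBytes += partitionBytes[i];
            residentTuples += partitionTuples[i];
        }
        List<Integer> victims = new ArrayList<>();
        while (true) {
            long freeBytes = budgetBytes - residentBytes;
            long expectedTableBytes = residentTuples * bytesPerHashEntry;
            if (expectedTableBytes <= freeBytes) {
                break; // the hash table for the resident tuples fits
            }
            // Find the largest partition that is still in memory.
            int victim = -1;
            for (int i = 0; i < n; i++) {
                if (!spilled[i] && (victim < 0 || partitionBytes[i] > partitionBytes[victim])) {
                    victim = i;
                }
            }
            if (victim < 0) {
                break; // nothing left to spill
            }
            spilled[victim] = true;
            victims.add(victim);
            residentBytes -= partitionBytes[victim];
            residentTuples -= partitionTuples[victim];
        }
        return victims;
    }

    /** Returns true when the share of garbage in the hash table exceeds the threshold. */
    static boolean shouldCollectGarbage(long garbageBytes, long tableBytes, double threshold) {
        if (tableBytes <= 0) {
            return false; // empty table: nothing to collect
        }
        return (double) garbageBytes / tableBytes > threshold;
    }
}
```

In this sketch, a caller would invoke choosePartitionsToSpill once the build
phase ends, and shouldCollectGarbage each time a group-by partition is
spilled; the threshold value itself is left to the caller.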
Change-Id: I2b323e9a2141b4c1dd1652a360d2d9354d3bc3f5
Reviewed-on: https://asterix-gerrit.ics.uci.edu/1056
Tested-by: Jenkins <[email protected]>
BAD: Jenkins <[email protected]>
Integration-Tests: Jenkins <[email protected]>
Reviewed-by: Yingyi Bu <[email protected]>
> Hash Table used by External hash group-by doesn't conform to the budget.
> ------------------------------------------------------------------------
>
> Key: ASTERIXDB-1556
> URL: https://issues.apache.org/jira/browse/ASTERIXDB-1556
> Project: Apache AsterixDB
> Issue Type: Bug
> Reporter: Taewoo Kim
> Assignee: Taewoo Kim
> Priority: Critical
> Labels: soon
> Attachments: 2wayjoin.pdf, 2wayjoin.rtf, 2wayjoinplan.rtf,
> 3wayjoin.pdf, 3wayjoin.rtf, 3wayjoinplan.rtf
>
>
> When we enable prefix-based fuzzy-join and apply a multi-way fuzzy-join
> (more than 2 ways), the system generates an out-of-memory exception.
> Since a fuzzy-join is written in 30-40 lines of AQL code and this AQL is
> translated into a massive number of operators (more than 200 operators in the
> plan for a 3-way fuzzy join), it could generate an out-of-memory exception.
> Update: as the discussion progressed, we found that the hash table in the
> external hash group-by doesn't conform to the frame limit. So, an
> out-of-memory exception happens during the execution of an external hash
> group-by operator.