[ https://issues.apache.org/jira/browse/ASTERIXDB-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415637#comment-15415637 ]
Michael J. Carey commented on ASTERIXDB-1556:
---------------------------------------------
The details on the hash table cleanup trigger are roughly as follows:
- Let the sum of all the hash table space be called H.
- When a list is relocated, note the amount of space (garbage) it leaves behind;
let G be the sum of this garbage (hole) space.
- When H / (H - G) > 1 + g, where g is the tolerable fraction of wasted space
(e.g., a threshold of 1.1 for g = 0.1), clean up the hash table.
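A minimal Python sketch of the trigger bookkeeping, assuming that a relocated list moves into newly allocated space, so H grows by the new slot while the old slot becomes garbage. The class and method names are hypothetical and are not AsterixDB's API. The check is written as H > (1 + g)(H - G), which is equivalent to H / (H - G) > 1 + g whenever H - G > 0 and avoids dividing by zero.

```python
from dataclasses import dataclass


@dataclass
class HashTableSpaceTracker:
    """Hypothetical bookkeeping for the hash table cleanup trigger.

    total_space:     H, the sum of all space allocated to the hash table.
    garbage_space:   G, the sum of holes left behind by relocated lists.
    tolerable_waste: g, the fraction of waste we tolerate (0.25 = 25%).
    """
    total_space: int = 0
    garbage_space: int = 0
    tolerable_waste: float = 0.1

    def allocate(self, nbytes: int) -> None:
        # Any new space handed to the hash table grows H.
        self.total_space += nbytes

    def relocate_list(self, old_size: int, new_size: int) -> None:
        # Assumption: the list moves to freshly allocated space, so H grows
        # by the new slot and the abandoned old slot is added to G.
        self.allocate(new_size)
        self.garbage_space += old_size

    def needs_cleanup(self) -> bool:
        # H / (H - G) > 1 + g, rewritten without division:
        # H > (1 + g) * (H - G). If H == G > 0 the ratio is "infinite"
        # and this form correctly reports True.
        live = self.total_space - self.garbage_space
        return self.total_space > (1 + self.tolerable_waste) * live

    def reset_after_cleanup(self) -> None:
        # After compaction the holes are gone: H shrinks to the live space.
        self.total_space -= self.garbage_space
        self.garbage_space = 0


tracker = HashTableSpaceTracker(tolerable_waste=0.25)
tracker.allocate(1000)            # H = 1000, G = 0
tracker.relocate_list(100, 200)   # H = 1200, G = 100: 1200/1100 ~ 1.09
tracker.relocate_list(200, 400)   # H = 1600, G = 300: 1600/1300 ~ 1.23
tracker.relocate_list(400, 800)   # H = 2400, G = 700: 2400/1700 ~ 1.41 > 1.25
print(tracker.needs_cleanup())    # True
```

Solving H > 1.25(H - G) for G shows that with g = 0.25 cleanup fires once garbage exceeds 20% of H; in general the threshold is G / H > g / (1 + g).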
The details on the hash table cleanup itself are as follows:
- Using one extra frame, make a sequential pass through the hash table to create
a new hash table without any holes.
- Write a note to the system's warning/error log recording that this cleanup
was performed.
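A minimal sketch of the cleanup pass, assuming each bucket's list occupies one contiguous run of cells in the table's storage and that holes are simply unreferenced cells. A flat Python list stands in for the frames, and the names (`compact`, `buckets`, the logger name) are hypothetical. Because lists are visited in storage order, the write cursor never passes unread data, so live lists can slide toward the front in a single sequential pass; a frame-based implementation would stage these copies through the one extra frame, which the slice copy here only approximates.

```python
import logging

logger = logging.getLogger("hash_table_cleanup")  # hypothetical logger name


def compact(storage, buckets, frame_size):
    """Slide every live list to the front of `storage`, removing holes.

    storage:    flat list of cells standing in for the hash table's frames.
    buckets:    dict mapping bucket id -> (offset, length) of its list.
    frame_size: number of cells per frame.
    Returns the number of whole frames that can be released afterwards.
    """
    frames_before = -(-len(storage) // frame_size)  # ceiling division
    write = 0
    # Visit lists in storage order: one sequential pass, and the write
    # cursor never overtakes data that has not been read yet.
    ordered = sorted(buckets.items(), key=lambda item: item[1][0])
    for bucket_id, (offset, length) in ordered:
        if offset != write:
            # The right-hand slice is copied before assignment, so an
            # overlapping move is safe; the list length does not change.
            storage[write:write + length] = storage[offset:offset + length]
        buckets[bucket_id] = (write, length)
        write += length
    del storage[write:]  # everything past the last live cell was holes
    freed = frames_before - (-(-write // frame_size))
    # Record that the cleanup happened, as the design calls for.
    logger.warning("hash table compacted: %d live cells, %d frame(s) reclaimed",
                   write, freed)
    return freed


storage = ["a", "a", None, None, "b", "b", "b", None, "c", None]
buckets = {0: (0, 2), 1: (4, 3), 2: (8, 1)}
freed = compact(storage, buckets, frame_size=4)
print(storage)  # ['a', 'a', 'b', 'b', 'b', 'c']
print(buckets)  # {0: (0, 2), 1: (2, 3), 2: (5, 1)}
print(freed)    # 1
```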
I.e., if the hash table is wasting space, make it stop wasting space!
(Don't take the problem out on the buckets and cause I/O; that is neither
necessary nor good. Instead, reclaim the lost space.)
NOTE: If the system is working correctly and is properly configured, this
really should "never" happen in any case, but we might as well handle it
gracefully when it does.
PS: The trigger philosophy here is borrowed from Litwin's design for linear
hashing. (See the ~1980 Litwin paper in 222.)
> Hash Table used by External hash group-by doesn't conform to the budget.
> ------------------------------------------------------------------------
>
> Key: ASTERIXDB-1556
> URL: https://issues.apache.org/jira/browse/ASTERIXDB-1556
> Project: Apache AsterixDB
> Issue Type: Bug
> Reporter: Taewoo Kim
> Assignee: Taewoo Kim
> Attachments: 2wayjoin.pdf, 2wayjoin.rtf, 2wayjoinplan.rtf,
> 3wayjoin.pdf, 3wayjoin.rtf, 3wayjoinplan.rtf
>
>
> When we enable prefix-based fuzzy-join and apply a multi-way fuzzy-join (more
> than two ways), the system generates an out-of-memory exception.
> Since a fuzzy-join is written in 30-40 lines of AQL code and this AQL is
> translated into a massive number of operators (more than 200 operators in the
> plan for a 3-way fuzzy join), it could generate an out-of-memory exception.
> /// Update: as the discussion progressed, we found that the hash table in the
> external hash group-by doesn't conform to the frame limit, so an out-of-memory
> exception occurs during the execution of an external hash group-by operator.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)