[
https://issues.apache.org/jira/browse/ASTERIXDB-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15622822#comment-15622822
]
Taewoo Kim commented on ASTERIXDB-1556:
---------------------------------------
Here is the performance comparison between the master branch and this branch.
In short, there is no significant performance degradation when the same amount
of group.memory is given.
On one node with one partition, I loaded 145M records and ran the comparison
against that data set.
The first query (Q9002) issues a hash group-by query on a field that has only 5
distinct values. I computed 95% confidence intervals for both branches and
found that the interval for the master branch and the interval for this branch
overlap.
For more than 1M groups, the master branch throws an out-of-memory exception,
so I had to reduce the number of tuples (the input to the hash group-by).
The next query (Q9004) therefore issues a hash group-by query on a field that
has 0.5M distinct values. Again, the 95% confidence intervals for the master
branch and this branch overlap.
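For reference, the overlap check used for both queries can be sketched as
follows. This is only an illustration, not the script behind the spreadsheet:
the run times are made-up numbers, the function names are hypothetical, and the
interval uses a normal approximation (with only a handful of runs, a Student-t
critical value would give a somewhat wider interval).

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def confidence_interval(samples, level=0.95):
    """Return (lower, upper) for the mean, using a normal approximation."""
    # Two-sided critical value, about 1.96 for a 95% level.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * stdev(samples) / sqrt(len(samples))
    center = mean(samples)
    return center - half_width, center + half_width


def intervals_overlap(a, b):
    """True if the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical execution times in seconds (NOT the measured results).
master_runs = [101.2, 99.8, 100.5, 102.1, 100.9]
branch_runs = [100.7, 101.5, 99.9, 101.8, 100.4]

master_ci = confidence_interval(master_runs)
branch_ci = confidence_interval(branch_runs)
print("master:", master_ci)
print("branch:", branch_ci)
print("overlap:", intervals_overlap(master_ci, branch_ci))
```

Overlapping intervals are read here as "no significant difference at this
level"; it is a conservative check rather than a formal hypothesis test.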
Here is the link:
https://docs.google.com/spreadsheets/d/1xV5qvoi8oy0-AnSIGGV3ATNSxoDanqhxX4nol3AageU/edit?usp=sharing
I think it's ready to be merged.
> Hash Table used by External hash group-by doesn't conform to the budget.
> ------------------------------------------------------------------------
>
> Key: ASTERIXDB-1556
> URL: https://issues.apache.org/jira/browse/ASTERIXDB-1556
> Project: Apache AsterixDB
> Issue Type: Bug
> Reporter: Taewoo Kim
> Assignee: Taewoo Kim
> Priority: Critical
> Labels: soon
> Attachments: 2wayjoin.pdf, 2wayjoin.rtf, 2wayjoinplan.rtf,
> 3wayjoin.pdf, 3wayjoin.rtf, 3wayjoinplan.rtf
>
>
> When we enable prefix-based fuzzy-join and apply a multi-way fuzzy-join
> (> 2), the system generates an out-of-memory exception.
> Since a fuzzy-join is written in 30-40 lines of AQL code and this AQL is
> translated into a massive number of operators (more than 200 operators in the
> plan for a 3-way fuzzy join), it could generate an out-of-memory exception.
> /// Update: as the discussion progressed, we found that the hash table in the
> external hash group-by doesn't conform to the frame limit. So, an
> out-of-memory exception happens during the execution of an external hash
> group-by operator.