[ https://issues.apache.org/jira/browse/ASTERIXDB-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15420365#comment-15420365 ]

Michael J. Carey commented on ASTERIXDB-1556:
---------------------------------------------

I think we also need to remember that we are not dealing with the main case 
here. The urgent part is getting the main case working right in terms of 
accounting for all memory (hash table included) as well as for data memory - 
ideally, frame memory use plus other auxiliary memory use - so that operators 
like this one live within their designated means (their budget).

Garbage collection triggered by the hash table's space usage (used space 
exceeding needed space by more than some fraction, indicating waste) is then 
something that seems like it'll be rare - so this is the rare case. Usually, 
space issues (in terms of the need to stay in budget) will be due to data frame 
usage and resolvable by spilling data frames - the hash table will just be 
(accounted-for) overhead.  The issue will arise when somewhat pathological 
things happen that run counter to its allocation pattern and lead us into GC. 
Then we need to do something here.

My naive / non-thought-through "1 extra frame" approach would basically be to 
build a whole new hash table for the data that's in memory - my intuition 
still tells me this should be possible.  Very roughly, it seems like one 
would iterate over the hash table "from the top" to do this.  This could 
require a bunch of copying, and I'm not sure how cache-friendly it would be - 
probably random source access, sequential target access - but it seems 
doable.  (Again, though, I'm typing this without looking/thinking carefully 
again about the precise structure.)  BTW, it would be okay if 1 extra frame 
were 2 or 3 extra frames - as long as the copying process needed only a 
very small CONSTANT number of extra frames for the duration of GC, that'd be 
acceptable.
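The copy-based rebuild above can be sketched in miniature. This is a toy model, not AsterixDB's actual hash table; the frame representation and FRAME_SLOTS are hypothetical. The idea: iterate over the source frames "from the top", copy live entries into a single staging frame (sequential target access), and release each source frame as soon as it is drained, so the net overhead never exceeds one extra frame.

```python
FRAME_SLOTS = 4  # entries per fixed-size frame (hypothetical size)

def rebuild(frames):
    """Copy live entries (None marks a garbage slot) into freshly
    allocated frames, releasing each source frame once it is drained.
    Returns the new frames and the peak number of net extra frames
    held during the rebuild (should stay <= 1, a small constant)."""
    new_frames = []
    staging = []          # the one "extra" frame being filled
    freed = 0
    max_net_extra = 0
    for i in range(len(frames)):
        for entry in frames[i]:      # source access
            if entry is None:
                continue             # skip garbage slots
            staging.append(entry)    # sequential target access
            if len(staging) == FRAME_SLOTS:
                new_frames.append(staging)
                staging = []
        frames[i] = None             # release the drained source frame
        freed += 1
        net_extra = len(new_frames) + 1 - freed  # +1 for `staging`
        max_net_extra = max(max_net_extra, net_extra)
    if staging:
        new_frames.append(staging)
    return new_frames, max_net_extra
```

Since the live entries copied so far can never fill more frames than the source frames already drained, the extra-frame count stays a small constant regardless of table size.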

Bottom lines:
 - Let's do something, and make priority #1 the accurate accounting needed 
both for staying budget-wise and for knowing what the garbage ratio in the 
hash table is (so we can get a sense of how that looks).
 - Let's run a variety of tests on that something to see its behavior, being 
sure to include cases where the number of groups is close to the data size 
(i.e., group by something almost unique), where the average group size is a 
small number like 4 (e.g., the TPC-H case), and a few cases in between.
 - Let's compare performance and memory use for that against a comparably 
configured before-changes build.  (Points of comparison would be the 
before-system with the same official budget - knowing it will cheat - and the 
before-system with a lower official budget - to make its actual use be roughly 
the same.)
How's that sound?
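The accounting asked for in the first bullet boils down to a ratio check; a minimal sketch (hypothetical names and threshold, not the actual Hyracks accounting API):

```python
def garbage_ratio(allocated_bytes, live_bytes):
    """Fraction of the hash table's allocated space that is wasted
    (used space minus needed space, over used space)."""
    if allocated_bytes == 0:
        return 0.0
    return (allocated_bytes - live_bytes) / allocated_bytes

def should_collect(allocated_bytes, live_bytes, waste_threshold=0.3):
    """Trigger GC only in the rare case: waste above the threshold.
    The 0.3 default is an illustrative placeholder, to be tuned."""
    return garbage_ratio(allocated_bytes, live_bytes) > waste_threshold
```

Keeping the two byte counters up to date on every insert and spill is what makes the budget check (and the "how does the garbage look" measurement) cheap.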

> Hash Table used by External hash group-by doesn't conform to the budget.
> ------------------------------------------------------------------------
>
>                 Key: ASTERIXDB-1556
>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-1556
>             Project: Apache AsterixDB
>          Issue Type: Bug
>            Reporter: Taewoo Kim
>            Assignee: Taewoo Kim
>         Attachments: 2wayjoin.pdf, 2wayjoin.rtf, 2wayjoinplan.rtf, 
> 3wayjoin.pdf, 3wayjoin.rtf, 3wayjoinplan.rtf
>
>
> When we enable prefix-based fuzzy-join and apply a multi-way fuzzy-join 
> (> 2 ways), the system generates an out-of-memory exception. 
> Since a fuzzy-join is created using 30-40 lines of AQL code, and this AQL is 
> translated into a massive number of operators (more than 200 operators in the 
> plan for a 3-way fuzzy join), it can generate an out-of-memory exception.
> /// Update: as the discussion progressed, we found that the hash table in the 
> external hash group-by doesn't conform to the frame limit. So, an out-of-memory 
> exception happens during the execution of an external hash group-by operator.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
