Taewoo Kim commented on ASTERIXDB-1556:

@[~buyingyi]: If we assume that each field is at least 4 bytes, that gives us 
32 bits. The max value of INT is 2^31 - 1, which is greater than 32MB (2^25), 
64MB (2^26), or 128MB (2^27). What I'm trying to say here is that for a 
reasonable budget size, we always choose the number of possible tuples in the 
data table rather than considering both (# of tuples, # of hash entries). I 
think the formula might be reduced to y = 32M / (8+4+1) * 40 to calculate the 
expected byte size of the hash table. Then y / (32 + y) gives the actual ratio. 
What do you think? 
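
To make the arithmetic concrete, here is a rough sketch of the reduced formula. 
It rests on assumptions that the comment does not spell out: that (8+4+1) is 
the minimum number of bytes one tuple takes in the data table, that 40 is the 
number of hash table bytes needed per tuple, and that y and the 32 in 
y / (32 + y) are in the same unit. The function and parameter names are made up 
for illustration and are not taken from the AsterixDB code.

```python
MB = 1024 * 1024

# Assumed reading of the constants in the comment (not confirmed by the source):
BYTES_PER_TUPLE = 8 + 4 + 1   # minimum bytes one tuple takes in the data table
BYTES_PER_ENTRY = 40          # hash table bytes needed per tuple


def expected_hash_table_bytes(data_budget_bytes):
    """y = budget / (8+4+1) * 40, evaluated left to right."""
    max_tuples = data_budget_bytes / BYTES_PER_TUPLE
    return max_tuples * BYTES_PER_ENTRY


def hash_table_ratio(data_budget_bytes):
    """y / (budget + y), with y and the budget both in bytes."""
    y = expected_hash_table_bytes(data_budget_bytes)
    return y / (data_budget_bytes + y)


if __name__ == "__main__":
    for budget_mb in (32, 64, 128):
        budget = budget_mb * MB
        y_mb = expected_hash_table_bytes(budget) / MB
        print(budget_mb, "MB budget:", round(y_mb, 2), "MB hash table,",
              "ratio", round(hash_table_ratio(budget), 4))
```

Under this reading, a 32MB budget gives y of about 98.46MB, and the ratio 
simplifies to 40 / (13 + 40), roughly 0.755, for any budget size.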

> Hash Table used by External hash group-by doesn't conform to the budget.
> ------------------------------------------------------------------------
>                 Key: ASTERIXDB-1556
>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-1556
>             Project: Apache AsterixDB
>          Issue Type: Bug
>            Reporter: Taewoo Kim
>            Assignee: Taewoo Kim
>            Priority: Critical
>              Labels: soon
>         Attachments: 2wayjoin.pdf, 2wayjoin.rtf, 2wayjoinplan.rtf, 
> 3wayjoin.pdf, 3wayjoin.rtf, 3wayjoinplan.rtf
> When we enable prefix-based fuzzy-join and apply a multi-way fuzzy-join 
> (more than 2-way), the system throws an out-of-memory exception. 
> Since a fuzzy-join is written in 30-40 lines of AQL code and this AQL is 
> translated into a massive number of operators (more than 200 operators in the 
> plan for a 3-way fuzzy join), it could cause an out-of-memory exception.
> Update: as the discussion progressed, we found that the hash table in the 
> external hash group-by doesn't conform to the frame limit. So, an 
> out-of-memory exception happens during the execution of an external hash 
> group-by operator.

