[ 
https://issues.apache.org/jira/browse/ASTERIXDB-1733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15718817#comment-15718817
 ] 

Taewoo Kim commented on ASTERIXDB-1733:
---------------------------------------

This is like the ASTERIXDB-1566 case.

When the build phase of the optimized hybrid hash join starts, we have no 
estimate of the input size, so we use the default size (140GB). As a result, 
the number of partitions ends up close to the number of frames allocated to 
the join. For example, if we use 192MB as join memory, the number of 
partitions will be 972. This generates a lot of small chunks, each smaller 
than 1MB. Rather than many disk I/Os on small chunks, fewer I/Os on bigger 
chunks would be preferable. 
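
To make the numbers above concrete, here is a rough back-of-the-envelope 
sketch. It assumes the join memory is spread evenly across the partitions, 
which is a simplification and not necessarily how the operator allocates 
frames; the class and method names are hypothetical.

```java
// Hypothetical helper, not part of the AsterixDB/Hyracks code base.
final class PartitionBudget {
    private PartitionBudget() {
    }

    // Average in-memory budget per partition in KiB, assuming an even split
    // of the join memory across all partitions (integer division).
    static long avgKiBPerPartition(long joinMemoryMiB, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        return joinMemoryMiB * 1024L / numPartitions;
    }

    public static void main(String[] args) {
        // 192MB of join memory and 972 partitions, as in the example above.
        System.out.println(avgKiBPerPartition(192, 972) + " KiB per partition");
        // prints "202 KiB per partition"
    }
}
```

Under that assumption each partition gets roughly 200KB on average, which is 
consistent with chunks well under 1MB.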

I think that for the first input (build) we can apply the same logic that 
hash group-by uses to estimate the hash table size (cardinality): 
MIN(# of possible hash entries, # of possible tuples for the given memory 
budget). 
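
A minimal sketch of that MIN(...) estimate follows. The class name, method 
name, and the fixed bytes-per-tuple figure are assumptions made for 
illustration only; this is not the actual group-by or join code.

```java
// Hypothetical sketch of the proposed estimate, not the actual implementation.
final class BuildSideEstimate {
    private BuildSideEstimate() {
    }

    // Returns MIN(# of possible hash entries, # of tuples that fit in the
    // memory budget). bytesPerTuple is an assumed average tuple size.
    static long estimateTableCardinality(long possibleHashEntries, long memoryBudgetBytes,
            long bytesPerTuple) {
        if (bytesPerTuple <= 0) {
            throw new IllegalArgumentException("bytesPerTuple must be positive");
        }
        long tuplesThatFit = memoryBudgetBytes / bytesPerTuple;
        return Math.min(possibleHashEntries, tuplesThatFit);
    }

    public static void main(String[] args) {
        long budget = 192L * 1024 * 1024; // 192MB join memory
        // With an assumed 64 bytes per tuple, the memory budget is the bound.
        System.out.println(estimateTableCardinality(10_000_000L, budget, 64L));
        // With fewer possible hash entries, the entry count is the bound.
        System.out.println(estimateTableCardinality(1_000_000L, budget, 64L));
    }
}
```

The point of the cap is that the estimate can never exceed what the memory 
budget can actually hold, regardless of the default input size.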

> Hash Table used by hash-join doesn't conform to the budget.
> -----------------------------------------------------------
>
>                 Key: ASTERIXDB-1733
>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-1733
>             Project: Apache AsterixDB
>          Issue Type: Bug
>            Reporter: Taewoo Kim
>            Assignee: Taewoo Kim
>
> The hash table in the hash join doesn't conform to the frame limit 
> (join.memory parameter in the configuration file). This is related to 
> ASTERIXDB-1556.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)