[jira] [Commented] (HIVE-13809) Hybrid Grace Hash Join memory usage estimation didn't take into account the bloom filter size

Gopal V (JIRA) Mon, 20 Jun 2016 15:08:15 -0700

    [ https://issues.apache.org/jira/browse/HIVE-13809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15340503#comment-15340503 ]


Gopal V commented on HIVE-13809:
--------------------------------

LGTM - +1.

The bloom filter sizing needs a revisit, since the filter is pre-allocated 
based on estimates, not on real row-counts - allowing more false positives at 
higher cardinalities, to keep the memory utilization in check.
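
To make the sizing trade-off concrete, here is a minimal sketch using the 
textbook bloom filter formula, bits = -n * ln(p) / (ln 2)^2. It is not Hive's 
implementation; the class and method names are hypothetical, and the "relax 
the false positive rate to fit a byte budget" policy is only one assumed way 
of reading the suggestion above.

```java
// Hypothetical helper, not part of Hive: textbook bloom filter sizing.
final class BloomFilterSizing {
    private static final double LN2_SQUARED = Math.log(2) * Math.log(2);

    // Bits needed for expectedEntries keys at false positive probability fpp.
    static long optimalBits(long expectedEntries, double fpp) {
        return (long) Math.ceil(-expectedEntries * Math.log(fpp) / LN2_SQUARED);
    }

    // Same quantity rounded up to whole bytes.
    static long sizeInBytes(long expectedEntries, double fpp) {
        return (optimalBits(expectedEntries, fpp) + 7) / 8;
    }

    // Inverse of the formula: the false positive probability you get when
    // the filter is capped at budgetBytes for expectedEntries keys.
    static double fppForBudget(long expectedEntries, long budgetBytes) {
        double bits = budgetBytes * 8.0;
        return Math.exp(-bits * LN2_SQUARED / expectedEntries);
    }

    public static void main(String[] args) {
        long estimatedRows = 100_000_000L;   // assumed optimizer estimate
        long budget = sizeInBytes(estimatedRows, 0.05);
        System.out.println("bytes at 5% fpp: " + budget);
        // If the real row count is 10x the estimate, the same budget
        // yields a much higher false positive rate.
        System.out.println("fpp at 10x rows: "
                + fppForBudget(estimatedRows * 10, budget));
    }
}
```

Because the filter is allocated before the real row count is known, capping 
the byte budget and accepting the resulting false positive rate keeps memory 
bounded when the estimate is too low.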

> Hybrid Grace Hash Join memory usage estimation didn't take into account the 
> bloom filter size
> ---------------------------------------------------------------------------------------------
>
>                 Key: HIVE-13809
>                 URL: https://issues.apache.org/jira/browse/HIVE-13809
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 2.0.0, 2.1.0
>            Reporter: Wei Zheng
>            Assignee: Wei Zheng
>         Attachments: HIVE-13809.1.patch
>
>
> Memory estimation is important during hash table loading, because we need to 
> decide whether to load the next hash partition into memory or spill it. If we 
> assume there is enough memory but that turns out not to be the case, we will 
> run into an OOM problem.
> Currently, the hybrid grace hash join memory usage estimation does not take 
> the bloom filter size into account. In large test cases (TB scale) the bloom 
> filter grows to hundreds of MB, big enough to cause an estimation error.
> The solution is to include the bloom filter size in the memory estimation.
> Another issue this patch fixes is a possible NPE due to object cache reuse 
> during hybrid grace hash join.
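
The load-or-spill decision described in the issue can be sketched as follows. 
This is an illustration only, not the code in HIVE-13809.1.patch; the class, 
method, and parameter names are hypothetical.

```java
// Hypothetical sketch, not the actual patch: decide whether the next hash
// partition can be loaded in memory once the bloom filter is counted.
final class SpillDecision {
    // Returns true if the next partition fits under memoryLimitBytes.
    static boolean fitsInMemory(long memoryLimitBytes,
                                long loadedPartitionBytes,
                                long bloomFilterBytes,
                                long nextPartitionBytes) {
        long estimated = loadedPartitionBytes + bloomFilterBytes + nextPartitionBytes;
        return estimated <= memoryLimitBytes;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long limit = 1024 * mb;
        long loaded = 700 * mb;
        long next = 200 * mb;
        // Ignoring the bloom filter (0 bytes) suggests the partition fits.
        System.out.println(fitsInMemory(limit, loaded, 0, next));
        // Counting a 300 MB bloom filter shows it should be spilled instead.
        System.out.println(fitsInMemory(limit, loaded, 300 * mb, next));
    }
}
```

With a bloom filter of hundreds of MB, leaving that term out of the sum is 
enough to flip the decision from "spill" to "load", which is the estimation 
error the issue describes.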



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
