[
https://issues.apache.org/jira/browse/HIVE-13809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15340503#comment-15340503
]
Gopal V commented on HIVE-13809:
--------------------------------
LGTM - +1.
The bloom filter sizing needs a revisit, since the filter is pre-allocated
based on estimates, not on real row counts. One option is to allow more false
positives at higher cardinalities to keep memory utilization in check.
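
To illustrate the trade-off mentioned above, here is a minimal sketch of how a
bloom filter's size follows from an estimated entry count and a target false
positive probability. It uses the standard textbook formula
m = -n * ln(p) / (ln 2)^2 and is not taken from the patch; the class and method
names (BloomFilterSizing, optimalNumBits, sizeInBytes) are hypothetical.

```java
// Hypothetical helper, not Hive code: sizes a bloom filter from an
// estimated entry count and a target false positive probability (fpp).
final class BloomFilterSizing {
    private BloomFilterSizing() {}

    /** Textbook optimal bit count: m = -n * ln(p) / (ln 2)^2, rounded up. */
    static long optimalNumBits(long expectedEntries, double fpp) {
        if (expectedEntries <= 0 || fpp <= 0.0 || fpp >= 1.0) {
            throw new IllegalArgumentException("need expectedEntries > 0 and 0 < fpp < 1");
        }
        double ln2 = Math.log(2);
        return (long) Math.ceil(-expectedEntries * Math.log(fpp) / (ln2 * ln2));
    }

    /** Bytes needed to hold the bit set, rounded up to a whole byte. */
    static long sizeInBytes(long expectedEntries, double fpp) {
        return (optimalNumBits(expectedEntries, fpp) + 7) / 8;
    }
}
```

Because the size scales linearly with the estimated entry count, an
overestimate inflates the pre-allocated filter by the same factor. Raising the
allowed false positive probability for large estimates reduces the bits needed
per entry, which is the kind of adjustment the comment suggests.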
> Hybrid Grace Hash Join memory usage estimation didn't take into account the
> bloom filter size
> ---------------------------------------------------------------------------------------------
>
> Key: HIVE-13809
> URL: https://issues.apache.org/jira/browse/HIVE-13809
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Affects Versions: 2.0.0, 2.1.0
> Reporter: Wei Zheng
> Assignee: Wei Zheng
> Attachments: HIVE-13809.1.patch
>
>
> Memory estimation is important during hash table loading, because we need to
> decide whether to load the next hash partition into memory or spill it. If we
> assume there is enough memory but that turns out not to be the case, we will
> run into an OOM error.
> Currently, hybrid grace hash join memory usage estimation does not take the
> bloom filter size into account. In large test cases (TB scale) the bloom
> filter grows to hundreds of MB, big enough to cause estimation errors.
> The solution is to include the bloom filter size in the memory estimate.
> This patch also fixes a possible NPE caused by object cache reuse during
> hybrid grace hash join.
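
The effect described in the issue can be sketched with a simplified
load-or-spill decision. This is an illustrative model only, not the logic in
HIVE-13809.1.patch; the class SpillPlanner, the method partitionsThatFit, and
the sizes used below are assumptions made for the example.

```java
// Hypothetical model, not Hive code: decides how many hash partitions
// can stay in memory before the remaining ones must be spilled.
final class SpillPlanner {
    private SpillPlanner() {}

    /**
     * Counts the leading partitions that fit under the memory limit.
     * The bloom filter is charged up front, so it reduces the budget
     * left for partitions. Pass 0 to model an estimate that ignores it.
     */
    static int partitionsThatFit(long memoryLimitBytes, long bloomFilterBytes,
                                 long[] partitionBytes) {
        long used = bloomFilterBytes;
        int loaded = 0;
        for (long size : partitionBytes) {
            if (used + size > memoryLimitBytes) {
                break; // this partition and all later ones would be spilled
            }
            used += size;
            loaded++;
        }
        return loaded;
    }
}
```

With a 1000 MB limit and six 200 MB partitions, an estimate that ignores a
300 MB bloom filter keeps five partitions in memory, for a real footprint of
1300 MB. Charging the filter first keeps only three partitions (900 MB in
total) and spills the rest, which avoids the OOM.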