[ https://issues.apache.org/jira/browse/HIVE-15565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15824756#comment-15824756 ]
Rajesh Balamohan commented on HIVE-15565:
-----------------------------------------

As per the old implementation, maxMemory in {{GroupByOperator}} was 5,312,784,896 bytes. After running a few queries (e.g. q22, 18, 70, 67), heap usage increases to ~64GB. Now when q22 is rerun, {{GroupByOperator::shouldBeFlushed}} always returns true for every record because of the following calculation:

{noformat}
usedMemory = isLlap ? usedMemory / numExecutors : usedMemory;
rate = (float) usedMemory / (float) maxMemory;  <==== 64/12 = 5.33 GB / 5312784896 bytes > memoryThreshold of 0.9.
if(rate > memoryThreshold){
  return true;
}
{noformat}

Even though the q22 dataset is very small, it ends up flushing for every row. The patch fixes the assumption for isTez/isLLAP, but we still need HIVE-15508 for exact memory tracking.

> LLAP: GroupByOperator flushes hash table too frequently
> -------------------------------------------------------
>
>                 Key: HIVE-15565
>                 URL: https://issues.apache.org/jira/browse/HIVE-15565
>             Project: Hive
>          Issue Type: Bug
>          Components: llap
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>            Priority: Minor
>             Fix For: 2.2.0
>
>         Attachments: HIVE-15565.1.patch
>
>
> {{GroupByOperator::isTez}} would be true in LLAP mode. Current memory
> computations can go wrong with {{isTez}} checks in {{GroupByOperator}}. For
> example, in an LLAP instance with Xmx128G and 12 executors, it would start
> flushing the hash table for every record once heap usage reaches around 42GB
> (hive.tez.container.size=7100, hive.map.aggr.hash.percentmemory=0.5).
> {noformat}
> 2017-01-08T23:40:21,339 INFO [TezTaskRunner (1480722417364_1922_7_03_000004_1)] org.apache.hadoop.hive.ql.exec.GroupByOperator: Hash Table flushed: new size = 0
> 2017-01-08T23:40:21,339 INFO [TezTaskRunner (1480722417364_1922_7_03_000012_1)] org.apache.hadoop.hive.ql.exec.GroupByOperator: Hash Table flushed: new size = 0
> 2017-01-08T23:40:21,339 INFO [TezTaskRunner (1480722417364_1922_7_03_000004_1)] org.apache.hadoop.hive.ql.exec.GroupByOperator: Hash Tbl flush: #hash table = 1
> 2017-01-08T23:40:21,339 INFO [TezTaskRunner (1480722417364_1922_7_03_000012_1)] org.apache.hadoop.hive.ql.exec.GroupByOperator: Hash Tbl flush: #hash table = 1
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
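
For reference, the arithmetic in the comment can be reproduced with a small standalone sketch. This is not the Hive code: the class name {{FlushRateSketch}} and its method signatures are hypothetical, and "64GB" is assumed to mean 64 GiB. Only the per-executor division, the rate computation, and the 0.9 threshold are taken from the snippet quoted in the comment.

```java
// Illustrative sketch only; not the actual GroupByOperator implementation.
public class FlushRateSketch {
    // memoryThreshold of 0.9, as quoted in the comment
    static final float MEMORY_THRESHOLD = 0.9f;

    // Mirrors the quoted calculation: divide heap usage by executor count
    // in LLAP mode, then compare against the per-operator maxMemory.
    static float rate(long usedMemory, long maxMemory, boolean isLlap, int numExecutors) {
        long effectiveUsed = isLlap ? usedMemory / numExecutors : usedMemory;
        return (float) effectiveUsed / (float) maxMemory;
    }

    static boolean shouldBeFlushed(long usedMemory, long maxMemory, boolean isLlap, int numExecutors) {
        return rate(usedMemory, maxMemory, isLlap, numExecutors) > MEMORY_THRESHOLD;
    }

    public static void main(String[] args) {
        long maxMemory = 5_312_784_896L; // old maxMemory from the comment
        long heapUsed = 64L << 30;       // ~64GB heap usage (assumed GiB)
        int numExecutors = 12;           // executors in the LLAP instance

        System.out.println("rate = " + rate(heapUsed, maxMemory, true, numExecutors));
        System.out.println("flush = " + shouldBeFlushed(heapUsed, maxMemory, true, numExecutors));
    }
}
```

Because usedMemory here reflects the heap of the whole LLAP daemon rather than the memory held by a single hash table, the rate stays above the threshold regardless of how small the current query's dataset is, which matches the per-row flushing seen in the logs.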