[ https://issues.apache.org/jira/browse/HIVE-12084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14983620#comment-14983620 ]
Thejas M Nair commented on HIVE-12084:
--------------------------------------
Runtime.getRuntime().freeMemory() does not give all the memory the JVM could
still use; it ignores unallocated memory. This is because the JVM heap grows
up to the max heap size (-Xmx) as needed. totalMemory() gives the total heap
space allocated so far, and freeMemory() is the free memory within that.
See http://i.stack.imgur.com/GjuwM.png and
http://stackoverflow.com/questions/3571203/what-is-the-exact-meaning-of-runtime-getruntime-totalmemory-and-freememory .
So instead of using Runtime.getRuntime().freeMemory(), I think it should use
maxMemory() - totalMemory() + freeMemory()
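A minimal sketch of the suggested computation (the class and method names here are illustrative, not from the patch):
{code}
public class AvailableMemory {
    // Memory the JVM can still hand out:
    //   unallocated headroom (maxMemory - totalMemory)
    // + free space inside the already-allocated heap (freeMemory)
    public static long availableMemory() {
        Runtime rt = Runtime.getRuntime();
        return rt.maxMemory() - rt.totalMemory() + rt.freeMemory();
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long available = availableMemory();
        // Always at least the free memory within the current heap,
        // and never more than the -Xmx ceiling.
        System.out.println(available >= rt.freeMemory());
        System.out.println(available <= rt.maxMemory());
    }
}
{code}
Since totalMemory() only reflects the heap allocated so far, freeMemory() alone under-reports what is actually available until the heap has grown all the way to -Xmx.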
cc [~gopalv]
> Hive queries with ORDER BY and large LIMIT fails with OutOfMemoryError Java
> heap space
> --------------------------------------------------------------------------------------
>
> Key: HIVE-12084
> URL: https://issues.apache.org/jira/browse/HIVE-12084
> Project: Hive
> Issue Type: Bug
> Reporter: Hari Sankar Sivarama Subramaniyan
> Assignee: Hari Sankar Sivarama Subramaniyan
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-12084.1.patch, HIVE-12084.2.patch,
> HIVE-12084.3.patch, HIVE-12084.4.patch
>
>
> STEPS TO REPRODUCE:
> {code}
> CREATE TABLE `sample_07` ( `code` string , `description` string , `total_emp`
> int , `salary` int ) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS
> TextFile;
> load data local inpath 'sample_07.csv' into table sample_07;
> set hive.limit.pushdown.memory.usage=0.9999;
> select * from sample_07 order by salary LIMIT 999999999;
> {code}
> This will result in
> {code}
> Caused by: java.lang.OutOfMemoryError: Java heap space
> at org.apache.hadoop.hive.ql.exec.TopNHash.initialize(TopNHash.java:113)
> at
> org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.initializeOp(ReduceSinkOperator.java:234)
> at
> org.apache.hadoop.hive.ql.exec.vector.VectorReduceSinkOperator.initializeOp(VectorReduceSinkOperator.java:68)
> at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
> at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:469)
> at
> org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:425)
> {code}
> The basic issue lies with the top-n optimization. We need to cap the memory
> used by the top-n optimization. Ideally we would detect that the allocated
> bytes will exceed the "hive.limit.pushdown.memory.usage" budget without
> actually attempting the allocation.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)