Tsuyoshi OZAWA created TEZ-1806:
-----------------------------------

Summary: Out of Memory with large TEZ_RUNTIME_IO_SORT_MB
Key: TEZ-1806
URL: https://issues.apache.org/jira/browse/TEZ-1806
Project: Apache Tez
Issue Type: Bug
Reporter: Tsuyoshi OZAWA

When I allocated 4GB to each container and set TEZ_RUNTIME_IO_SORT_MB to 1.5GB, the task failed with an OutOfMemoryError. I think it would be better to decide the value of TEZ_RUNTIME_IO_SORT_MB automatically, based on the container size.

```
14/11/28 03:50:00 INFO tez.DAGBuilder: DAG execution complete
14/11/28 03:50:00 ERROR tez.DAGBuilder: DAG diagnostics: [Vertex failed, vertexName=2, vertexId=vertex_1417036912823_0055_1_01, diagnostics=[Task failed, taskId=task_1417036912823_0055_1_01_000003, diagnostics=[TaskAttempt 0 failed, info=[Error: Fatal Error cause TezChild exit.:java.lang.OutOfMemoryError: Java heap space
        at org.apache.tez.runtime.library.common.sort.impl.dflt.DefaultSorter.<init>(DefaultSorter.java:140)
        at org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput.start(OrderedPartitionedKVOutput.java:114)
        at org.apache.tez.runtime.library.processor.SimpleProcessor.preOp(SimpleProcessor.java:78)
        at org.apache.tez.runtime.library.processor.SimpleProcessor.run(SimpleProcessor.java:52)
        at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
```
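
To make the proposal concrete, here is a rough sketch of how a sort buffer size could be derived from the container size. It is not Tez code: the class `SortBufferSizer`, the method `suggestSortMb`, and the two fractions passed in `main` (0.8 of the container for the JVM heap, 0.25 of the heap for the sort buffer) are assumptions chosen only for illustration. The 2047 MB cap reflects the fact that a single Java `byte[]` cannot hold more than `Integer.MAX_VALUE` bytes.

```java
public final class SortBufferSizer {
    // Largest whole-MB size whose byte count (2047 << 20) still fits in an int,
    // and therefore in a single Java byte[].
    static final int MAX_SORT_MB = 2047;

    /**
     * Hypothetical helper: suggests a sort buffer size in MB.
     *
     * @param containerMb  memory granted to the container, in MB
     * @param heapFraction assumed share of the container given to the JVM heap
     * @param sortFraction assumed share of the heap reserved for the sort buffer
     */
    static int suggestSortMb(int containerMb, double heapFraction, double sortFraction) {
        if (containerMb <= 0) {
            throw new IllegalArgumentException("containerMb must be positive");
        }
        if (heapFraction <= 0 || heapFraction > 1 || sortFraction <= 0 || sortFraction > 1) {
            throw new IllegalArgumentException("fractions must be in (0, 1]");
        }
        long heapMb = (long) (containerMb * heapFraction);
        long sortMb = (long) (heapMb * sortFraction);
        // Clamp to [1, MAX_SORT_MB] so the result is always a usable buffer size.
        return (int) Math.max(1, Math.min(sortMb, MAX_SORT_MB));
    }

    public static void main(String[] args) {
        // A 4GB container, as in the report above; the fractions are illustrative.
        System.out.println(suggestSortMb(4096, 0.8, 0.25));
    }
}
```

With these illustrative fractions, a 4GB container gets a suggested buffer of 819 MB, well below the 1.5GB that was configured by hand. Suitable fractions would depend on how much heap the rest of the task needs.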