zhengchenyu created HIVE-26179:
----------------------------------
Summary: In tez reuse container mode, asyncInitOperations are not
clear.
Key: HIVE-26179
URL: https://issues.apache.org/jira/browse/HIVE-26179
Project: Hive
Issue Type: Bug
Components: Hive, Tez
Affects Versions: 1.2.1
Environment: engine: Tez (Note: tez.am.container.reuse.enabled is true)
Reporter: zhengchenyu
Assignee: zhengchenyu
Fix For: 4.0.0
In our cluster, we found an error like this:
{code}
Vertex failed, vertexName=Map 1, vertexId=vertex_1650608671415_321290_1_11, diagnostics=[Task failed, taskId=task_1650608671415_321290_1_11_000422, diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( failure ) : attempt_1650608671415_321290_1_11_000422_0:java.lang.RuntimeException: java.lang.RuntimeException: Hive Runtime Error while closing operators
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:135)
    at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
    at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
    at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
    at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
    at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
    at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
    at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
    at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:57)
    at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: Hive Runtime Error while closing operators
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.close(MapRecordProcessor.java:349)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:161)
    ... 16 more
Caused by: java.lang.NullPointerException
    at org.apache.hadoop.hive.ql.exec.MapJoinOperator.closeOp(MapJoinOperator.java:488)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:684)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:698)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.close(MapRecordProcessor.java:338)
    ... 17 more
{code}
When Tez container reuse is enabled and a MapJoinOperator is used, an NPE is
thrown if different task attempts of the same task execute in the same
container.
While debugging, I found that the second task attempt uses the first task
attempt's asyncInitOperations. asyncInitOperations is not cleared when the
operator is closed, so the second task attempt may use the first task
attempt's mapJoinTables, whose HybridHashTableContainer.HashPartition is
already closed, which causes the NPE.
We must clear asyncInitOperations when the operator is closed.
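
To illustrate the mechanism, here is a minimal, self-contained sketch. It is
not Hive code: CachedOperator, Table, initialize and close are hypothetical
stand-ins, and the sketch assumes that the operator object survives between
task attempts in a reused container and that initialization consumes the first
pending future. A List is used instead of an unordered collection only so that
the stale entry is picked deterministically.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Future;

public class AsyncInitReuseSketch {

    /** Hypothetical stand-in for a hash table that is released when the operator closes. */
    static class Table {
        final int attempt;
        boolean closed = false;

        Table(int attempt) {
            this.attempt = attempt;
        }
    }

    /** Hypothetical operator that is reused by successive task attempts in one container. */
    static class CachedOperator {
        // Pending async initialization work; it outlives an attempt because the operator is reused.
        final List<Future<Table>> asyncInitOperations = new ArrayList<>();
        Table table;

        void initialize(int attempt) {
            asyncInitOperations.add(CompletableFuture.completedFuture(new Table(attempt)));
            try {
                // Takes the first pending future, which is stale if the list was never cleared.
                table = asyncInitOperations.get(0).get();
            } catch (Exception e) {
                throw new IllegalStateException(e);
            }
        }

        void close(boolean clearAsyncInit) {
            table.closed = true;
            if (clearAsyncInit) {
                // The proposed fix: drop the futures that belong to the finished attempt.
                asyncInitOperations.clear();
            }
        }
    }

    /** Runs two attempts on the same operator and returns the table seen by the second one. */
    static Table secondAttemptTable(boolean clearAsyncInit) {
        CachedOperator op = new CachedOperator();
        op.initialize(0);
        op.close(clearAsyncInit);
        op.initialize(1);
        return op.table;
    }

    public static void main(String[] args) {
        Table stale = secondAttemptTable(false);
        System.out.println("without clear: attempt=" + stale.attempt + " closed=" + stale.closed);
        Table fresh = secondAttemptTable(true);
        System.out.println("with clear: attempt=" + fresh.attempt + " closed=" + fresh.closed);
    }
}
```

Without the clear, the second attempt ends up holding the first attempt's
already closed table, which corresponds to the situation that leads to the NPE
in closeOp; with the clear, it gets its own fresh table.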