Toshihiko Uchida created HIVE-22373:
---------------------------------------
Summary: File Merge tasks fail when containers are reused
Key: HIVE-22373
URL: https://issues.apache.org/jira/browse/HIVE-22373
Project: Hive
Issue Type: Bug
Affects Versions: 3.1.2
Reporter: Toshihiko Uchida
h1. Problems
Setting tez.am.container.reuse.enabled=true allows a container to be reused across multiple tasks.
When two File Merge tasks run in the same container, the later task fails to rename its output path.
Below is the error log of task 000001_0 on container container_e87_1570604853053_11564_01_000003, where task 000004_0 ran before task 000001_0.
It shows that task 000001_0 mistakenly takes its output file name from the previous task ID, 000004_0.
{code}
2019-10-15 13:00:31,438 [ERROR] [TezChild] |tez.TezProcessor|:
java.lang.RuntimeException: Hive Runtime Error while closing operators
	at org.apache.hadoop.hive.ql.exec.tez.MergeFileRecordProcessor.close(MergeFileRecordProcessor.java:188)
	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:284)
	at org.apache.hadoop.hive.ql.exec.tez.MergeFileTezProcessor.run(MergeFileTezProcessor.java:42)
	at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
	at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
	at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
	at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
	at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
	at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
	at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
	at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
	at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Failed to close AbstractFileMergeOperator
	at org.apache.hadoop.hive.ql.exec.AbstractFileMergeOperator.closeOp(AbstractFileMergeOperator.java:315)
	at org.apache.hadoop.hive.ql.exec.OrcFileMergeOperator.closeOp(OrcFileMergeOperator.java:265)
	at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:733)
	at org.apache.hadoop.hive.ql.exec.tez.MergeFileRecordProcessor.close(MergeFileRecordProcessor.java:180)
	... 17 more
Caused by: java.io.IOException: Unable to rename viewfs://<cluster_name>/user/<user_name>/.hive-staging_hive_2019-10-15_12-59-32_916_2461818728035733124-15476/_task_tmp.-ext-10000/_tmp.000004_0 to viewfs://<cluster_name>/user/<user_name>/.hive-staging_hive_2019-10-15_12-59-32_916_2461818728035733124-15476/_tmp.-ext-10000/000004_0
	at org.apache.hadoop.hive.ql.exec.AbstractFileMergeOperator.closeOp(AbstractFileMergeOperator.java:254)
	... 20 more
{code}
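Both paths in the rename error end in the task ID; only their parent directories differ. The minimal sketch below rebuilds them under stated assumptions: the helper names tmpPath and finalPath are hypothetical, and the "_task_tmp.", "_tmp." and "-ext-10000" pieces are copied from the log, not from Hive's source. It shows that with a stale task ID, task 000001_0 targets the same final file name that task 000004_0 would use. That file presumably already exists from the earlier task, which would explain the failed rename.

{code:java}
public class MergeOutputPaths {
    // Hypothetical helper: temporary output path as it appears in the log above.
    static String tmpPath(String stagingDir, String taskId) {
        return stagingDir + "/_task_tmp.-ext-10000/_tmp." + taskId;
    }

    // Hypothetical helper: rename target as it appears in the log above.
    static String finalPath(String stagingDir, String taskId) {
        return stagingDir + "/_tmp.-ext-10000/" + taskId;
    }

    public static void main(String[] args) {
        String staging = "viewfs://<cluster_name>/user/<user_name>/<staging_dir>";
        String staleId = "000004_0"; // ID left over from the previous task in the container

        // What task 000001_0 should rename to, versus what it actually tries.
        System.out.println("expected: " + finalPath(staging, "000001_0"));
        System.out.println("actual:   " + finalPath(staging, staleId));
        System.out.println("from:     " + tmpPath(staging, staleId));
    }
}
{code}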
h1. Causes
When AbstractFileMergeOperator is initialized, taskId is set only the first time; later initializations in a reused container keep the old value.
- AbstractFileMergeOperator.java
{code}
private void updatePaths(Path tp, Path ttp) {
  if (taskId == null) {
    taskId = Utilities.getTaskId(jc);
  }
  // ... (rest of the method omitted)
}
{code}
As a result, the second task in a reused container keeps the first task's ID, which causes the output file name conflict shown above.
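The effect of the null check can be reproduced without Hive. The sketch below is a hypothetical model, not Hive code: MergeOperatorSketch stands in for AbstractFileMergeOperator, and a plain String argument stands in for Utilities.getTaskId(jc). It also assumes, as the log suggests, that the same operator instance is re-initialized for each task that runs in the reused container.

{code:java}
import java.util.ArrayList;
import java.util.List;

public class StaleTaskIdDemo {
    // Hypothetical stand-in for AbstractFileMergeOperator; only taskId is modeled.
    static class MergeOperatorSketch {
        private String taskId;

        // currentTaskId stands in for Utilities.getTaskId(jc).
        void updatePaths(String currentTaskId) {
            if (taskId == null) { // same guard as the excerpt above
                taskId = currentTaskId;
            }
        }

        String outputFileName() {
            return taskId;
        }
    }

    // Initializes one operator instance once per task, as a reused container
    // would, and records the output file name chosen for each task.
    static List<String> runOnReusedContainer(String... taskIds) {
        MergeOperatorSketch op = new MergeOperatorSketch();
        List<String> names = new ArrayList<>();
        for (String id : taskIds) {
            op.updatePaths(id);
            names.add(op.outputFileName());
        }
        return names;
    }

    public static void main(String[] args) {
        // Task 000004_0 runs first, then 000001_0 in the same container.
        System.out.println(runOnReusedContainer("000004_0", "000001_0"));
        // prints [000004_0, 000004_0]
    }
}
{code}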
h1. Solutions
Remove the null check, which was introduced in HIVE-14640, so that taskId is updated from the JobConf every time the operator is initialized.
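The proposed fix can be modeled the same way. In this hypothetical sketch, which is not the actual patch, MergeOperatorSketch again stands in for AbstractFileMergeOperator and a String argument replaces Utilities.getTaskId(jc). With the guard removed, each re-initialization of the shared operator picks up the current task's ID. The cost is one extra lookup per initialization.

{code:java}
import java.util.ArrayList;
import java.util.List;

public class FreshTaskIdDemo {
    // Hypothetical stand-in for AbstractFileMergeOperator after the fix.
    static class MergeOperatorSketch {
        private String taskId;

        // currentTaskId stands in for Utilities.getTaskId(jc).
        void updatePaths(String currentTaskId) {
            // No null check: always take the ID of the task being initialized.
            taskId = currentTaskId;
        }

        String outputFileName() {
            return taskId;
        }
    }

    // Reuses one operator instance across tasks, as a reused container would.
    static List<String> runOnReusedContainer(String... taskIds) {
        MergeOperatorSketch op = new MergeOperatorSketch();
        List<String> names = new ArrayList<>();
        for (String id : taskIds) {
            op.updatePaths(id);
            names.add(op.outputFileName());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(runOnReusedContainer("000004_0", "000001_0"));
        // prints [000004_0, 000001_0]
    }
}
{code}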
--
This message was sent by Atlassian Jira
(v8.3.4#803005)