Toshihiko Uchida created HIVE-22373:
---------------------------------------

             Summary: File Merge tasks fail when containers are reused
                 Key: HIVE-22373
                 URL: https://issues.apache.org/jira/browse/HIVE-22373
             Project: Hive
          Issue Type: Bug
    Affects Versions: 3.1.2
            Reporter: Toshihiko Uchida


h1. Problems
Setting tez.am.container.reuse.enabled=true allows a container to be reused 
across multiple tasks.
When two File Merge tasks run on the same container, the second task fails to 
rename its output path.

Below is the error log of task 000001_0 on container 
container_e87_1570604853053_11564_01_000003, where task 000004_0 had run before 
task 000001_0.
It shows that the output file name of task 000001_0 is mistakenly derived from 
the previous task id, 000004_0.
{code}
2019-10-15 13:00:31,438 [ERROR] [TezChild] |tez.TezProcessor|: java.lang.RuntimeException: Hive Runtime Error while closing operators
        at org.apache.hadoop.hive.ql.exec.tez.MergeFileRecordProcessor.close(MergeFileRecordProcessor.java:188)
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:284)
        at org.apache.hadoop.hive.ql.exec.tez.MergeFileTezProcessor.run(MergeFileTezProcessor.java:42)
        at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
        at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
        at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
        at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
        at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
        at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
        at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
        at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
        at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Failed to close AbstractFileMergeOperator
        at org.apache.hadoop.hive.ql.exec.AbstractFileMergeOperator.closeOp(AbstractFileMergeOperator.java:315)
        at org.apache.hadoop.hive.ql.exec.OrcFileMergeOperator.closeOp(OrcFileMergeOperator.java:265)
        at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:733)
        at org.apache.hadoop.hive.ql.exec.tez.MergeFileRecordProcessor.close(MergeFileRecordProcessor.java:180)
        ... 17 more
Caused by: java.io.IOException: Unable to rename viewfs://<cluster_name>/user/<user_name>/.hive-staging_hive_2019-10-15_12-59-32_916_2461818728035733124-15476/_task_tmp.-ext-10000/_tmp.000004_0 to viewfs://<cluster_name>/user/<user_name>/.hive-staging_hive_2019-10-15_12-59-32_916_2461818728035733124-15476/_tmp.-ext-10000/000004_0
        at org.apache.hadoop.hive.ql.exec.AbstractFileMergeOperator.closeOp(AbstractFileMergeOperator.java:254)
        ... 20 more
{code}

h1. Causes
When AbstractFileMergeOperator is initialized, taskId is set only the first 
time, while it is still null.

- AbstractFileMergeOperator.java
{code}
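private void updatePaths(Path tp, Path ttp) {
  // taskId is assigned only while it is still null, so a later call
  // keeps the id that was read on the first call.
  if (taskId == null) {
    taskId = Utilities.getTaskId(jc);
  }
  // ... (rest of the method omitted)
}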
{code}

As a result, a later task on a reused container keeps the taskId of the earlier 
task, which leads to the output file name conflict shown above.
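
The effect of the null check can be reproduced outside Hive with a small 
stand-in. The sketch below is not Hive code: StaleTaskIdDemo, CachedOperator, 
confFor and the key "sketch.task.id" are hypothetical names, and a plain Map 
plays the role of JobConf. It only assumes that one operator instance serves 
two tasks in turn, as the log above suggests.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in, not a Hive class.
final class StaleTaskIdDemo {
    // Made-up configuration key that holds the id of the current task.
    static final String TASK_ID_KEY = "sketch.task.id";

    // Operator whose instance outlives a single task on a reused container.
    static final class CachedOperator {
        private String taskId; // kept between calls

        // Same shape as the null check in the excerpt:
        // the configuration is read only on the first call.
        String outputName(Map<String, String> conf) {
            if (taskId == null) {
                taskId = conf.get(TASK_ID_KEY);
            }
            return taskId;
        }
    }

    // Builds a per-task configuration carrying the given task id.
    static Map<String, String> confFor(String taskId) {
        Map<String, String> conf = new HashMap<>();
        conf.put(TASK_ID_KEY, taskId);
        return conf;
    }

    public static void main(String[] args) {
        CachedOperator op = new CachedOperator();
        // First task on the container.
        System.out.println(op.outputName(confFor("000004_0"))); // prints 000004_0
        // Second task on the same container still gets the first id.
        System.out.println(op.outputName(confFor("000001_0"))); // prints 000004_0
    }
}
```

The second call returns 000004_0 although its configuration says 000001_0, 
which matches the file name seen in the rename error.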

h1. Solutions
Remove the null-checking conditional, which was introduced in HIVE-14640, and 
update taskId from JobConf whenever the operator is initialized.
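
A minimal sketch of the proposed behavior, again with hypothetical names 
(RefreshedTaskIdDemo, RefreshingOperator, confFor, "sketch.task.id") and a Map 
standing in for JobConf. It is not the actual patch; it only illustrates that 
reading the id on every initialization gives each task its own output name.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in, not a Hive class.
final class RefreshedTaskIdDemo {
    // Made-up configuration key that holds the id of the current task.
    static final String TASK_ID_KEY = "sketch.task.id";

    static final class RefreshingOperator {
        private String taskId;

        // No null check: the id is re-read from the configuration on every call.
        String outputName(Map<String, String> conf) {
            taskId = conf.get(TASK_ID_KEY);
            return taskId;
        }
    }

    // Builds a per-task configuration carrying the given task id.
    static Map<String, String> confFor(String taskId) {
        Map<String, String> conf = new HashMap<>();
        conf.put(TASK_ID_KEY, taskId);
        return conf;
    }

    public static void main(String[] args) {
        RefreshingOperator op = new RefreshingOperator();
        System.out.println(op.outputName(confFor("000004_0"))); // prints 000004_0
        System.out.println(op.outputName(confFor("000001_0"))); // prints 000001_0
    }
}
```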


