[
https://issues.apache.org/jira/browse/HIVE-22373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Butao Zhang updated HIVE-22373:
-------------------------------
Description:
h1. Problems
Setting tez.am.container.reuse.enabled=true allows containers to be reused
across multiple tasks.
When two File Merge tasks run on the same container, the later task fails to
rename its output path.
Below is an error log of task 000001_0 on container
container_e87_1570604853053_11564_01_000003, where task 000004_0 ran before
task 000001_0.
It shows that task 000001_0's output file name is mistakenly taken from the
previous task ID, 000004_0.
{code}
2019-10-15 13:00:31,438 [ERROR] [TezChild] |tez.TezProcessor|: java.lang.RuntimeException: Hive Runtime Error while closing operators
    at org.apache.hadoop.hive.ql.exec.tez.MergeFileRecordProcessor.close(MergeFileRecordProcessor.java:188)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:284)
    at org.apache.hadoop.hive.ql.exec.tez.MergeFileTezProcessor.run(MergeFileTezProcessor.java:42)
    at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
    at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
    at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
    at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
    at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
    at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
    at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
    at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
    at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Failed to close AbstractFileMergeOperator
    at org.apache.hadoop.hive.ql.exec.AbstractFileMergeOperator.closeOp(AbstractFileMergeOperator.java:315)
    at org.apache.hadoop.hive.ql.exec.OrcFileMergeOperator.closeOp(OrcFileMergeOperator.java:265)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:733)
    at org.apache.hadoop.hive.ql.exec.tez.MergeFileRecordProcessor.close(MergeFileRecordProcessor.java:180)
    ... 17 more
Caused by: java.io.IOException: Unable to rename viewfs://<cluster_name>/user/<user_name>/.hive-staging_hive_2019-10-15_12-59-32_916_2461818728035733124-15476/_task_tmp.-ext-10000/_tmp.000004_0 to viewfs://<cluster_name>/user/<user_name>/.hive-staging_hive_2019-10-15_12-59-32_916_2461818728035733124-15476/_tmp.-ext-10000/000004_0
    at org.apache.hadoop.hive.ql.exec.AbstractFileMergeOperator.closeOp(AbstractFileMergeOperator.java:254)
    ... 20 more
{code}
h1. Causes
When AbstractFileMergeOperator is initialized, taskId is set only on the first
initialization; later initializations keep the stale value.
- AbstractFileMergeOperator.java
{code}
private void updatePaths(Path tp, Path ttp) {
  if (taskId == null) {
    taskId = Utilities.getTaskId(jc);
  }
{code}
It leads to the above conflict of the output file names.
h1. Solutions
Remove the null check, which was introduced in HIVE-14640, so that taskId is
updated from JobConf every time the operator is initialized.
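The effect of the fix can be sketched with a minimal, hypothetical model (the class MergeOp, the outputName() method, and the "mapred.task.id" key are illustrative stand-ins, not Hive's actual classes): caching the task ID behind a null check keeps the first task's ID alive when the operator is reinitialized under container reuse, while refreshing it on every initialization picks up the current task's ID.

```java
import java.util.Map;

public class TaskIdReuseSketch {
    static class MergeOp {
        private String taskId;
        private final boolean refreshEachInit; // true models the fix

        MergeOp(boolean refreshEachInit) { this.refreshEachInit = refreshEachInit; }

        // Mirrors updatePaths(): before the fix, taskId was set only once.
        void updatePaths(Map<String, String> jobConf) {
            if (refreshEachInit || taskId == null) {
                taskId = jobConf.get("mapred.task.id");
            }
        }

        String outputName() { return taskId; }
    }

    public static void main(String[] args) {
        // The same operator instance serves two tasks (container reuse).
        MergeOp buggy = new MergeOp(false);
        buggy.updatePaths(Map.of("mapred.task.id", "000004_0"));
        buggy.updatePaths(Map.of("mapred.task.id", "000001_0"));
        System.out.println("buggy=" + buggy.outputName());   // stale: 000004_0

        MergeOp fixed = new MergeOp(true);
        fixed.updatePaths(Map.of("mapred.task.id", "000004_0"));
        fixed.updatePaths(Map.of("mapred.task.id", "000001_0"));
        System.out.println("fixed=" + fixed.outputName());   // fresh: 000001_0
    }
}
```

With the stale ID, the second task tries to commit an output file named after the first task, which no longer exists in the staging directory, hence the rename failure above.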
was:
h1. Problems
Setting tez.am.container.reuse.enabled=true allows containers to be reused
across multiple tasks.
When two File Merge tasks run on the same container, the later task fails to
rename its output path.
Below is an error log of task 000001_0 on container
container_e87_1570604853053_11564_01_000003, where task 000004_0 ran before
task 000001_0.
It shows that task 000001_0's output file name is mistakenly taken from the
previous task ID, 000004_0.
{code:java}
2019-10-15 13:00:31,438 [ERROR] [TezChild] |tez.TezProcessor|: java.lang.RuntimeException: Hive Runtime Error while closing operators
    at org.apache.hadoop.hive.ql.exec.tez.MergeFileRecordProcessor.close(MergeFileRecordProcessor.java:188)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:284)
    at org.apache.hadoop.hive.ql.exec.tez.MergeFileTezProcessor.run(MergeFileTezProcessor.java:42)
    at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
    at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
    at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
    at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
    at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
    at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
    at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
    at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
    at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Failed to close AbstractFileMergeOperator
    at org.apache.hadoop.hive.ql.exec.AbstractFileMergeOperator.closeOp(AbstractFileMergeOperator.java:315)
    at org.apache.hadoop.hive.ql.exec.OrcFileMergeOperator.closeOp(OrcFileMergeOperator.java:265)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:733)
    at org.apache.hadoop.hive.ql.exec.tez.MergeFileRecordProcessor.close(MergeFileRecordProcessor.java:180)
    ... 17 more
Caused by: java.io.IOException: Unable to rename viewfs://<cluster_name>/user/<user_name>/.hive-staging_hive_2019-10-15_12-59-32_916_2461818728035733124-15476/_task_tmp.-ext-10000/_tmp.000004_0 to viewfs://<cluster_name>/user/<user_name>/.hive-staging_hive_2019-10-15_12-59-32_916_2461818728035733124-15476/_tmp.-ext-10000/000004_0
    at org.apache.hadoop.hive.ql.exec.AbstractFileMergeOperator.closeOp(AbstractFileMergeOperator.java:254)
    ... 20 more
{code}
h1. Causes
When AbstractFileMergeOperator is initialized, taskId is set only on the first
initialization; later initializations keep the stale value.
- AbstractFileMergeOperator.java
{code:java}
private void updatePaths(Path tp, Path ttp) {
  if (taskId == null) {
    taskId = Utilities.getTaskId(jc);
  }
{code}
It leads to the above conflict of output file names.
h1. Solutions
Remove the null check, which was introduced in HIVE-14640, so that taskId is
updated from JobConf every time the operator is initialized.
> File Merge tasks fail when containers are reused
> ------------------------------------------------
>
> Key: HIVE-22373
> URL: https://issues.apache.org/jira/browse/HIVE-22373
> Project: Hive
> Issue Type: Bug
> Affects Versions: 3.1.2
> Reporter: Toshihiko Uchida
> Assignee: Toshihiko Uchida
> Priority: Major
> Fix For: 4.0.0-alpha-1
>
> Attachments: HIVE-22373.patch
>
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)