[
https://issues.apache.org/jira/browse/IMPALA-10247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17222287#comment-17222287
]
Marta Kuczora commented on IMPALA-10247:
----------------------------------------
Investigated this issue; the reason for the exception was that Hive was
trying to read an empty manifest file. Manifest files are used in the case of
direct insert to determine which files need to be kept and which need to be
cleaned up. They are created by the tasks and use the task attempt id as a
postfix. In this particular test, one of the containers ran out of memory, so
Tez decided to kill it right after the manifest file got created but before
the paths were written into it. This was the manifest file for task attempt 0.
Tez then assigned a new container to the task, so a new attempt was made with
attemptId=1. This one was successful and wrote its manifest file correctly.
But Hive didn't know about this: the out-of-memory issue was handled by Tez
under the hood, so there was no exception in Hive and therefore no clean-up in
the manifest folder. And when Hive reads the manifest files, it simply reads
every file from the defined folder, so it tried to read the manifest files for
both attempt 0 and attempt 1.
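To illustrate why an empty manifest produces exactly this error (this is a
minimal Python sketch, not Hive's actual code): java.io.DataInputStream.readInt
consumes 4 bytes, so reading from a zero-length file fails immediately at
end-of-stream, which is the EOFException in the stack trace below.

```python
import io
import struct

def read_int(stream):
    """Mimic java.io.DataInputStream.readInt: consume 4 big-endian bytes,
    raising EOFError when the stream ends early (illustrative sketch)."""
    data = stream.read(4)
    if len(data) < 4:
        raise EOFError("manifest ended before a full int could be read")
    return struct.unpack(">i", data)[0]

# An empty manifest, like the one left behind by the killed attempt 0:
empty_manifest = io.BytesIO(b"")
try:
    read_int(empty_manifest)
except EOFError as e:
    print("EOF while reading manifest:", e)
```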
If there are multiple manifest files with the same name but different
attemptIds, Hive should read only the one with the highest attempt id.
Created a HIVE issue about this:
https://issues.apache.org/jira/browse/HIVE-24322
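A sketch of the proposed selection logic (not the HIVE-24322 patch itself):
the log shows manifests named like 000000_0.manifest and 000000_1.manifest,
i.e. <taskId>_<attemptId>.manifest, so for each task id only the manifest with
the highest attempt id should be read.

```python
import re

# Assumed naming scheme, based on the files in the log:
# <taskId>_<attemptId>.manifest
MANIFEST_RE = re.compile(r"^(?P<task>\d+)_(?P<attempt>\d+)\.manifest$")

def select_manifests(filenames):
    """Return one manifest per task: the one with the highest attempt id."""
    best = {}  # task id -> (attempt id, filename)
    for name in filenames:
        m = MANIFEST_RE.match(name)
        if not m:
            continue  # skip non-manifest entries
        task, attempt = m.group("task"), int(m.group("attempt"))
        if task not in best or attempt > best[task][0]:
            best[task] = (attempt, name)
    return sorted(fn for _, fn in best.values())

# The two files from the log: attempt 0 (empty, container was killed)
# and attempt 1 (complete). Only attempt 1 should be read.
print(select_manifests(["000000_0.manifest", "000000_1.manifest"]))
# → ['000000_1.manifest']
```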
> Data loading of functional-query ORC fails with EOFException
> ------------------------------------------------------------
>
> Key: IMPALA-10247
> URL: https://issues.apache.org/jira/browse/IMPALA-10247
> Project: IMPALA
> Issue Type: Bug
> Reporter: Quanlong Huang
> Assignee: Zoltán Borók-Nagy
> Priority: Critical
>
> Data loading of functional-query on ORC tables occasionally fails with
> {code:java}
> 16:41:21 Loading custom schemas (logging to /data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/data_loading/load-custom-schemas.log)...
> 16:41:24 Loading custom schemas OK (Took: 0 min 4 sec)
> 16:41:24 Started Loading functional-query data in background; pid 23644.
> 16:41:24 Started Loading TPC-H data in background; pid 23645.
> 16:41:24 Loading functional-query data (logging to /data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/data_loading/load-functional-query.log)...
> 16:41:24 Started Loading TPC-DS data in background; pid 23646.
> 16:41:24 Loading TPC-H data (logging to /data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/data_loading/load-tpch.log)...
> 16:41:24 Loading TPC-DS data (logging to /data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/data_loading/load-tpcds.log)...
> 16:48:51 Loading workload 'tpch' using exploration strategy 'core' OK (Took: 7 min 27 sec)
> 16:50:53 FAILED (Took: 9 min 29 sec)
> 16:50:53 'load-data functional-query exhaustive' failed. Tail of log:
> {code}
> This looks similar to IMPALA-9923 but has a different error stacktrace:
> {code:java}
> 2020-10-13T16:50:50,369 INFO [HiveServer2-Background-Pool: Thread-23853] ql.Driver: Executing command(queryId=jenkins_20201013165050_5dc3d632-a5c3-4f85-b2d3-8c1dc6682322):
> INSERT OVERWRITE TABLE tpcds_orc_def.web_sales
> SELECT * FROM tpcds.web_sales
> ......
> 2020-10-13T16:50:53,423 INFO [HiveServer2-Background-Pool: Thread-23832] FileOperations: Reading manifest hdfs://localhost:20500/test-warehouse/managed/jointbl_orc_def/_tmp.base_0000001_0/000000_0.manifest
> 2020-10-13T16:50:53,423 INFO [HiveServer2-Background-Pool: Thread-23832] FileOperations: Reading manifest hdfs://localhost:20500/test-warehouse/managed/jointbl_orc_def/_tmp.base_0000001_0/000000_1.manifest
> 2020-10-13T16:50:53,423 INFO [HiveServer2-Background-Pool: Thread-23832] FileOperations: Looking at manifest file: hdfs://localhost:20500/test-warehouse/managed/jointbl_orc_def/_tmp.base_0000001_0/000000_0.manifest
> 2020-10-13T16:50:53,424 ERROR [HiveServer2-Background-Pool: Thread-23832] exec.Task: Job Commit failed with exception 'org.apache.hadoop.hive.ql.metadata.HiveException(java.io.EOFException)'
> org.apache.hadoop.hive.ql.metadata.HiveException: java.io.EOFException
> at org.apache.hadoop.hive.ql.exec.FileSinkOperator.jobCloseOp(FileSinkOperator.java:1468)
> at org.apache.hadoop.hive.ql.exec.Operator.jobClose(Operator.java:798)
> at org.apache.hadoop.hive.ql.exec.Operator.jobClose(Operator.java:803)
> at org.apache.hadoop.hive.ql.exec.Operator.jobClose(Operator.java:803)
> at org.apache.hadoop.hive.ql.exec.tez.TezTask.close(TezTask.java:627)
> at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:342)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213)
> at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105)
> at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:357)
> at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330)
> at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246)
> at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:721)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:488)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:482)
> at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166)
> at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:225)
> at org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87)
> at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:322)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
> at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:340)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.EOFException
> at java.io.DataInputStream.readInt(DataInputStream.java:392)
> at org.apache.hadoop.hive.ql.exec.Utilities.handleDirectInsertTableFinalPath(Utilities.java:4587)
> at org.apache.hadoop.hive.ql.exec.FileSinkOperator.jobCloseOp(FileSinkOperator.java:1462)
> ... 29 more
> {code}
> The failed query is
> {code:sql}
> INSERT OVERWRITE TABLE tpcds_orc_def.web_sales
> SELECT * FROM tpcds.web_sales
> {code}
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)