[ https://issues.apache.org/jira/browse/IMPALA-10247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17222287#comment-17222287 ]

Marta Kuczora commented on IMPALA-10247:
----------------------------------------

Investigated this issue; the reason for the exception was that Hive was 
trying to read an empty manifest file. Manifest files are used in the direct 
insert case to determine which files need to be kept and which need to be 
cleaned up. They are created by the tasks, with the task attempt ID as a 
suffix. In this particular test, one of the containers ran out of memory, so 
Tez killed it right after the manifest file was created but before the paths 
were written into it. This was the manifest file for task attempt 0. Tez then 
assigned a new container to the task, so a new attempt was made with 
attemptId=1. This attempt was successful and wrote its manifest file 
correctly. But Hive didn't know about this: the out-of-memory issue was 
handled by Tez under the hood, so no exception was raised in Hive and 
therefore no clean-up happened in the manifest folder. And when Hive reads 
the manifest files, it simply reads every file in the defined folder, so it 
tried to read the manifest files for both attempt 0 and attempt 1.
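To illustrate the failure mode (this is a standalone demonstration, not Hive code): DataInputStream.readInt() throws EOFException when the underlying stream holds fewer than 4 bytes, which is exactly what happens on the zero-byte manifest left behind by the killed attempt 0 (see handleDirectInsertTableFinalPath in the stack trace below).

{code:java}
import java.io.*;

// Minimal demonstration of reading a zero-byte "manifest":
// readInt() needs 4 bytes and throws EOFException on an empty stream.
public class EmptyManifestRead {
    public static void main(String[] args) {
        byte[] empty = new byte[0];  // simulates the empty manifest file
        try (DataInputStream in =
                 new DataInputStream(new ByteArrayInputStream(empty))) {
            in.readInt();  // expects 4 bytes, stream has none
            System.out.println("read ok");
        } catch (EOFException e) {
            System.out.println("EOFException: empty manifest");
        } catch (IOException e) {
            System.out.println("IOException: " + e.getMessage());
        }
    }
}
{code}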
If there are multiple manifest files with the same name but different 
attempt IDs, Hive should only read the one with the highest attempt ID.
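A hypothetical sketch of that selection logic (the class, method, and file-name pattern here are illustrative assumptions, not Hive's actual implementation; the real fix is tracked in HIVE-24322): given manifest names of the form <taskId>_<attemptId>.manifest, keep only the highest attempt per task.

{code:java}
import java.util.*;
import java.util.regex.*;

// Illustrative sketch: filter manifest file names so that for each task
// only the file from the highest attempt ID survives.
public class ManifestFilter {
    private static final Pattern MANIFEST =
        Pattern.compile("(\\d+)_(\\d+)\\.manifest");

    public static List<String> latestAttempts(List<String> fileNames) {
        Map<String, String> best = new TreeMap<>();       // taskId -> file name
        Map<String, Integer> bestAttempt = new HashMap<>(); // taskId -> attempt
        for (String name : fileNames) {
            Matcher m = MANIFEST.matcher(name);
            if (!m.matches()) continue;                   // skip other files
            String taskId = m.group(1);
            int attempt = Integer.parseInt(m.group(2));
            // Keep this file only if it is the highest attempt seen so far.
            if (!bestAttempt.containsKey(taskId)
                    || attempt > bestAttempt.get(taskId)) {
                bestAttempt.put(taskId, attempt);
                best.put(taskId, name);
            }
        }
        return new ArrayList<>(best.values());
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList(
            "000000_0.manifest",   // empty file from the killed attempt 0
            "000000_1.manifest",   // complete file from the retried attempt 1
            "000001_0.manifest");
        System.out.println(latestAttempts(files));
        // prints [000000_1.manifest, 000001_0.manifest]
    }
}
{code}

With this filtering, the empty attempt-0 manifest in the log below would never be opened, avoiding the EOFException.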

Created a HIVE issue about this:

https://issues.apache.org/jira/browse/HIVE-24322

> Data loading of functional-query ORC fails with EOFException
> ------------------------------------------------------------
>
>                 Key: IMPALA-10247
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10247
>             Project: IMPALA
>          Issue Type: Bug
>            Reporter: Quanlong Huang
>            Assignee: Zoltán Borók-Nagy
>            Priority: Critical
>
> Data loading of functional-query on ORC tables occasionally fails with
> {code:java}
> 16:41:21 Loading custom schemas (logging to 
> /data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/data_loading/load-custom-schemas.log)...
>  
> 16:41:24   Loading custom schemas OK (Took: 0 min 4 sec)
> 16:41:24 Started Loading functional-query data in background; pid 23644.
> 16:41:24 Started Loading TPC-H data in background; pid 23645.
> 16:41:24 Loading functional-query data (logging to 
> /data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/data_loading/load-functional-query.log)...
>  
> 16:41:24 Started Loading TPC-DS data in background; pid 23646.
> 16:41:24 Loading TPC-H data (logging to 
> /data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/data_loading/load-tpch.log)...
>  
> 16:41:24 Loading TPC-DS data (logging to 
> /data/jenkins/workspace/impala-asf-master-core-erasure-coding/repos/Impala/logs/data_loading/load-tpcds.log)...
>  
> 16:48:51   Loading workload 'tpch' using exploration strategy 'core' OK 
> (Took: 7 min 27 sec)
> 16:50:53     FAILED (Took: 9 min 29 sec)
> 16:50:53     'load-data functional-query exhaustive' failed. Tail of log: 
> {code}
> This looks similar to IMPALA-9923 but have a different error stacktrace:
> {code:java}
> 2020-10-13T16:50:50,369  INFO [HiveServer2-Background-Pool: Thread-23853] 
> ql.Driver: Executing 
> command(queryId=jenkins_20201013165050_5dc3d632-a5c3-4f85-b2d3-8c1dc6682322): 
> INSERT OVERWRITE TABLE tpcds_orc_def.web_sales
> SELECT * FROM tpcds.web_sales
> ......
> 2020-10-13T16:50:53,423  INFO [HiveServer2-Background-Pool: Thread-23832] 
> FileOperations: Reading manifest 
> hdfs://localhost:20500/test-warehouse/managed/jointbl_orc_def/_tmp.base_0000001_0/000000_0.manifest
> 2020-10-13T16:50:53,423  INFO [HiveServer2-Background-Pool: Thread-23832] 
> FileOperations: Reading manifest 
> hdfs://localhost:20500/test-warehouse/managed/jointbl_orc_def/_tmp.base_0000001_0/000000_1.manifest
> 2020-10-13T16:50:53,423  INFO [HiveServer2-Background-Pool: Thread-23832] 
> FileOperations: Looking at manifest file: 
> hdfs://localhost:20500/test-warehouse/managed/jointbl_orc_def/_tmp.base_0000001_0/000000_0.manifest
> 2020-10-13T16:50:53,424 ERROR [HiveServer2-Background-Pool: Thread-23832] 
> exec.Task: Job Commit failed with exception 
> 'org.apache.hadoop.hive.ql.metadata.HiveException(java.io.EOFException)'
> org.apache.hadoop.hive.ql.metadata.HiveException: java.io.EOFException
>         at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.jobCloseOp(FileSinkOperator.java:1468)
>         at org.apache.hadoop.hive.ql.exec.Operator.jobClose(Operator.java:798)
>         at org.apache.hadoop.hive.ql.exec.Operator.jobClose(Operator.java:803)
>         at org.apache.hadoop.hive.ql.exec.Operator.jobClose(Operator.java:803)
>         at org.apache.hadoop.hive.ql.exec.tez.TezTask.close(TezTask.java:627)
>         at 
> org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:342)
>         at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213)
>         at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105)
>         at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:357)
>         at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330)
>         at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246)
>         at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109)
>         at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:721)
>         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:488)
>         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:482)
>         at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166)
>         at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:225)
>         at 
> org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87)
>         at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:322)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
>         at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:340)
>         at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.EOFException
>         at java.io.DataInputStream.readInt(DataInputStream.java:392)
>         at 
> org.apache.hadoop.hive.ql.exec.Utilities.handleDirectInsertTableFinalPath(Utilities.java:4587)
>         at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.jobCloseOp(FileSinkOperator.java:1462)
>         ... 29 more
>  {code}
> The failed query is
> {code:sql}
> INSERT OVERWRITE TABLE tpcds_orc_def.web_sales
> SELECT * FROM tpcds.web_sales
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
