[ https://issues.apache.org/jira/browse/HIVE-22001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16900300#comment-16900300 ]
Jason Dere commented on HIVE-22001:
-----------------------------------

Looks like HIVE-21225 did not include fixes for this issue after all.

> AcidUtils.getAcidState() can fail if Cleaner is removing files at the same time
> -------------------------------------------------------------------------------
>
>                 Key: HIVE-22001
>                 URL: https://issues.apache.org/jira/browse/HIVE-22001
>             Project: Hive
>          Issue Type: Bug
>          Components: Transactions
>            Reporter: Jason Dere
>            Assignee: Jason Dere
>            Priority: Major
>
> One user hit the following error during getSplits:
> {noformat}
> 2019-07-06T14:33:03,067 ERROR [4640181a-3eb7-4b3e-9a40-d7a8de9a570c HiveServer2-HttpHandler-Pool: Thread-415519]: SessionState (SessionState.java:printError(1247)) - Vertex failed, vertexName=Map 1, vertexId=vertex_1560947172646_2452_6199_00, diagnostics=[Vertex vertex_1560947172646_2452_6199_00 [Map 1] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: hive_table initializer failed, vertex=vertex_1560947172646_2452_6199_00 [Map 1], java.lang.RuntimeException: ORC split generation failed with exception: java.io.FileNotFoundException: File hdfs://path/to/hive_table/oiddateyyyymmdd=20190706/delta_0987070_0987070 does not exist.
> 	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1870)
> 	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1958)
> 	at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:524)
> 	at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:779)
> 	at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:243)
> 	at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:278)
> 	at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:269)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:422)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
> 	at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:269)
> 	at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:253)
> 	at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
> 	at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
> 	at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> 	at java.lang.Thread.run(Thread.java:748)
> Caused by: java.util.concurrent.ExecutionException: java.io.FileNotFoundException: File hdfs://path/to/hive_table/oiddateyyyymmdd=20190706/delta_0987070_0987070 does not exist.
> 	at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> 	at java.util.concurrent.FutureTask.get(FutureTask.java:192)
> 	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1809)
> 	... 17 more
> Caused by: java.io.FileNotFoundException: File hdfs://path/to/hive_table/oiddateyyyymmdd=20190706/delta_0987070_0987070 does not exist.
> 	at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:1059)
> 	at org.apache.hadoop.hdfs.DistributedFileSystem.access$1000(DistributedFileSystem.java:131)
> 	at org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1119)
> 	at org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1116)
> 	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> 	at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:1126)
> 	at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1868)
> 	at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1953)
> 	at org.apache.hadoop.hive.ql.io.AcidUtils$MetaDataFile.chooseFile(AcidUtils.java:1903)
> 	at org.apache.hadoop.hive.ql.io.AcidUtils$MetaDataFile.isRawFormat(AcidUtils.java:1913)
> 	at org.apache.hadoop.hive.ql.io.AcidUtils.parsedDelta(AcidUtils.java:947)
> 	at org.apache.hadoop.hive.ql.io.AcidUtils.parseDelta(AcidUtils.java:935)
> 	at org.apache.hadoop.hive.ql.io.AcidUtils.getChildState(AcidUtils.java:1250)   <---
> 	at org.apache.hadoop.hive.ql.io.AcidUtils.getAcidState(AcidUtils.java:1071)   <---
> 	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.callInternal(OrcInputFormat.java:1217)
> 	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.access$1600(OrcInputFormat.java:1152)
> 	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator$1.run(OrcInputFormat.java:1189)
> 	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator$1.run(OrcInputFormat.java:1186)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:422)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
> 	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.call(OrcInputFormat.java:1186)
> 	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$FileGenerator.call(OrcInputFormat.java:1152)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 	... 3 more
> {noformat}
> Shortly before this query, the Cleaner thread in the Hive metastore was deleting the directory mentioned:
> {noformat}
> 2019-07-06T14:32:58,626 INFO [Thread-46]: compactor.Cleaner (Cleaner.java:removeFiles(344)) - id=12254478 About to remove 1230 obsolete directories from hdfs://path/to/hive_table/oiddateyyyymmdd=20190706/. [base_0981109,delete_delta_0981249_0982668,delta_0981249_0982668,delta_0981249_0981249,............,*delta_0987070_0987070*,......,delta_0993101_0993101]compactId2CompactInfoMap.*keySet([12254478])*
> {noformat}

--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
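The stack trace shows a classic list-then-stat race: getAcidState() obtains a listing of the partition's children, then MetaDataFile.isRawFormat() lists each delta directory a second time, and the Cleaner can delete an obsolete delta between the two listings. A minimal, self-contained sketch of the defensive pattern implied here — tolerate a delta that vanishes after the snapshot instead of failing split generation — follows. This is an illustration only, not Hive's actual fix; all class and method names (CleanerRaceSketch, filterLiveDeltas, demo) are hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class CleanerRaceSketch {

    // Keep only the deltas whose contents can still be listed. A
    // NoSuchFileException on the second listing means a concurrent cleaner
    // removed the (obsolete) delta after the snapshot was taken, so it is
    // safe to skip it rather than fail the whole operation.
    static List<Path> filterLiveDeltas(List<Path> snapshot) throws IOException {
        List<Path> live = new ArrayList<>();
        for (Path delta : snapshot) {
            try (DirectoryStream<Path> files = Files.newDirectoryStream(delta)) {
                for (Path f : files) { /* e.g. inspect files, as isRawFormat does */ }
                live.add(delta);
            } catch (NoSuchFileException cleanedUp) {
                // Lost the race with the cleaner; treat the delta as already
                // compacted away and continue with the remaining children.
            }
        }
        return live;
    }

    // Simulate the race: snapshot the partition's children, then delete one
    // delta before the per-delta listing happens. Returns how many deltas
    // survive the defensive pass.
    static int demo() throws IOException {
        Path partition = Files.createTempDirectory("part");
        Path kept = Files.createDirectory(partition.resolve("delta_0000001_0000001"));
        Path doomed = Files.createDirectory(partition.resolve("delta_0000002_0000002"));

        List<Path> snapshot = new ArrayList<>();
        try (DirectoryStream<Path> children = Files.newDirectoryStream(partition)) {
            children.forEach(snapshot::add);
        }

        Files.delete(doomed); // the cleaner wins the race after the snapshot

        int liveCount = filterLiveDeltas(snapshot).size();

        Files.delete(kept);
        Files.delete(partition);
        return liveCount;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(demo()); // prints 1: only the surviving delta remains
    }
}
```

Without the catch, the second listing of the deleted delta would throw, mirroring the FileNotFoundException that aborted split generation above.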