[
https://issues.apache.org/jira/browse/HIVE-22690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Madhusoodan reassigned HIVE-22690:
----------------------------------
Assignee: Madhusoodan
> When the directories from HDFS are deleted while running MSCK it fails with
> FileNotFoundException
> -------------------------------------------------------------------------------------------------
>
> Key: HIVE-22690
> URL: https://issues.apache.org/jira/browse/HIVE-22690
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Affects Versions: 2.1.1
> Reporter: Madhusoodan
> Assignee: Madhusoodan
> Priority: Major
>
> Assume a table `emp` defined as follows
>
> {code:sql}
> CREATE EXTERNAL TABLE emp (id INT, name STRING)
> PARTITIONED BY (dept STRING)
> LOCATION 'hdfs://namenode.com:8020/hive/data/db/emp';
> {code}
> Create, say, 1000 partition directories under that location directly in HDFS,
> without adding them to the MetaStore.
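>
> One way to create those directories, as a minimal sketch with the Hadoop
> FileSystem API (the class name and the dept values are made up for
> illustration):
>
> {code:java}
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
>
> public class CreatePartitionDirs {
>   public static void main(String[] args) throws Exception {
>     // Table location from the DDL above; adjust to your cluster.
>     Path tableDir = new Path("hdfs://namenode.com:8020/hive/data/db/emp");
>     FileSystem fs = tableDir.getFileSystem(new Configuration());
>     for (int i = 0; i < 1000; i++) {
>       // Each dept=<value> directory is a partition the MetaStore does not know about yet.
>       fs.mkdirs(new Path(tableDir, "dept=d" + i));
>     }
>   }
> }
> {code}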
>
> Now, to synchronize the MetaStore, we run the MSCK command. If the HDFS
> directories are deleted in parallel while it is running, at some point MSCK
> fails with a FileNotFoundException. Here is the stack trace.
>
> {code:java}
> 2019-12-10 23:21:50,027 WARN hive.ql.exec.DDLTask:
> [HiveServer2-Background-Pool: Thread-500224]: Failed to run metacheck:
> org.apache.hadoop.hive.ql.metadata.HiveException:
> java.io.FileNotFoundException: File
> hdfs://namenode.com:8020/hive/data/db/emp/dept=CS does not exist.
> at
> org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.checkPartitionDirs(HiveMetaStoreChecker.java:554)
> ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
> at
> org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.checkPartitionDirs(HiveMetaStoreChecker.java:443)
> ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
> at
> org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.findUnknownPartitions(HiveMetaStoreChecker.java:334)
> ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
> at
> org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.checkTable(HiveMetaStoreChecker.java:310)
> ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
> at
> org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.checkTable(HiveMetaStoreChecker.java:253)
> ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
> at
> org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.checkMetastore(HiveMetaStoreChecker.java:118)
> ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
> at org.apache.hadoop.hive.ql.exec.DDLTask.msck(DDLTask.java:1862)
> [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
> at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:413)
> [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199)
> [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
> at
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97)
> [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
> at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2200)
> [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1843)
> [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1563)
> [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1339)
> [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1334)
> [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
> at
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:256)
> [hive-service-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
> at
> org.apache.hive.service.cli.operation.SQLOperation.access$600(SQLOperation.java:92)
> [hive-service-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
> at
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:345)
> [hive-service-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
> at java.security.AccessController.doPrivileged(Native Method)
> ~[?:1.8.0_121]
> at javax.security.auth.Subject.doAs(Subject.java:422) [?:1.8.0_121]
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
> [hadoop-common-3.0.0-cdh6.2.1.jar:?]
> at
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:357)
> [hive-service-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> [?:1.8.0_121]
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> [?:1.8.0_121]
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> [?:1.8.0_121]
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> [?:1.8.0_121]
> at java.lang.Thread.run(Thread.java:745) [?:1.8.0_121]
> Caused by: java.io.FileNotFoundException: File
> hdfs://namenode.com:8020/hive/data/db/emp/dept=CS does not exist.
> at
> org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:985)
> ~[hadoop-hdfs-client-3.0.0-cdh6.2.1.jar:?]
> at
> org.apache.hadoop.hdfs.DistributedFileSystem.access$1000(DistributedFileSystem.java:121)
> ~[hadoop-hdfs-client-3.0.0-cdh6.2.1.jar:?]
> at
> org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1045)
> ~[hadoop-hdfs-client-3.0.0-cdh6.2.1.jar:?]
> at
> org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1042)
> ~[hadoop-hdfs-client-3.0.0-cdh6.2.1.jar:?]
> at
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> ~[hadoop-common-3.0.0-cdh6.2.1.jar:?]
> at
> org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:1052)
> ~[hadoop-hdfs-client-3.0.0-cdh6.2.1.jar:?]
> at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1853)
> ~[hadoop-common-3.0.0-cdh6.2.1.jar:?]
> at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1895)
> ~[hadoop-common-3.0.0-cdh6.2.1.jar:?]
> at
> org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker$PathDepthInfoCallable.processPathDepthInfo(HiveMetaStoreChecker.java:474)
> ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
> at
> org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker$PathDepthInfoCallable.call(HiveMetaStoreChecker.java:467)
> ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
> at
> org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker$PathDepthInfoCallable.call(HiveMetaStoreChecker.java:448)
> ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
> ... 4 more
> {code}
> I analyzed the stack trace and found that the problem is in
> HiveMetaStoreChecker$PathDepthInfoCallable#processPathDepthInfo [1].
>
> What we are doing here is:
> # Create a queue.
> # Put the table's data directory in the queue.
> # Start a few threads which explore the directories in the queue and add newly
> discovered sub-directories back to the queue.
> This process has a flaw. Say there are 1000 first-level directories and
> 1000*500 second-level directories; then there is a noticeable gap between the
> moment a path is put into the queue and the moment its contents are actually
> listed. That gap is large enough for an HDFS delete to remove the directory in
> the meantime, and when that happens the whole check fails as shown above.
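>
> A simplified sketch of that traversal (not the actual HiveMetaStoreChecker
> code, just its shape) shows where the window is:
>
> {code:java}
> import java.util.ArrayList;
> import java.util.List;
> import java.util.concurrent.ConcurrentLinkedQueue;
> import java.util.concurrent.ExecutorService;
> import java.util.concurrent.Executors;
> import java.util.concurrent.Future;
> import java.util.concurrent.atomic.AtomicInteger;
>
> import org.apache.hadoop.fs.FileStatus;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
>
> // Simplified model of the queue-based scan described above; names and
> // structure are illustrative only, not the real HiveMetaStoreChecker code.
> class PartitionDirScan {
>
>   void scan(FileSystem fs, Path tableDir, int numThreads) throws Exception {
>     ConcurrentLinkedQueue<Path> queue = new ConcurrentLinkedQueue<>();
>     AtomicInteger pending = new AtomicInteger(1); // dirs enqueued but not yet listed
>     queue.add(tableDir);                          // step 2: seed with the table directory
>
>     ExecutorService pool = Executors.newFixedThreadPool(numThreads);
>     List<Future<?>> workers = new ArrayList<>();
>     for (int i = 0; i < numThreads; i++) {
>       workers.add(pool.submit(() -> {
>         Path dir;
>         while (pending.get() > 0) {
>           if ((dir = queue.poll()) == null) {
>             continue; // queue momentarily empty; other workers are still listing
>           }
>           try {
>             // The race: 'dir' can be deleted between the moment it was enqueued
>             // and this listStatus() call; HDFS then throws FileNotFoundException,
>             // which propagates out of the worker and fails the whole check.
>             for (FileStatus child : fs.listStatus(dir)) {
>               if (child.isDirectory()) {
>                 pending.incrementAndGet();
>                 queue.add(child.getPath());       // step 3: newly discovered directory
>               }
>             }
>           } finally {
>             pending.decrementAndGet();
>           }
>         }
>         return null;
>       }));
>     }
>     pool.shutdown();
>     for (Future<?> worker : workers) {
>       worker.get(); // rethrows the FileNotFoundException wrapped in an ExecutionException
>     }
>   }
> }
> {code}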
>
> Possible improvements (a sketch of both options follows the list):
> # [best according to me] Consume the exception and perhaps log it at DEBUG level.
> # Check that the directory exists before listing its contents.
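>
> A rough sketch of what either option could look like at the listing step
> (illustrative only, not a patch against HiveMetaStoreChecker; assumes an SLF4J
> logger):
>
> {code:java}
> import java.io.FileNotFoundException;
> import java.io.IOException;
>
> import org.apache.hadoop.fs.FileStatus;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
> import org.slf4j.Logger;
> import org.slf4j.LoggerFactory;
>
> class TolerantListing {
>   private static final Logger LOG = LoggerFactory.getLogger(TolerantListing.class);
>
>   // Option 1: consume the FileNotFoundException and keep scanning; a directory
>   // that disappeared mid-scan simply contributes no partitions.
>   static FileStatus[] listIgnoringVanished(FileSystem fs, Path dir) throws IOException {
>     try {
>       return fs.listStatus(dir);
>     } catch (FileNotFoundException e) {
>       LOG.debug("Directory {} was removed while MSCK was running, skipping it", dir, e);
>       return new FileStatus[0];
>     }
>   }
>
>   // Option 2: check existence before listing. This narrows the window but does
>   // not close it: the directory can still vanish between exists() and listStatus().
>   static FileStatus[] listIfExists(FileSystem fs, Path dir) throws IOException {
>     if (!fs.exists(dir)) {
>       return new FileStatus[0];
>     }
>     return fs.listStatus(dir);
>   }
> }
> {code}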
>
> References:
> [1]
> https://github.com/apache/hive/blob/01faca2f9d7dcb0f5feabfcb07fa5ea12b79c5b9/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreChecker.java#L474
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)