[ 
https://issues.apache.org/jira/browse/HIVE-22690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Madhusoodan reassigned HIVE-22690:
----------------------------------

    Assignee: Madhusoodan

> When directories are deleted from HDFS while MSCK is running, it fails with 
> FileNotFoundException
> -------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-22690
>                 URL: https://issues.apache.org/jira/browse/HIVE-22690
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 2.1.1
>            Reporter: Madhusoodan
>            Assignee: Madhusoodan
>            Priority: Major
>
> Assume a table `emp` defined as follows
>  
> {code:sql}
> create external table 
>     emp (id int, name string) 
> partitioned by 
>     (dept string)
> location
>     'hdfs://namenode.com:8020/hive/data/db/emp'
> ;{code}
> Create, say, 1000 partition directories under this location in HDFS (one 
> hypothetical way to do this is sketched below).
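>
> The directories can be created (and later deleted while MSCK is running) 
> directly through the Hadoop FileSystem API, so that the metastore knows nothing 
> about them. The class and partition names below are made up; only the table 
> location comes from the DDL above:
>
> {code:java}
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
>
> public class CreatePartitionDirs {
>   public static void main(String[] args) throws Exception {
>     Path tableDir = new Path("hdfs://namenode.com:8020/hive/data/db/emp");
>     FileSystem fs = tableDir.getFileSystem(new Configuration());
>     // create 1000 partition directories that the metastore does not yet know about
>     for (int i = 0; i < 1000; i++) {
>       fs.mkdirs(new Path(tableDir, "dept=d" + i));
>     }
>     // deleting any of these with fs.delete(dir, true) while MSCK runs
>     // reproduces the failure shown below
>   }
> }
> {code}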
>  
> Now, to synchronize the Metastore, we run the MSCK command ({{msck repair table 
> emp}}) and, in parallel, delete some of the HDFS partition directories. At some 
> point MSCK fails with a FileNotFoundException. Here is the stack trace.
>  
> {code:java}
> 2019-12-10 23:21:50,027 WARN  hive.ql.exec.DDLTask: 
> [HiveServer2-Background-Pool: Thread-500224]: Failed to run metacheck: 
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.io.FileNotFoundException: File 
> hdfs://namenode.com:8020/hive/data/db/emp/dept=CS does not exist.
>       at 
> org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.checkPartitionDirs(HiveMetaStoreChecker.java:554)
>  ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
>       at 
> org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.checkPartitionDirs(HiveMetaStoreChecker.java:443)
>  ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
>       at 
> org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.findUnknownPartitions(HiveMetaStoreChecker.java:334)
>  ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
>       at 
> org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.checkTable(HiveMetaStoreChecker.java:310)
>  ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
>       at 
> org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.checkTable(HiveMetaStoreChecker.java:253)
>  ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
>       at 
> org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker.checkMetastore(HiveMetaStoreChecker.java:118)
>  ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
>       at org.apache.hadoop.hive.ql.exec.DDLTask.msck(DDLTask.java:1862) 
> [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
>       at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:413) 
> [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
>       at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199) 
> [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
>       at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97) 
> [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
>       at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2200) 
> [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
>       at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1843) 
> [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
>       at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1563) 
> [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
>       at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1339) 
> [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
>       at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1334) 
> [hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
>       at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:256)
>  [hive-service-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
>       at 
> org.apache.hive.service.cli.operation.SQLOperation.access$600(SQLOperation.java:92)
>  [hive-service-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
>       at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:345)
>  [hive-service-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
>       at java.security.AccessController.doPrivileged(Native Method) 
> ~[?:1.8.0_121]
>       at javax.security.auth.Subject.doAs(Subject.java:422) [?:1.8.0_121]
>       at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
>  [hadoop-common-3.0.0-cdh6.2.1.jar:?]
>       at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:357)
>  [hive-service-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
>       at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> [?:1.8.0_121]
>       at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
> [?:1.8.0_121]
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  [?:1.8.0_121]
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [?:1.8.0_121]
>       at java.lang.Thread.run(Thread.java:745) [?:1.8.0_121]
> Caused by: java.io.FileNotFoundException: File 
> hdfs://namenode.com:8020/hive/data/db/emp/dept=CS does not exist.
>       at 
> org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:985)
>  ~[hadoop-hdfs-client-3.0.0-cdh6.2.1.jar:?]
>       at 
> org.apache.hadoop.hdfs.DistributedFileSystem.access$1000(DistributedFileSystem.java:121)
>  ~[hadoop-hdfs-client-3.0.0-cdh6.2.1.jar:?]
>       at 
> org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1045)
>  ~[hadoop-hdfs-client-3.0.0-cdh6.2.1.jar:?]
>       at 
> org.apache.hadoop.hdfs.DistributedFileSystem$24.doCall(DistributedFileSystem.java:1042)
>  ~[hadoop-hdfs-client-3.0.0-cdh6.2.1.jar:?]
>       at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>  ~[hadoop-common-3.0.0-cdh6.2.1.jar:?]
>       at 
> org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:1052)
>  ~[hadoop-hdfs-client-3.0.0-cdh6.2.1.jar:?]
>       at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1853) 
> ~[hadoop-common-3.0.0-cdh6.2.1.jar:?]
>       at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1895) 
> ~[hadoop-common-3.0.0-cdh6.2.1.jar:?]
>       at 
> org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker$PathDepthInfoCallable.processPathDepthInfo(HiveMetaStoreChecker.java:474)
>  ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
>       at 
> org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker$PathDepthInfoCallable.call(HiveMetaStoreChecker.java:467)
>  ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
>       at 
> org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker$PathDepthInfoCallable.call(HiveMetaStoreChecker.java:448)
>  ~[hive-exec-2.1.1-cdh6.2.1.jar:2.1.1-cdh6.2.1]
>       ... 4 more
> {code}
> I analyzed the stack trace and found that the problem is in the method 
> {{processPathDepthInfo}} of {{HiveMetaStoreChecker$PathDepthInfoCallable}} [1].
>  
> What the checker does here is:
>  # Create a queue.
>  # Put the table's data directory in the queue.
>  # Start a few threads which list the directories taken from the queue and add 
> the newly discovered sub-directories back to the queue.
> This process has a flaw (a simplified sketch follows below). Say there are 1000 
> first-level directories and 1000*500 second-level directories; then a noticeable 
> amount of time passes between putting a path in the queue and listing the 
> contents of that directory. This window is long enough for an HDFS delete to 
> happen in between, and when it does, MSCK fails as shown above.
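>
> For illustration, here is a minimal, single-threaded sketch of that queue-based 
> traversal written against the plain Hadoop FileSystem API. It is not the actual 
> HiveMetaStoreChecker code, only the shape of the problem: if a queued directory 
> is deleted before it is polled, {{fs.listStatus(dir)}} throws 
> FileNotFoundException and the whole check fails.
>
> {code:java}
> import java.io.IOException;
> import java.util.ArrayDeque;
> import java.util.Deque;
> import org.apache.hadoop.fs.FileStatus;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
>
> class PartitionWalk {
>   // steps 1-3 from the list above, without the thread pool
>   static void walk(FileSystem fs, Path tableDir) throws IOException {
>     Deque<Path> pending = new ArrayDeque<>();
>     pending.add(tableDir);
>     while (!pending.isEmpty()) {
>       Path dir = pending.poll();
>       // race window: 'dir' may have been deleted since it was enqueued
>       for (FileStatus child : fs.listStatus(dir)) { // FileNotFoundException surfaces here
>         if (child.isDirectory()) {
>           pending.add(child.getPath());
>         }
>       }
>     }
>   }
> }
> {code}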
>  
> Possible improvements (a sketch of the first option follows below):
>  # [best, in my opinion] Consume the exception and, at most, log it at DEBUG 
> level.
>  # Check that the directory exists before listing its contents.
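>
> A rough sketch of what option 1 could look like, applied to the simplified loop 
> above rather than to HiveMetaStoreChecker itself (the {{LOG.debug}} call assumes 
> an slf4j/commons-logging style logger; this is not an actual patch):
>
> {code:java}
> Path dir = pending.poll();
> FileStatus[] children;
> try {
>   children = fs.listStatus(dir);
> } catch (FileNotFoundException e) {
>   // the directory was deleted after it was enqueued; nothing to report for it
>   LOG.debug("Skipping " + dir + " because it no longer exists", e);
>   continue;
> }
> for (FileStatus child : children) {
>   if (child.isDirectory()) {
>     pending.add(child.getPath());
>   }
> }
> {code}
> Note that option 2 on its own would still leave a (smaller) window between the 
> existence check and the listing, which is one more argument in favour of option 1.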
>  
> References:
> [1] 
> https://github.com/apache/hive/blob/01faca2f9d7dcb0f5feabfcb07fa5ea12b79c5b9/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStoreChecker.java#L474
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
