[ https://issues.apache.org/jira/browse/HADOOP-18662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17703162#comment-17703162 ]

ASF GitHub Bot commented on HADOOP-18662:
-----------------------------------------

steveloughran commented on code in PR #5477:
URL: https://github.com/apache/hadoop/pull/5477#discussion_r1143256689


##########
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDistributedFileSystem.java:
##########
@@ -1557,6 +1560,33 @@ public void testListFiles() throws IOException {
     }
   }
 
+  @Test
+  public void testListFilesRecursive() throws IOException {
+    Configuration conf = getTestConfiguration();
+
+    try (MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf).build();) {
+      DistributedFileSystem fs = cluster.getFileSystem();
+
+      // Create some directories and files.
+      Path dir = new Path("/dir");
+      Path subDir1 = fs.makeQualified(new Path(dir, "subDir1"));
+      Path subDir2 = fs.makeQualified(new Path(dir, "subDir2"));
+
+      fs.mkdirs(subDir1);

Review Comment:
   you don't need this...create does it for you



##########
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDistributedFileSystem.java:
##########
@@ -1557,6 +1560,33 @@ public void testListFiles() throws IOException {
     }
   }
 
+  @Test
+  public void testListFilesRecursive() throws IOException {
+    Configuration conf = getTestConfiguration();
+
+    try (MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf).build();) {
+      DistributedFileSystem fs = cluster.getFileSystem();
+
+      // Create some directories and files.
+      Path dir = new Path("/dir");
+      Path subDir1 = fs.makeQualified(new Path(dir, "subDir1"));
+      Path subDir2 = fs.makeQualified(new Path(dir, "subDir2"));
+
+      fs.mkdirs(subDir1);
+      fs.mkdirs(subDir2);
+      fs.create(new Path(dir, "foo1")).close();
+      fs.create(new Path(dir, "foo2")).close();
+      fs.create(new Path(subDir1, "foo3")).close();
+      fs.create(new Path(subDir2, "foo4")).close();
+
+      // Mock the filesystem, and throw FNF when listing is triggered for the subdirectory.
+      FileSystem mockFs = spy(fs);
+      Mockito.doThrow(new FileNotFoundException("")).when(mockFs).listLocatedStatus(eq(subDir1));
+      List<LocatedFileStatus> str = RemoteIterators.toList(mockFs.listFiles(dir, true));
+      Assert.assertEquals(str.toString(), 3, str.size());

Review Comment:
   use AssertJ.assertThat(str).hasSize(3) for the full list dump on mismatch
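The point of the suggestion above is that AssertJ's `assertThat(list).hasSize(n)` builds the list dump into the failure message automatically, so the manual `str.toString()` message argument goes away. AssertJ's real entry point is `org.assertj.core.api.Assertions.assertThat`; since AssertJ is a third-party dependency, the sketch below is a minimal JDK-only stand-in (`assertHasSize` is a hypothetical helper, not an AssertJ API) that just demonstrates the message behavior:

```java
import java.util.List;

public class Main {
    // Minimal stdlib stand-in for AssertJ's assertThat(list).hasSize(n):
    // on mismatch the thrown message carries the full list contents, which
    // is what the reviewer wants instead of passing str.toString() manually.
    public static void assertHasSize(List<?> actual, int expected) {
        if (actual.size() != expected) {
            throw new AssertionError("Expected size: " + expected
                    + " but was: " + actual.size() + " in: " + actual);
        }
    }

    public static void main(String[] args) {
        assertHasSize(List.of("foo1", "foo2", "foo3"), 3); // passes silently
        try {
            assertHasSize(List.of("foo1"), 3);
        } catch (AssertionError e) {
            // The failure message dumps the list: "... in: [foo1]"
            System.out.println(e.getMessage());
        }
    }
}
```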



##########
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileSystem.java:
##########
@@ -2413,8 +2413,13 @@ private void handleFileStat(LocatedFileStatus stat) throws IOException {
         if (stat.isFile()) { // file
           curFile = stat;
         } else if (recursive) { // directory
-          itors.push(curItor);
-          curItor = listLocatedStatus(stat.getPath());
+          try {
+            RemoteIterator<LocatedFileStatus> newDirItor = listLocatedStatus(stat.getPath());
+            itors.push(curItor);
+            curItor = newDirItor;
+          } catch (FileNotFoundException ignored) {
+            LOGGER.debug("Directory {} deleted while attempting to recusive listing", stat.getPath());

Review Comment:
   nit: spelling of recursive



##########
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileSystem.java:
##########
@@ -2413,8 +2413,13 @@ private void handleFileStat(LocatedFileStatus stat) throws IOException {
         if (stat.isFile()) { // file
           curFile = stat;
         } else if (recursive) { // directory
-          itors.push(curItor);
-          curItor = listLocatedStatus(stat.getPath());
+          try {
+            RemoteIterator<LocatedFileStatus> newDirItor = listLocatedStatus(stat.getPath());

Review Comment:
   do you need to handle the condition where the dir has been deleted and replaced with a file? as it is the other concurrency failure, isn't it?
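The patch and the review question above together describe a resilient recursive walk: tolerate a subdirectory that is deleted (or swapped for a file) between discovery and listing instead of failing the whole traversal. Below is a minimal, self-contained sketch of that pattern using `java.nio.file` rather than the Hadoop `FileSystem` API; the name `listFilesTolerant` and the `onDirQueued` test hook are illustrative inventions, not part of the PR. Note that with `java.nio`, the replaced-with-a-file race surfaces as `NotDirectoryException`, which the sketch catches alongside `NoSuchFileException`:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.NotDirectoryException;
import java.nio.file.Path;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Consumer;

public class Main {
    // Recursive listing that tolerates a directory vanishing (or being
    // replaced by a file) between discovery and listing -- the same idea as
    // the try/catch the patch adds around listLocatedStatus().
    // onDirQueued is a hook called whenever a subdirectory is queued,
    // used here only to inject the race deterministically.
    public static List<Path> listFilesTolerant(Path root, Consumer<Path> onDirQueued)
            throws IOException {
        List<Path> files = new ArrayList<>();
        Deque<Path> dirs = new ArrayDeque<>();
        dirs.push(root);
        while (!dirs.isEmpty()) {
            Path dir = dirs.pop();
            try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
                for (Path entry : stream) {
                    if (Files.isDirectory(entry)) {
                        dirs.push(entry);
                        onDirQueued.accept(entry);
                    } else {
                        files.add(entry);
                    }
                }
            } catch (NoSuchFileException | NotDirectoryException raced) {
                // Deleted (or swapped for a file) after discovery:
                // skip this directory rather than failing the whole walk.
            }
        }
        return files;
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("walk");
        Path sub1 = Files.createDirectories(root.resolve("subDir1"));
        Path sub2 = Files.createDirectories(root.resolve("subDir2"));
        Files.createFile(root.resolve("foo1"));
        Files.createFile(sub1.resolve("foo3"));
        Files.createFile(sub2.resolve("foo4"));

        // Simulate the race: delete subDir2 the moment it is discovered,
        // before the walk gets around to listing it.
        List<Path> files = listFilesTolerant(root, dir -> {
            if (dir.getFileName().toString().equals("subDir2")) {
                try {
                    Files.delete(dir.resolve("foo4"));
                    Files.delete(dir);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }
        });
        System.out.println(files.size()); // foo1 and foo3 survive; subDir2 is skipped
    }
}
```

A production version would also have to decide, as the reviewer asks, whether a path replaced by a file should be silently skipped or surfaced, since the file now at that path was never enumerated.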





> ListFiles with recursive fails with FNF
> ---------------------------------------
>
>                 Key: HADOOP-18662
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18662
>             Project: Hadoop Common
>          Issue Type: Bug
>            Reporter: Ayush Saxena
>            Assignee: Ayush Saxena
>            Priority: Major
>              Labels: pull-request-available
>
> The problem triggers in HDFS, but the change lands in Hadoop-Common, since
> listFiles is defined in Hadoop-Common.
> Scenario: listFiles with recursive=true:
>  * Fetches a dir, say /dir, which has subdirectories /dir/s1...s10
>  * With recursive set to true, it goes on to list, say, /dir/s5, but /dir/s5
> got deleted in the meantime
>  * The entire operation fails with FNF
> The Hive Cleaner uses listFiles with recursive=true, so this bug impacts it.
> {noformat}
> 2023-03-06 07:45:48,331 ERROR org.apache.hadoop.hive.ql.txn.compactor.Cleaner: [Cleaner-executor-thread-12]: Caught exception when cleaning, unable to complete cleaning of id:39762523,dbname:test,tableName:test_table,partName:null,state:,type:MINOR,enqueueTime:0,start:0,properties:null,runAs:hive,tooManyAborts:false,hasOldAbort:false,highestWriteId:989,errorMessage:null,workerId: null,initiatorId: null java.io.FileNotFoundException: File hdfs:/cluster/warehouse/tablespace/managed/hive/test.db/test_table/.hive-staging_hive_2023-03-06_07-45-23_120_4659605113266849995-73550 does not exist.
>     at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1275)
>     at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1249)
>     at org.apache.hadoop.hdfs.DistributedFileSystem$25.doCall(DistributedFileSystem.java:1194)
>     at org.apache.hadoop.hdfs.DistributedFileSystem$25.doCall(DistributedFileSystem.java:1190)
>     at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>     at org.apache.hadoop.hdfs.DistributedFileSystem.listLocatedStatus(DistributedFileSystem.java:1208)
>     at org.apache.hadoop.fs.FileSystem.listLocatedStatus(FileSystem.java:2144)
>     at org.apache.hadoop.fs.FileSystem$5.handleFileStat(FileSystem.java:2332)
>     at org.apache.hadoop.fs.FileSystem$5.hasNext(FileSystem.java:2309)
>     at org.apache.hadoop.util.functional.RemoteIterators$WrappingRemoteIterator.sourceHasNext(RemoteIterators.java:432)
>     at org.apache.hadoop.util.functional.RemoteIterators$FilteringRemoteIterator.fetch(RemoteIterators.java:581)
>     at org.apache.hadoop.util.functional.RemoteIterators$FilteringRemoteIterator.hasNext(RemoteIterators.java:602)
>     at org.apache.hadoop.hive.ql.io.AcidUtils.getHdfsDirSnapshots(AcidUtils.java:1435)
>     at org.apache.hadoop.hive.ql.txn.compactor.Cleaner.removeFiles(Cleaner.java:287)
>     at org.apache.hadoop.hive.ql.txn.compactor.Cleaner.clean(Cleaner.java:214)
>     at org.apache.hadoop.hive.ql.txn.compactor.Cleaner.lambda$run$0(Cleaner.java:114)
>     at org.apache.hadoop.hive.ql.txn.compactor.CompactorUtil$ThrowingRunnable.lambda$unchecked$0(CompactorUtil.java:54)
>     at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1640)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:750)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
