manika137 commented on code in PR #7614:
URL: https://github.com/apache/hadoop/pull/7614#discussion_r2048453607


##########
hadoop-tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/ITestAzureBlobFileSystemListStatus.java:
##########
@@ -532,6 +532,88 @@ public void testEmptyContinuationToken() throws Exception {
         .describedAs("Listing Size Not as expected").hasSize(1);
   }
 
+  /**
+   * Test to verify that listStatus returns the correct file status
+   * after removing duplicates across multiple iterations of list blobs.
+   * Also verifies that in the case of a non-empty explicit directory,
+   * the entry corresponding to the marker blob is returned.
+   * @throws Exception if test fails.
+   */
+  @Test
+  public void testDuplicateEntriesAcrossListBlobIterations() throws Exception {
+    AzureBlobFileSystem fs = Mockito.spy(getFileSystem());
+    AzureBlobFileSystemStore store = Mockito.spy(fs.getAbfsStore());
+    store.getAbfsConfiguration().setListMaxResults(1);
+    AbfsClient client = Mockito.spy(store.getClient());
+
+    Mockito.doReturn(store).when(fs).getAbfsStore();
+    Mockito.doReturn(client).when(store).getClient();
+
+    /*
+     * Following entries will be created inside the root path.
+     * 0. /A - implicit directory without any marker blob
+     * 1. /a - marker file for explicit directory
+     * 2. /a/file1 - normal file inside explicit directory
+     * 3. /b - normal file inside root
+     * 4. /c - marker file for explicit directory
+     * 5. /c.bak - marker file for explicit directory
+     * 6. /c.bak/file2 - normal file inside explicit directory
+     * 7. /c/file3 - normal file inside explicit directory
+     * 8. /d - implicit directory
+     * 9. /e - marker file for explicit directory
+     * 10. /e/file4 - normal file inside explicit directory
+     */
+    // Create Path 0
+    createAzCopyFolder(new Path("/A"));
+
+    // Create Path 1 and 2.
+    fs.create(new Path("/a/file1"));
+
+    // Create Path 3
+    fs.create(new Path("/b"));
+
+    // Create Path 4 and 7
+    fs.create(new Path("/c/file3"));
+
+    // Create Path 5 and 6
+    fs.create(new Path("/c.bak/file2"));
+
+    // Create Path 8
+    createAzCopyFolder(new Path("/d"));
+
+    // Create Path 9 and 10
+    fs.create(new Path("/e/file4"));
+
+    FileStatus[] fileStatuses = fs.listStatus(new Path(ROOT_PATH));
+
+    // Assert that client.listPath was called 11 times.
+    // This will assert server returned 11 entries in total.
+    Mockito.verify(client, Mockito.times(11))
+        .listPath(eq(ROOT_PATH), eq(false), eq(1), any(), any(), any());
+
+    // Assert that after duplicate removal, only 7 unique entries are returned.
+    Assertions.assertThat(fileStatuses.length)
+        .describedAs("List size is not expected").isEqualTo(7);
+
+    // Assert that for duplicates, entry corresponding to marker blob is returned.
+    assertImplicitDirectoryFileStatus(fileStatuses[0], fs.makeQualified(new Path("/A")));
+    assertExplicitDirectoryFileStatus(fileStatuses[1], fs.makeQualified(new Path("/a")));
+    assertFilePathFileStatus(fileStatuses[2], fs.makeQualified(new Path("/b")));
+    assertExplicitDirectoryFileStatus(fileStatuses[3], fs.makeQualified(new Path("/c")));
+    assertExplicitDirectoryFileStatus(fileStatuses[4], fs.makeQualified(new Path("/c.bak")));
+    assertImplicitDirectoryFileStatus(fileStatuses[5], fs.makeQualified(new Path("/d")));
+    assertExplicitDirectoryFileStatus(fileStatuses[6], fs.makeQualified(new Path("/e")));
+
+    // Assert that there are no duplicates in the returned file statuses.
+    for (int i = 0; i < fileStatuses.length; i++) {

Review Comment:
   could we use a set here instead of a double loop?
   something like- 
   Set<Path> uniquePaths = new HashSet<>();
   for (FileStatus fileStatus : fileStatuses) {
       Assertions.assertThat(uniquePaths.add(fileStatus.getPath()))
           .describedAs("Duplicate entries found")
           .isTrue();
   }



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org

Reply via email to