[ https://issues.apache.org/jira/browse/HADOOP-19543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17944736#comment-17944736 ]
ASF GitHub Bot commented on HADOOP-19543:
-----------------------------------------

anujmodi2021 commented on code in PR #7614:
URL: https://github.com/apache/hadoop/pull/7614#discussion_r2044731084

##########
hadoop-tools/hadoop-azure/src/test/java/org/apache/hadoop/fs/azurebfs/ITestAzureBlobFileSystemListStatus.java:
##########
@@ -532,6 +532,28 @@ public void testEmptyContinuationToken() throws Exception {
         .describedAs("Listing Size Not as expected").hasSize(1);
   }

+  @Test
+  public void testDuplicateEntriesAcrossListBlobIterations() throws Exception {

Review Comment:
   Modified test as above

> ABFS: [FnsOverBlob] Remove Duplicates from Blob Endpoint Listing Across Iterations
> ----------------------------------------------------------------------------------
>
>                 Key: HADOOP-19543
>                 URL: https://issues.apache.org/jira/browse/HADOOP-19543
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/azure
>    Affects Versions: 3.5.0, 3.4.1
>            Reporter: Anuj Modi
>            Assignee: Anuj Modi
>            Priority: Critical
>              Labels: pull-request-available
>
> On FNS-Blob, the List Blobs API is known to return duplicate entries for
> non-empty explicit directories: one entry corresponds to the directory itself,
> and another corresponds to the marker blob that the driver internally creates
> and maintains to mark that path as a directory. This behaviour was already
> known, and it was handled by removing such duplicate entries from the set of
> entries returned as part of the current list iteration.
> Due to a possible partition split, such duplicate entries can be returned in
> separate iterations. There is no handling for this, so the caller might get
> back a result with duplicate entries, as is happening in this case. The logic
> to remove duplicates was designed before the possibility of a partition split
> was realized.
> This PR fixes this bug.
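
To illustrate the failure mode described above, here is a minimal sketch of de-duplication that carries state across list iterations instead of looking only at the current page. It is not the code from PR #7614 or from the ABFS driver: the class `CrossIterationDedup`, its methods, and the use of plain `String` paths are all hypothetical, and it assumes the directory entry and its marker blob have already been normalized to the same path string.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper for illustration only; not the ABFS driver's code. */
final class CrossIterationDedup {
  // Paths already returned by earlier iterations of the same listing.
  private final Set<String> seen = new HashSet<>();

  /** Returns the entries of one page that no earlier page has returned. */
  List<String> filterPage(List<String> page) {
    List<String> fresh = new ArrayList<>();
    for (String path : page) {
      // Set.add returns false when the path was already present,
      // whether it came from this page or from a previous one.
      if (seen.add(path)) {
        fresh.add(path);
      }
    }
    return fresh;
  }

  /** Simulates a full listing built from several pages (iterations). */
  static List<String> listAll(List<List<String>> pages) {
    CrossIterationDedup dedup = new CrossIterationDedup();
    List<String> result = new ArrayList<>();
    for (List<String> page : pages) {
      result.addAll(dedup.filterPage(page));
    }
    return result;
  }
}
```

With pages `["a", "dir"]` and `["dir", "z"]`, a per-page filter would let `dir` through twice, while the shared `seen` set drops the second occurrence. Remembering every path costs memory proportional to the listing size; whether a real fix needs that much state depends on ordering guarantees of the listing, which this sketch does not assume.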