[ https://issues.apache.org/jira/browse/HADOOP-19543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17944735#comment-17944735 ]
ASF GitHub Bot commented on HADOOP-19543:
-----------------------------------------

anujmodi2021 commented on code in PR #7614:
URL: https://github.com/apache/hadoop/pull/7614#discussion_r2044730134


##########
hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/AzureBlobFileSystemStore.java:
##########
@@ -1299,6 +1311,29 @@ public String listStatus(final Path path, final String startFrom,
     return continuation;
   }
+  private void filterDuplicateEntriesForBlobClient(

Review Comment:
   Added


> ABFS: [FnsOverBlob] Remove Duplicates from Blob Endpoint Listing Across Iterations
> ----------------------------------------------------------------------------------
>
>                 Key: HADOOP-19543
>                 URL: https://issues.apache.org/jira/browse/HADOOP-19543
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/azure
>    Affects Versions: 3.5.0, 3.4.1
>            Reporter: Anuj Modi
>            Assignee: Anuj Modi
>            Priority: Critical
>              Labels: pull-request-available
>
> On FNS-Blob, the List Blobs API is known to return duplicate entries for
> non-empty explicit directories. One entry corresponds to the directory itself
> and the other corresponds to the marker blob that the driver internally
> creates and maintains to mark that path as a directory. This behaviour was
> already known, and such duplicates were removed from the set of entries
> returned within a single list iteration.
> However, because of a possible partition split, the two duplicate entries can
> be returned in separate iterations. That case is not handled, so the caller
> can get back a result containing duplicate entries, as happened here. The
> duplicate-removal logic was designed before partition splits were known to be
> a possibility.
> This PR fixes this bug.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
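
For readers following the issue description, the idea of removing duplicates across list iterations can be sketched roughly as below. This is only an illustration and not the code from PR #7614: the class name CrossIterationDeduplicator, the method filterPage, and the use of plain path strings are all assumptions made for the example. It also assumes that remembering every returned path in a set is acceptable. That is a simplification, chosen because the directory entry and its marker blob are not guaranteed to arrive next to each other.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper for illustration; not part of the ABFS driver. */
class CrossIterationDeduplicator {
  // Paths already handed back to the caller in earlier iterations.
  private final Set<String> seenPaths = new HashSet<>();

  /**
   * Filters one page of listing results, dropping any path that was
   * already returned by this page or by a previous page.
   * Order of the remaining entries is preserved.
   */
  List<String> filterPage(List<String> pageEntries) {
    List<String> unique = new ArrayList<>();
    for (String path : pageEntries) {
      // Set.add returns false when the path is already present.
      if (seenPaths.add(path)) {
        unique.add(path);
      }
    }
    return unique;
  }

  public static void main(String[] args) {
    CrossIterationDeduplicator dedup = new CrossIterationDeduplicator();
    // Iteration 1 returns the directory entry.
    System.out.println(dedup.filterPage(Arrays.asList("/a/dir", "/a/dir-x")));
    // Iteration 2 (after a partition split) returns the marker blob
    // for the same path; it is dropped here.
    System.out.println(dedup.filterPage(Arrays.asList("/a/dir", "/a/file")));
  }
}
```

The key point is that the state (seenPaths here) lives for the whole listing operation rather than for a single iteration, so a duplicate that shows up in a later page is still caught.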