Anuj Modi created HADOOP-19543:
----------------------------------

             Summary: ABFS: [FnsOverBlob] Remove Duplicates from Blob Endpoint Listing Across Iterations
                 Key: HADOOP-19543
                 URL: https://issues.apache.org/jira/browse/HADOOP-19543
             Project: Hadoop Common
          Issue Type: Sub-task
          Components: fs/azure
    Affects Versions: 3.4.1, 3.5.0
            Reporter: Anuj Modi
            Assignee: Anuj Modi

On FNS-Blob, the List Blobs API is known to return duplicate entries for non-empty explicit directories: one entry for the directory itself and another for the marker blob that the driver internally creates and maintains to mark that path as a directory. This behaviour was already known, and the driver removes such duplicates from the set of entries returned within a single list iteration.

However, because of a possible partition split, the two entries can be returned in separate iterations. That case is not handled, so the caller can get back a result containing duplicate entries, which is what happens in this case. The duplicate-removal logic was designed before partition splits were known to be a possibility. This PR fixes the bug.
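
The cross-iteration problem described above can be illustrated with a minimal sketch. This is not the actual ABFS driver code or the change made in the PR. The class name `CrossIterationDeduplicator` and its `filter` method are hypothetical, and entries are modelled as plain path strings on the assumption that the directory entry and its marker blob resolve to the same path. The idea is only that the record of already-returned paths must outlive a single list iteration:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper for illustration; not a class from the ABFS driver. */
class CrossIterationDeduplicator {
    // Paths already handed back to the caller in this or an earlier iteration.
    // Kept as instance state so it survives across list iterations.
    private final Set<String> seenPaths = new HashSet<>();

    /**
     * Filters the entries of one list iteration, dropping any path that was
     * already returned, whether earlier in this iteration or in a previous one.
     */
    List<String> filter(List<String> iterationEntries) {
        List<String> unique = new ArrayList<>();
        for (String path : iterationEntries) {
            // Set.add returns false if the path was already present.
            if (seenPaths.add(path)) {
                unique.add(path);
            }
        }
        return unique;
    }
}
```

With one instance per listing operation, a directory path returned in the first iteration and its marker blob returned in the second would be reported only once. Remembering every path costs memory proportional to the size of the listing, so a real implementation might track less state. That trade-off is outside the scope of this sketch.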