Anuj Modi created HADOOP-19543:
----------------------------------
Summary: ABFS: [FnsOverBlob] Remove Duplicates from Blob Endpoint
Listing Across Iterations
Key: HADOOP-19543
URL: https://issues.apache.org/jira/browse/HADOOP-19543
Project: Hadoop Common
Issue Type: Sub-task
Components: fs/azure
Affects Versions: 3.4.1, 3.5.0
Reporter: Anuj Modi
Assignee: Anuj Modi
On FNS-Blob, the List Blobs API is known to return duplicate entries for
non-empty explicit directories: one entry corresponds to the directory itself,
and the other to the marker blob that the driver internally creates and
maintains to mark that path as a directory. This behaviour was already known,
and it was handled by removing such duplicate entries from the set of entries
returned within the current list iteration.
However, because of a possible partition split, the two duplicate entries can
be returned in separate iterations. That case is not handled, so the caller can
get back a result containing duplicate entries, as happened in this case. The
duplicate-removal logic was designed before partition splits were known to be
possible.
This PR fixes the bug.
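
To illustrate the problem, here is a minimal Java sketch of one possible way to
drop duplicates across list iterations. It is not taken from the PR and is not
the ABFS driver's actual code: the class CrossPageDedup and its methods are
hypothetical, and it assumes that a directory entry ("dir/") and its marker
blob ("dir") can be normalized to the same path.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of hadoop-azure: remembers every path already
// returned to the caller so that a duplicate arriving in a later list
// iteration (page) is dropped as well.
class CrossPageDedup {
    // Paths seen in all iterations so far, not just the current one.
    private final Set<String> seenPaths = new HashSet<>();

    // Assumption: the directory entry and the marker blob differ only by a
    // trailing slash, so stripping it maps both to the same key.
    static String normalize(String path) {
        return path.endsWith("/") ? path.substring(0, path.length() - 1) : path;
    }

    // Returns the normalized paths of this page that were not seen before,
    // in their original order.
    List<String> filterPage(List<String> page) {
        List<String> result = new ArrayList<>();
        for (String raw : page) {
            String path = normalize(raw);
            // Set.add returns false when the path is already present.
            if (seenPaths.add(path)) {
                result.add(path);
            }
        }
        return result;
    }
}
```

With one CrossPageDedup instance per listing, a first page of "dir", "dir-a"
followed by a second page of "dir/", "file.txt" yields "dir", "dir-a" and then
only "file.txt". The set grows with the size of the listing, so a real fix may
prefer to track less state.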