Anuj Modi created HADOOP-19543:
----------------------------------

             Summary: ABFS: [FnsOverBlob] Remove Duplicates from Blob Endpoint Listing Across Iterations
                 Key: HADOOP-19543
                 URL: https://issues.apache.org/jira/browse/HADOOP-19543
             Project: Hadoop Common
          Issue Type: Sub-task
          Components: fs/azure
    Affects Versions: 3.4.1, 3.5.0
            Reporter: Anuj Modi
            Assignee: Anuj Modi


On FNS-Blob, the List Blobs API is known to return duplicate entries for
non-empty explicit directories. One entry corresponds to the directory itself,
and the other corresponds to the marker blob that the driver creates and
maintains internally to mark that path as a directory. This behaviour was
already known, and the driver removed such duplicates when both entries were
returned within the same list iteration.

However, because of a possible partition split, the two entries for such a
directory can be returned in separate iterations. This case was not handled,
so the caller could get back a result containing duplicate entries, as happened
in this case. The deduplication logic was designed before it was realized that
partition splits could cause this.
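A minimal Java sketch of the difference, assuming a simplified model in which each list iteration returns a page of entries identified only by path. The `Entry` record and the `dedupPerPage` and `dedupAcrossPages` methods are hypothetical names for illustration, not the ABFS driver's actual types or API. Deduplicating with a fresh set per page lets a directory entry and its marker blob survive when they land on different pages; sharing one set across all pages of the listing removes them.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: deduplicating listing entries within vs. across iterations. */
public class ListingDedupSketch {

    /** Hypothetical listing entry (assumption): a path and whether it is a directory. */
    public record Entry(String path, boolean isDirectory) {}

    /** Per-iteration dedup: a new set per page, so duplicates split across pages survive. */
    public static List<Entry> dedupPerPage(List<List<Entry>> pages) {
        List<Entry> result = new ArrayList<>();
        for (List<Entry> page : pages) {
            Set<String> seenInPage = new HashSet<>();
            for (Entry e : page) {
                // add() returns false if this path was already seen on this page
                if (seenInPage.add(e.path())) {
                    result.add(e);
                }
            }
        }
        return result;
    }

    /** Cross-iteration dedup: one set shared by every page of the same listing. */
    public static List<Entry> dedupAcrossPages(List<List<Entry>> pages) {
        Set<String> seen = new HashSet<>();
        List<Entry> result = new ArrayList<>();
        for (List<Entry> page : pages) {
            for (Entry e : page) {
                // Keeps the first entry seen for a path (assumption); later ones are dropped
                if (seen.add(e.path())) {
                    result.add(e);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Simulated partition split: the marker blob for "data/dir" ends the first page,
        // and the directory entry for the same path starts the second page.
        List<List<Entry>> pages = List.of(
                List.of(new Entry("data/dir", true)),
                List.of(new Entry("data/dir", true), new Entry("data/file.txt", false)));

        System.out.println("per-page: " + dedupPerPage(pages).size());       // prints "per-page: 3"
        System.out.println("across pages: " + dedupAcrossPages(pages).size()); // prints "across pages: 2"
    }
}
```

In this sketch the shared set grows with the number of entries in the listing, so it trades some memory for correctness; which of the two duplicate entries to keep is also an assumption here (the first one seen).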

This PR fixes this bug by removing duplicate entries across list iterations.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
