anujmodi2021 opened a new pull request, #7632:
URL: https://github.com/apache/hadoop/pull/7632

   PR in trunk: https://github.com/apache/hadoop/pull/7614
   Commit cherry-picked: https://github.com/apache/hadoop/commit/810c42f88cc63a8054edc5a16baeb9a90e3bd523
   JIRA: https://issues.apache.org/jira/browse/HADOOP-19543
   
   ### Description of PR
   On FNS-Blob, the List Blobs API is known to return duplicate entries for non-empty explicit directories. One entry corresponds to the directory itself, and the other corresponds to the marker blob that the driver creates and maintains internally to mark that path as a directory. This behaviour was already known, and the driver handled it by removing such duplicates from the entries returned within a single list iteration.
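The per-iteration handling described above can be pictured with a minimal sketch. It assumes each listed entry is reduced to a path string, with the directory entry carrying a trailing `/` and the marker blob not; the class and method names (`PageDedupSketch`, `normalize`, `dedupWithinPage`) are hypothetical and are not the ABFS driver's actual code.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not the ABFS driver implementation.
public class PageDedupSketch {

    // Assumption: "dir/" (directory entry) and "dir" (marker blob)
    // refer to the same logical path once the trailing slash is dropped.
    static String normalize(String path) {
        return path.endsWith("/") ? path.substring(0, path.length() - 1) : path;
    }

    // De-duplicates entries within ONE list iteration. LinkedHashSet keeps
    // the first occurrence of each path and preserves insertion order.
    static List<String> dedupWithinPage(List<String> page) {
        Set<String> seen = new LinkedHashSet<>();
        for (String entry : page) {
            seen.add(normalize(entry));
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        // Both entries for "dir" arrive in the same page, so only one survives.
        System.out.println(dedupWithinPage(List.of("a", "dir/", "dir", "b")));
        // Prints [a, dir, b]
    }
}
```

Because the set is scoped to a single page, this approach only works when both entries for a directory land in the same iteration.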
   
   However, because of a possible partition split, the two duplicate entries can be returned in separate iterations. That case was not handled, so the caller could get back a result containing duplicate entries, as happened here. The de-duplication logic was designed before the possibility of a partition split was recognized.
   
   This PR fixes this bug so that such duplicates are also removed when they are returned in separate iterations.
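The difference between per-iteration and cross-iteration de-duplication can be sketched as follows. This is a hypothetical illustration, not the code in this PR: it assumes pages are lists of path strings, that `dir/` and `dir` denote the same directory, and that remembering every path seen so far is acceptable. All names (`CrossPageDedupSketch`, `perPageOnly`, `acrossPages`) are invented for the example.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; the actual fix in the PR may differ.
public class CrossPageDedupSketch {

    // Assumption: the directory entry ("dir/") and the marker blob ("dir")
    // map to the same key once the trailing slash is dropped.
    static String normalize(String path) {
        return path.endsWith("/") ? path.substring(0, path.length() - 1) : path;
    }

    // Old-style behaviour: a fresh set per page, so a duplicate that a
    // partition split places in two different pages is not detected.
    static List<String> perPageOnly(List<List<String>> pages) {
        List<String> result = new ArrayList<>();
        for (List<String> page : pages) {
            Set<String> seen = new HashSet<>();
            for (String entry : page) {
                String key = normalize(entry);
                if (seen.add(key)) {
                    result.add(key);
                }
            }
        }
        return result;
    }

    // Cross-iteration behaviour: one set is carried across all pages.
    static List<String> acrossPages(List<List<String>> pages) {
        List<String> result = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        for (List<String> page : pages) {
            for (String entry : page) {
                String key = normalize(entry);
                if (seen.add(key)) {
                    result.add(key);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // The directory entry and its marker blob end up in different pages.
        List<List<String>> pages = List.of(List.of("a", "dir/"), List.of("dir", "b"));
        System.out.println(perPageOnly(pages));  // [a, dir, dir, b]
        System.out.println(acrossPages(pages));  // [a, dir, b]
    }
}
```

One trade-off of this sketch is that the `seen` set grows with the total number of listed entries; the PR description does not say how the real fix tracks entries across iterations.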
   
   ### How was this patch tested?
   A new test for the failing scenario was added, and the existing test suite was run to validate the changes across all combinations.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
For additional commands, e-mail: common-issues-h...@hadoop.apache.org