anujmodi2021 opened a new pull request, #7632: URL: https://github.com/apache/hadoop/pull/7632
PR in trunk: https://github.com/apache/hadoop/pull/7614
Commit CP'd: https://github.com/apache/hadoop/commit/810c42f88cc63a8054edc5a16baeb9a90e3bd523
JIRA: https://issues.apache.org/jira/browse/HADOOP-19543

### Description of PR

On FNS-Blob, the List Blobs API is known to return duplicate entries for non-empty explicit directories. One entry corresponds to the directory itself, and the other corresponds to the marker blob that the driver internally creates and maintains to mark that path as a directory.

This behaviour was already known, and the driver handled it by removing such duplicates from the set of entries returned within a single list iteration. However, because of a possible partition split, the two duplicate entries can be returned in separate iterations. That case was not handled, so the caller could get back a result containing duplicate entries, as happened in this case. The de-duplication logic was designed before the partition-split behaviour was known. This PR fixes the bug.

### How was this patch tested?

A new test for the failing scenario was added, and the existing test suite was run to validate the changes across all combinations.
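
For readers unfamiliar with the failure mode, the following is a minimal sketch of de-duplication that survives a page boundary. It is not the code from this PR: the class `CrossPageDeduplicator`, the method `filterPage`, and the use of plain path strings as listing entries are all hypothetical, and the sketch assumes that the two duplicate entries carry the same path.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch, not the ABFS driver's code: filters duplicate paths
 * out of a paginated listing, including duplicates split across pages.
 */
class CrossPageDeduplicator {
    // Paths already handed to the caller in this or any earlier page.
    private final Set<String> seenPaths = new HashSet<>();

    /** Returns the entries of one page that have not been returned before. */
    List<String> filterPage(List<String> page) {
        List<String> unique = new ArrayList<>();
        for (String path : page) {
            // Set.add returns false when the path was already recorded.
            if (seenPaths.add(path)) {
                unique.add(path);
            }
        }
        return unique;
    }

    public static void main(String[] args) {
        CrossPageDeduplicator dedup = new CrossPageDeduplicator();
        // "dir1" is duplicated inside page 1; "dir2" straddles the page boundary.
        List<String> page1 = List.of("dir1", "dir1", "dir2");
        List<String> page2 = List.of("dir2", "file.txt");
        System.out.println(dedup.filterPage(page1)); // prints [dir1, dir2]
        System.out.println(dedup.filterPage(page2)); // prints [file.txt]
    }
}
```

The key point is that the `seenPaths` state lives as long as the whole listing operation rather than a single iteration, which is what lets the second page drop `dir2`. Remembering every path costs memory proportional to the listing size; a real implementation might keep far less state, but that depends on ordering guarantees this sketch does not assume.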