ahmarsuhail opened a new pull request #3978:
URL: https://github.com/apache/hadoop/pull/3978


   ### Description of PR
   JIRA: https://issues.apache.org/jira/browse/HADOOP-13704
   
   This PR implements an optimised version of getContentSummary which uses the 
result from the listFiles iterator.
   
   Explanation of new `buildDirectorySet` method added:
   
   Since the listFiles operation can return the directory `a/b/c` as a single 
object, we need to recurse over the path `a/b/c` to ensure we have counted all 
directories. We do this by keeping two sets, dirSet (Set of all directories 
under the base path) and pathTraversed (Set of paths we have recursed over so 
far).
   
   Iterating over directory structure `basePath/a/b/c`, `basePath/a/b/d`, we 
will first find all the directories in `basePath/a/b/c`. Once this is 
completed, the pathTraversed set will have `{basePath/a/b}` and dirSet will 
have `{basePath/a, basePath/a/b, basePath/a/b/c}`.
   
   Then for `basePath/a/b/d`, just add `basePath/a/b/d` to the dirSet and don't 
do any additional work as path `basePath/a/b` has already been traversed.
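   
   The two-set bookkeeping described above could be sketched roughly as follows. This is a simplified illustration that uses plain slash-separated strings rather than Hadoop `Path` objects; the class `DirectorySetSketch` and the helpers `parentOf` and `addDirectory` are hypothetical names chosen for this example and are not the code in the patch.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch, not the patch's buildDirectorySet implementation.
class DirectorySetSketch {

  /** Returns the parent of a slash-separated path, or null if there is none. */
  static String parentOf(String path) {
    int idx = path.lastIndexOf('/');
    return idx <= 0 ? null : path.substring(0, idx);
  }

  /**
   * Records dir in dirSet, then walks up its ancestors (stopping at basePath
   * or at a path that has already been traversed) and records those too.
   */
  static void addDirectory(String basePath, String dir,
      Set<String> dirSet, Set<String> pathTraversed) {
    dirSet.add(dir);
    String parent = parentOf(dir);
    if (parent == null || parent.equals(basePath)
        || pathTraversed.contains(parent)) {
      // Nothing above dir still needs counting.
      return;
    }
    // Walk upwards, adding each ancestor that lies below basePath.
    for (String p = parent;
         p != null && !p.equals(basePath) && !pathTraversed.contains(p);
         p = parentOf(p)) {
      dirSet.add(p);
    }
    // Remember that everything above this parent has been handled.
    pathTraversed.add(parent);
  }

  public static void main(String[] args) {
    Set<String> dirSet = new LinkedHashSet<>();
    Set<String> pathTraversed = new LinkedHashSet<>();
    addDirectory("basePath", "basePath/a/b/c", dirSet, pathTraversed);
    addDirectory("basePath", "basePath/a/b/d", dirSet, pathTraversed);
    System.out.println("dirSet = " + dirSet);
    System.out.println("pathTraversed = " + pathTraversed);
  }
}
```

   In this sketch the second call only adds `basePath/a/b/d` to `dirSet`, because its parent `basePath/a/b` is already in `pathTraversed`.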
   
   The Jira ticket mentions that we should add some instrumentation to 
measure usage. There's already code that does this 
[here](https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java#L3256) 
and usage is tested in an integration test 
[here](https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/performance/ITestS3AMiscOperationCost.java#L144).
   
   ### How was this patch tested?
   
   Tested in eu-west-1 by running
   
   `mvn -Dparallel-tests -DtestsThreadCount=16 clean verify`
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


