[ 
https://issues.apache.org/jira/browse/HADOOP-13208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15398645#comment-15398645
 ] 

Aaron Fabbri commented on HADOOP-13208:
---------------------------------------

I spent a good amount of time this evening going over patch #18.  Overall looks 
good, considering the previous JIRA included most of the test code.  Are there 
any unit tests still missing (I'm willing to help if so)?

I'm +1 (non-binding).

Some random comments / questions:

{code}
+
     } else {
       LOG.debug("Adding: rd (not a dir): {}", f);
+      result = new ArrayList<>(1);
       result.add(fileStatus);
{code}
How about just returning the singleton array here, to reduce garbage a bit?

{code}
+  private ListObjectsRequest createListObjectsRequest(String key,
+      String delimiter) {
+    ListObjectsRequest request = new ListObjectsRequest();
+    request.setBucketName(bucket);
+    request.setMaxKeys(maxKeys);
+    request.setPrefix(key);
+    if (delimiter != null) {
+      request.setDelimiter(delimiter);
+    }
+    return request;
+  }
{code}
Good factorization.

{code}
+  private final class FileStatusListingIterator
+      implements RemoteIterator<FileStatus> {
+
+    /** Source of objects. */
+    private final ObjectListingIterator source;
{code}

I feel like we need to break up S3AFileSystem.java some.  It is getting pretty
large.  We should take any easy opportunities to pull out classes into separate
files.  I'm fine with doing this as a follow-up JIRA.

{code}
+     */
+    private boolean requestNextBatch() throws IOException {
+      // look for more object listing batches being available
+      while (source.hasNext()) {
+        // if available, retrieve it and build the next status
+        if (buildNextStatusBatch(source.next())) {
{code}
Yeah, this logic is non-trivial.  This while loop is key.

{code}
-   */
-  static class AWSAccessKeys {
-    private String accessKey = null;
-    private String accessSecret = null;
{code}
Bonus deadc0de removal, ok.

{code}
+  @Override
+  protected Configuration createConfiguration() {
+    Configuration conf = super.createConfiguration();
+    S3ATestUtils.disableFilesystemCaching(conf);
{code}
What is this for?  I see it was introduced in HADOOP-13131, but don't see
any usage of the configuration flag it sets (fs.s3a.impl.disable.cache).


> S3A listFiles(recursive=true) to do a bulk listObjects instead of walking the 
> pseudo-tree of directories
> --------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-13208
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13208
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 2.8.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Minor
>         Attachments: HADOOP-13208-branch-2-001.patch, 
> HADOOP-13208-branch-2-007.patch, HADOOP-13208-branch-2-008.patch, 
> HADOOP-13208-branch-2-009.patch, HADOOP-13208-branch-2-010.patch, 
> HADOOP-13208-branch-2-011.patch, HADOOP-13208-branch-2-012.patch, 
> HADOOP-13208-branch-2-017.patch, HADOOP-13208-branch-2-018.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> A major cost in split calculation against object stores turns out be listing 
> the directory tree itself. That's because against S3, it takes S3A two HEADs 
> and two lists to list the content of any directory path (2 HEADs + 1 list for 
> getFileStatus(); the next list to query the contents).
> Listing a directory could be improved slightly by combining the final two 
> listings. However, a listing of a directory tree will still be 
> O(directories). In contrast, a recursive {{listFiles()}} operation should be 
> implementable by a bulk listing of all descendant paths; one List operation 
> per thousand descendants. 
> As the result of this call is an iterator, the ongoing listing can be 
> implemented within the iterator itself



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to