[ https://issues.apache.org/jira/browse/HADOOP-14266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15969765#comment-15969765 ]

Aaron Fabbri commented on HADOOP-14266:
---------------------------------------

Thanks for your patience.  The patch looks good to me.  I am working through 
the test code now and running all the integration tests.  Thank you for the 
extra list consistency test cases.

I took notes below, mostly for my understanding:

FileStatusListingIterator
- Now accepts an optional "providedStatus" iterator for entries from a 
MetadataStore (the previous patch added this, but as an array).  
{{FileStatusListingIterator}} will produce the union of the set of paths 
provided by the underlying S3 iterator and the set (if any) supplied via the 
providedStatus iterator.
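
A minimal Java sketch of that union, for illustration only. {{Status}} and 
{{UnionListingIterator}} are hypothetical stand-ins, not the S3A classes, and 
the conflict rule is an assumption: S3 entries are returned first, and a 
MetadataStore entry is returned afterwards only if S3 did not already report 
the same path.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.NoSuchElementException;

/** Hypothetical stand-in for FileStatus: just a path and a directory flag. */
final class Status {
  final String path;
  final boolean isDir;

  Status(String path, boolean isDir) {
    this.path = path;
    this.isDir = isDir;
  }
}

/**
 * Hypothetical union listing: every S3 entry first, then each MetadataStore
 * ("provided") entry whose path S3 did not return.
 */
final class UnionListingIterator implements Iterator<Status> {
  private final Iterator<Status> s3;
  // Provided entries keyed by path, in the order the store returned them.
  private final Map<String, Status> provided = new LinkedHashMap<>();
  private Iterator<Status> leftovers; // created once S3 is exhausted

  UnionListingIterator(Iterator<Status> s3, Iterator<Status> providedStatus) {
    this.s3 = s3;
    if (providedStatus != null) { // the provided iterator is optional
      while (providedStatus.hasNext()) {
        Status st = providedStatus.next();
        provided.put(st.path, st);
      }
    }
  }

  @Override
  public boolean hasNext() {
    if (s3.hasNext()) {
      return true;
    }
    if (leftovers == null) {
      leftovers = provided.values().iterator();
    }
    return leftovers.hasNext();
  }

  @Override
  public Status next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    if (s3.hasNext()) {
      Status st = s3.next();
      provided.remove(st.path); // S3 already reported this path
      return st;
    }
    return leftovers.next();
  }
}
```

Keying the provided entries by path is what keeps a file that appears in both 
sources from being listed twice, while entries known only to the MetadataStore 
still come out at the end of the listing.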

listFiles()
- Before creating the S3 listing iterator, create a 
{{cachedFileStatusIterator}} which enumerates the results from the 
{{MetadataStore}}.  In the recursive case, we use {{DescendantsIterator}}, which 
knows how to recursively enumerate a directory tree in the MetadataStore.  In 
the non-recursive case, we simply wrap the MetadataStore's listing in a 
{{ProvidedFileStatusIterator}}.
- When creating the {{FileStatusListingIterator}} we can now pass in the 
MetadataStore's iterator.  {{listFiles()}} returns the 
{{FileStatusListingIterator}} which returns the union of the S3 and 
MetadataStore results.
- We do not try to optimize the {{recursive=true}} case yet, and I agree with 
this.  I can imagine a scheme where we only demand-fetch S3 subtree listings 
when we hit a subtree that is missing from the MetadataStore or has 
{{isAuthoritative == false}}, but that is complex and could actually perform 
worse in some corner cases.  That sounds like a future enhancement to me.
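
The choice of MetadataStore iterator in {{listFiles()}} can be sketched in Java 
as below.  Everything here is hypothetical: an in-memory map stands in for the 
{{MetadataStore}}, {{descendants()}} is only loosely modeled on 
{{DescendantsIterator}} (a breadth-first walk, built eagerly for brevity), and 
dropping directories at this stage is an assumption made so the result matches 
what {{listFiles()}} returns (files only).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for FileStatus. */
final class Status {
  final String path;
  final boolean isDir;

  Status(String path, boolean isDir) {
    this.path = path;
    this.isDir = isDir;
  }
}

/** Hypothetical in-memory MetadataStore: directory path -> direct children. */
final class InMemoryStore {
  private final Map<String, List<Status>> children = new HashMap<>();

  void put(String parent, Status child) {
    children.computeIfAbsent(parent, k -> new ArrayList<>()).add(child);
  }

  List<Status> listChildren(String dir) {
    return children.getOrDefault(dir, Collections.emptyList());
  }
}

final class CachedListing {
  /** Breadth-first walk of every entry below root (root itself excluded). */
  static List<Status> descendants(InMemoryStore store, String root) {
    List<Status> all = new ArrayList<>();
    Deque<String> pending = new ArrayDeque<>();
    pending.add(root);
    while (!pending.isEmpty()) {
      for (Status st : store.listChildren(pending.poll())) {
        all.add(st);
        if (st.isDir) {
          pending.add(st.path); // expand this directory later
        }
      }
    }
    return all;
  }

  /** Hypothetical helper: whole subtree if recursive, else direct children; files only. */
  static Iterator<Status> buildCachedIterator(
      InMemoryStore store, String dir, boolean recursive) {
    List<Status> entries = recursive ? descendants(store, dir) : store.listChildren(dir);
    List<Status> files = new ArrayList<>();
    for (Status st : entries) {
      if (!st.isDir) {
        files.add(st);
      }
    }
    return files.iterator();
  }
}
```

The resulting iterator is what would then be handed to the listing iterator as 
its provided entries, so the MetadataStore side is enumerated before any S3 
request is made.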

listLocatedStatus()
- Same logic, just refactored a bit, and uses the new Iterator (instead of an 
array) for MetadataStore results.
- This one (previous patch) *does* optimize the {{isAuthoritative}} case:  if 
the MetadataStore claims the listing is complete, and the S3A client is 
configured to allow it ({{fs.s3a.metadatastore.authoritative=true}}), it will 
skip the query to S3.
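
The {{isAuthoritative}} short-circuit reduces to a small decision, sketched 
here in Java.  {{CachedDir}}, {{LocatedStatusPlanner}} and the {{Supplier}} 
standing in for the S3 LIST call are hypothetical; only the 
{{fs.s3a.metadatastore.authoritative}} setting comes from the comment above, 
and it is modeled as a plain boolean.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Supplier;

/** Hypothetical stand-in for a cached directory listing and its completeness flag. */
final class CachedDir {
  final List<String> children;
  final boolean isAuthoritative;

  CachedDir(List<String> children, boolean isAuthoritative) {
    this.children = children;
    this.isAuthoritative = isAuthoritative;
  }
}

final class LocatedStatusPlanner {
  /**
   * S3 can be skipped only if the store has a listing, that listing claims to
   * be complete, and the client opted in (fs.s3a.metadatastore.authoritative).
   */
  static boolean canSkipS3(CachedDir cached, boolean allowAuthoritative) {
    return allowAuthoritative && cached != null && cached.isAuthoritative;
  }

  /** s3Lister stands in for the S3 LIST request; it is not called when skipped. */
  static List<String> list(CachedDir cached, boolean allowAuthoritative,
                           Supplier<List<String>> s3Lister) {
    if (canSkipS3(cached, allowAuthoritative)) {
      return cached.children; // no S3 request issued
    }
    // Otherwise union the S3 results with whatever the store knows about.
    Set<String> union = new LinkedHashSet<>(s3Lister.get());
    if (cached != null) {
      union.addAll(cached.children);
    }
    return new ArrayList<>(union);
  }
}
```

Both conditions are required: an authoritative flag alone is not enough if the 
client has not opted in, since trusting the store means objects that exist only 
in S3 would be missed by the listing.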




> S3Guard: S3AFileSystem::listFiles() to employ MetadataStore
> -----------------------------------------------------------
>
>                 Key: HADOOP-14266
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14266
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: HADOOP-13345
>            Reporter: Mingliang Liu
>            Assignee: Mingliang Liu
>         Attachments: HADOOP-14266-HADOOP-13345.000.patch, 
> HADOOP-14266-HADOOP-13345.001.patch, HADOOP-14266-HADOOP-13345.002.patch, 
> HADOOP-14266-HADOOP-13345.003.patch, HADOOP-14266-HADOOP-13345.003.patch, 
> HADOOP-14266-HADOOP-13345.004.patch, HADOOP-14266-HADOOP-13345-005.patch, 
> HADOOP-14266-HADOOP-13345.005.patch, HADOOP-14266-HADOOP-13345.006.patch
>
>
> Similar to [HADOOP-13926], this is to track the effort of employing 
> MetadataStore in {{S3AFileSystem::listFiles()}}.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
