[
https://issues.apache.org/jira/browse/HADOOP-14266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15969765#comment-15969765
]
Aaron Fabbri commented on HADOOP-14266:
---------------------------------------
Thanks for your patience. The patch looks good to me. I am working through
the test code now and running all the integration tests. Thank you for the
extra list consistency test cases.
I took notes below, mostly for my understanding:
FileStatusListingIterator
- Now accepts an optional "providedStatus" iterator for entries from a
MetadataStore (the previous patch added this parameter, but as an array).
{{FileStatusListingIterator}} will produce the union of the set of paths provided
by the underlying S3 iterator and the set (if any) supplied via the
providedStatus iterator.
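The union behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the patch's code: {{UnionListingSketch}} is a hypothetical name, paths are plain strings rather than {{FileStatus}} objects, and emitting the provided entries first is a choice made for this sketch only.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Set;

/**
 * Hypothetical sketch: yields every path from the "provided" (MetadataStore)
 * iterator, then every path from the S3 iterator that was not already seen.
 */
final class UnionListingSketch implements Iterator<String> {
  private final Iterator<String> provided; // may be empty if the store has no entries
  private final Iterator<String> s3;
  private final Set<String> seen = new HashSet<>();
  private String next;

  UnionListingSketch(Iterator<String> provided, Iterator<String> s3) {
    this.provided = provided;
    this.s3 = s3;
    advance();
  }

  /** Moves to the next path not yet returned, or sets next to null when done. */
  private void advance() {
    next = null;
    while (provided.hasNext()) {
      String p = provided.next();
      if (seen.add(p)) { next = p; return; }
    }
    while (s3.hasNext()) {
      String p = s3.next();
      if (seen.add(p)) { next = p; return; } // skip duplicates of provided entries
    }
  }

  @Override public boolean hasNext() { return next != null; }

  @Override public String next() {
    if (next == null) throw new NoSuchElementException();
    String result = next;
    advance();
    return result;
  }

  /** Convenience helper for trying the iterator out on two lists. */
  static List<String> union(List<String> provided, List<String> s3) {
    List<String> out = new ArrayList<>();
    new UnionListingSketch(provided.iterator(), s3.iterator()).forEachRemaining(out::add);
    return out;
  }
}
```

With this sketch, a path present in both sources is returned once, and a path present in only one source is still returned.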
listFiles()
- Before creating the S3 listing iterator, create a
{{cachedFileStatusIterator}} which enumerates the results from the
{{MetadataStore}}. In the recursive case, we use {{DescendantsIterator}}, which
knows how to recursively enumerate a directory tree in the MetadataStore. For
the non-recursive case, we simply wrap the MetadataStore's listing in a
{{ProvidedFileStatusIterator}}.
- When creating the {{FileStatusListingIterator}}, we can now pass in the
MetadataStore's iterator. {{listFiles()}} returns the
{{FileStatusListingIterator}}, which returns the union of the S3 and
MetadataStore results.
- We do not try to optimize the {{recursive=true}} case yet. I agree with
this. I can imagine a scheme where we only demand-fetch S3 subtree listings
when we hit a subtree that is missing in MetadataStore or has
{{isAuthoritative == false}}, but that is complex and could actually perform
worse in limited corner cases. That sounds like a future enhancement to me.
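As a rough illustration of the difference between the recursive and non-recursive enumeration of cached entries, here is a minimal sketch. It assumes the cached metadata can be modelled as a map from a directory path to its direct children; {{CachedListingSketch}} and that map are hypothetical stand-ins, not the {{MetadataStore}} or {{DescendantsIterator}} API.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of enumerating files from an in-memory metadata cache. */
final class CachedListingSketch {
  /**
   * @param children  directory path -> direct child paths; any path that
   *                  appears as a key is treated as a directory (assumption)
   * @param dir       directory to list
   * @param recursive whether to descend into subdirectories
   * @return the file paths found (directories themselves are not returned)
   */
  static List<String> listFiles(Map<String, List<String>> children,
                                String dir, boolean recursive) {
    List<String> files = new ArrayList<>();
    Deque<String> pending = new ArrayDeque<>();
    pending.add(dir);
    while (!pending.isEmpty()) {
      String current = pending.poll();
      for (String child : children.getOrDefault(current, Collections.emptyList())) {
        if (children.containsKey(child)) {
          if (recursive) {
            pending.add(child); // visit the subdirectory later
          }
        } else {
          files.add(child);
        }
      }
    }
    return files;
  }
}
```

In the non-recursive case the loop body runs once, for the requested directory only, which corresponds to simply wrapping a single cached listing.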
listLocatedStatus()
- Same logic, just refactored a bit, and uses the new Iterator (instead of
array) for MetadataStore results.
- This one (previous patch) *does* optimize the {{isAuthoritative}} case: if
the MetadataStore claims the listing is complete, and the S3A client is
configured to allow it ({{fs.s3a.metadatastore.authoritative=true}}), it will
skip the query to S3.
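The {{isAuthoritative}} short-circuit can be sketched as below. This is a simplified illustration under stated assumptions: {{AuthoritativeListingSketch}}, its parameters, and the {{Supplier}} standing in for the S3 query are all hypothetical, and {{allowAuthoritative}} represents the value of {{fs.s3a.metadatastore.authoritative}}.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Supplier;

/** Hypothetical sketch of skipping the S3 query for an authoritative listing. */
final class AuthoritativeListingSketch {
  /**
   * @param cached                 entries from the metadata cache, or null if none
   * @param listingIsAuthoritative the cache claims the listing is complete
   * @param allowAuthoritative     client configuration permits trusting that claim
   * @param s3Listing              stand-in for the S3 query; only invoked if needed
   */
  static List<String> list(List<String> cached,
                           boolean listingIsAuthoritative,
                           boolean allowAuthoritative,
                           Supplier<List<String>> s3Listing) {
    if (cached != null && listingIsAuthoritative && allowAuthoritative) {
      // Both the store and the client agree: the S3 query is skipped entirely.
      return new ArrayList<>(cached);
    }
    // Otherwise fall back to the union of cached and S3 results.
    Set<String> merged = new LinkedHashSet<>();
    if (cached != null) {
      merged.addAll(cached);
    }
    merged.addAll(s3Listing.get());
    return new ArrayList<>(merged);
  }
}
```

Passing the S3 query as a lazily evaluated supplier keeps the sketch honest about the point of the optimization: when both flags are true, the remote call never happens.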
> S3Guard: S3AFileSystem::listFiles() to employ MetadataStore
> -----------------------------------------------------------
>
> Key: HADOOP-14266
> URL: https://issues.apache.org/jira/browse/HADOOP-14266
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: HADOOP-13345
> Reporter: Mingliang Liu
> Assignee: Mingliang Liu
> Attachments: HADOOP-14266-HADOOP-13345.000.patch,
> HADOOP-14266-HADOOP-13345.001.patch, HADOOP-14266-HADOOP-13345.002.patch,
> HADOOP-14266-HADOOP-13345.003.patch, HADOOP-14266-HADOOP-13345.003.patch,
> HADOOP-14266-HADOOP-13345.004.patch, HADOOP-14266-HADOOP-13345-005.patch,
> HADOOP-14266-HADOOP-13345.005.patch, HADOOP-14266-HADOOP-13345.006.patch
>
>
> Similar to [HADOOP-13926], this is to track the effort of employing
> MetadataStore in {{S3AFileSystem::listFiles()}}.