[
https://issues.apache.org/jira/browse/HADOOP-13651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15563782#comment-15563782
]
Aaron Fabbri commented on HADOOP-13651:
---------------------------------------
Minor status update, since this JIRA has a long gestation period. I'm working
on this now. So far I have code for:
- New config values: {{fs.s3a.metadatastore.authoritative}} and
{{fs.s3a.metadatastore.impl}}.
- getFileStatus()
- listStatus()
- rename()
- delete()
- mkdirs()
- copyFromLocalFile()
- copyFile()
What remains for this jira:
- create(). Figuring out the OutputStream plumbing now.
- More testing.
What I'd like to do as separate jiras (because I favor smaller code reviews):
- Delete tracking
- Retries (i.e. eventual consistency retry policy). Would love to see this in
isolation since it is non-trivial.
I'm inserting TODO comments as I go at key locations for those two items.
Interesting things about my approach so far:
I'm trying to minimize changes to {{S3AFileSystem}}
- diff stat so far: {quote}
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
| 116 ++++++++++++++++++++++++++++++------
{quote}
- I introduce a "metadatastore s3a helper/glue" class, S3Guard, which is a
bunch of static helper functions so far.
- I introduce {{NullMetadataStore}}, which is a no-op metadata store. The goal
was to simplify S3AFileSystem changes (always call MetadataStore, don't care if
it is a no-op), but I also like that it further clarifies {{MetadataStore}}
semantics. It turns out S3AFileSystem still sometimes wants to know whether
there is no MetadataStore, to avoid allocating stuff that isn't needed. Seems
like an OK tradeoff, but I'll let folks comment when I post the v1 patch.
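
A minimal sketch of the no-op store idea, for readers who want to picture it. Everything here is hypothetical and heavily simplified: {{SimpleMetadataStore}}, {{NullStoreSketch}}, {{InMemoryStoreSketch}}, {{FileSystemSketch}} and the {{isNoOp()}} check are illustrative names, not the actual {{MetadataStore}} or {{S3AFileSystem}} API, and the patch may well look different.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified stand-in for a metadata store interface. */
interface SimpleMetadataStore {
    void put(String path, long length);
    Long get(String path);      // null means "nothing known about this path"
    void delete(String path);
    boolean isNoOp();           // assumed hook so callers can skip extra allocations
}

/** No-op store: accepts every call and remembers nothing. */
final class NullStoreSketch implements SimpleMetadataStore {
    public void put(String path, long length) { /* intentionally empty */ }
    public Long get(String path) { return null; }
    public void delete(String path) { /* intentionally empty */ }
    public boolean isNoOp() { return true; }
}

/** Trivial in-memory store, only here for contrast with the no-op one. */
final class InMemoryStoreSketch implements SimpleMetadataStore {
    private final Map<String, Long> entries = new HashMap<>();
    public void put(String path, long length) { entries.put(path, length); }
    public Long get(String path) { return entries.get(path); }
    public void delete(String path) { entries.remove(path); }
    public boolean isNoOp() { return false; }
}

/** Caller that always talks to a store and never null-checks it. */
final class FileSystemSketch {
    private final SimpleMetadataStore store;

    FileSystemSketch(SimpleMetadataStore store) { this.store = store; }

    void writeFinished(String path, long length) { store.put(path, length); }
    Long cachedLength(String path) { return store.get(path); }
    boolean hasMetadataStore() { return !store.isNoOp(); }
}

final class NullStoreDemo {
    public static void main(String[] args) {
        FileSystemSketch plain = new FileSystemSketch(new NullStoreSketch());
        plain.writeFinished("/a", 10L);
        System.out.println(plain.cachedLength("/a"));   // nothing was recorded

        FileSystemSketch guarded = new FileSystemSketch(new InMemoryStoreSketch());
        guarded.writeFinished("/a", 10L);
        System.out.println(guarded.cachedLength("/a")); // length recorded by the store
    }
}
```

The caller code is identical in both cases; only the occasional {{hasMetadataStore()}}-style check leaks the difference, which is the tradeoff mentioned above.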
I'm trying to keep PathMetadata simple: either you have a PathMetadata,
including S3AFileStatus, or you don't. There are some spots where it would
be convenient to just record "this path exists, but we don't have metadata
yet" (e.g. create() -> OutputStream.close() -> S3AFileSystem.writeFinished(),
at which point I don't have a FileStatus), but that would complicate
S3AFileSystem logic. We'll see.
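
One way to picture the "either you have it, or you don't" rule is sketched below. This is only an illustration under assumptions: {{FileStatusSketch}}, {{PathMetadataSketch}} and {{PathTableSketch}} are hypothetical names, and the real {{PathMetadata}} and {{S3AFileStatus}} classes carry more state than this.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;

/** Hypothetical, stripped-down file status. */
final class FileStatusSketch {
    final String path;
    final long length;
    final boolean directory;

    FileStatusSketch(String path, long length, boolean directory) {
        this.path = Objects.requireNonNull(path, "path");
        this.length = length;
        this.directory = directory;
    }
}

/** Metadata entry that cannot exist without a status. */
final class PathMetadataSketch {
    private final FileStatusSketch status;

    PathMetadataSketch(FileStatusSketch status) {
        // No "exists, but status unknown" state: a missing status is rejected.
        this.status = Objects.requireNonNull(status, "status");
    }

    FileStatusSketch getStatus() { return status; }
}

/** Lookup table: a path maps either to full metadata or to nothing at all. */
final class PathTableSketch {
    private final Map<String, PathMetadataSketch> entries = new HashMap<>();

    void put(PathMetadataSketch meta) {
        entries.put(meta.getStatus().path, meta);
    }

    Optional<PathMetadataSketch> get(String path) {
        return Optional.ofNullable(entries.get(path));
    }
}
```

Under this shape, the create() -> close() case has to build a status before it can record anything, which is the inconvenience described above.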
> S3Guard: S3AFileSystem Integration with MetadataStore
> -----------------------------------------------------
>
> Key: HADOOP-13651
> URL: https://issues.apache.org/jira/browse/HADOOP-13651
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Reporter: Aaron Fabbri
> Assignee: Aaron Fabbri
>
> Modify S3AFileSystem et al. to optionally use a MetadataStore for metadata
> consistency and caching.
> Implementation should have minimal overhead when no MetadataStore is
> configured.