Aaron Fabbri updated HADOOP-13651:
    Attachment: HADOOP-13651-HADOOP-13345.001.patch

I don't have all tests passing yet, but I wanted to attach a v1 / RFC patch in 
case folks want to take a look.  See my previous comment for an overview (except 
that I've now implemented create() in this patch).

This patch has really benefited from the great work on integration and FS 
contract tests that folks have done, so thank you.

The create() case was interesting:  On create, we need to put a FileStatus in 
the MetadataStore.  The main wart is modification time:  S3A uses S3's 
server-side modification time to populate FileStatus objects.  We cannot know 
that value at create time unless we block and poll S3 for results, and those 
results would be subject to S3 consistency and multi-writer issues.  The other 
approach would be to put a PathMetadata in the MetadataStore that says "this 
file exists but we do not have a FileStatus for it yet".  That complicates the 
client a bit, so for now I just use local system time for modification time.
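
A minimal sketch of the create() approach described above, in Java.  The 
classes SketchFileStatus, SketchMetadataStore and CreateSketch are hypothetical 
stand-ins invented for illustration; they are not the real Hadoop FileStatus or 
MetadataStore types, and the patch itself may be structured differently.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a file status record (NOT Hadoop's FileStatus).
final class SketchFileStatus {
    final String path;
    final long length;
    final boolean isDirectory;
    final long modificationTime; // milliseconds since the epoch

    SketchFileStatus(String path, long length, boolean isDirectory,
                     long modificationTime) {
        this.path = path;
        this.length = length;
        this.isDirectory = isDirectory;
        this.modificationTime = modificationTime;
    }
}

// Hypothetical in-memory store keyed by path (NOT the real MetadataStore API).
final class SketchMetadataStore {
    private final Map<String, SketchFileStatus> entries =
        new ConcurrentHashMap<>();

    void put(SketchFileStatus status) {
        entries.put(status.path, status);
    }

    SketchFileStatus get(String path) {
        return entries.get(path);
    }
}

final class CreateSketch {
    // Record a newly created file immediately.  The modification time comes
    // from the client's clock (passed in as nowMillis) because the
    // server-side timestamp is not known at create time.
    static SketchFileStatus recordCreatedFile(SketchMetadataStore store,
                                              String path, long length,
                                              long nowMillis) {
        SketchFileStatus status =
            new SketchFileStatus(path, length, false, nowMillis);
        store.put(status);
        return status;
    }
}
```

A caller would pass System.currentTimeMillis() as nowMillis.  The trade-off is 
that the stored time can differ from the value S3 later reports for the same 
object.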

The main issue I'm tackling next is {{S3AFileStatus#isEmptyDirectory()}}.  This 
one bit of state is a pain because it means you cannot simply cache an 
S3AFileStatus in isolation: it needs to be updated when the set of children 
changes.  Couple this with the fact that we do not require all metadata to be 
pre-loaded into the MetadataStore, and you have a nasty little problem.  I have 
an idea of how to tackle it.  I may post my solution to that part as a separate 
RFC patch on here so folks can comment on that part alone.
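
To make the problem concrete, here is a small Java sketch of why emptiness 
cannot be cached as a plain boolean when the store may hold only part of a 
directory's children.  This is only an illustration of the problem, not the 
solution I have in mind; Emptiness and DirListingSketch are hypothetical names 
that do not exist in the patch.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical three-valued answer: the store may simply not know.
enum Emptiness { EMPTY, NOT_EMPTY, UNKNOWN }

// Hypothetical tracker of the children the store knows about, per directory.
final class DirListingSketch {
    private final Map<String, Set<String>> knownChildren = new HashMap<>();
    // Directories whose child set is known to be complete.
    private final Set<String> fullyLoaded = new HashSet<>();

    void addChild(String dir, String child) {
        knownChildren.computeIfAbsent(dir, d -> new HashSet<>()).add(child);
    }

    void removeChild(String dir, String child) {
        Set<String> children = knownChildren.get(dir);
        if (children != null) {
            children.remove(child);
        }
    }

    // Call after a full listing of dir has been loaded into the store.
    void markFullyLoaded(String dir) {
        knownChildren.computeIfAbsent(dir, d -> new HashSet<>());
        fullyLoaded.add(dir);
    }

    // Emptiness is derived from the child set instead of being stored on the
    // directory's own status, so it changes whenever the children change.
    Emptiness emptiness(String dir) {
        Set<String> children = knownChildren.get(dir);
        if (children != null && !children.isEmpty()) {
            return Emptiness.NOT_EMPTY;
        }
        // No known children: only a complete listing proves the directory
        // is empty; otherwise the store cannot tell.
        return fullyLoaded.contains(dir) ? Emptiness.EMPTY : Emptiness.UNKNOWN;
    }
}
```

With partial metadata, deleting the last known child leaves the answer at 
UNKNOWN, not EMPTY, which is the nasty part.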

> S3Guard: S3AFileSystem Integration with MetadataStore
> -----------------------------------------------------
>                 Key: HADOOP-13651
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13651
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>            Reporter: Aaron Fabbri
>            Assignee: Aaron Fabbri
>         Attachments: HADOOP-13651-HADOOP-13345.001.patch
> Modify S3AFileSystem et al. to optionally use a MetadataStore for metadata 
> consistency and caching.
> Implementation should have minimal overhead when no MetadataStore is 
> configured.
