[ https://issues.apache.org/jira/browse/HADOOP-13651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15616699#comment-15616699 ]
Chris Nauroth commented on HADOOP-13651:
----------------------------------------
Hello [~fabbri]. Thank you for sharing your patch.
I have not yet reviewed everything, but first I would like to discuss the
management of {{MetadataStore}} as a singleton. This could be problematic for
a process that wants to access multiple {{S3AFileSystem}} instances backed by
different S3 buckets. A concrete example of this would be a DistCp task
copying data from one bucket to another.
I had been thinking there would be a 1:1 cardinality relationship between
{{S3AFileSystem}} instances and {{MetadataStore}} instances. An
{{S3AFileSystem}} instance accesses exactly one bucket, and likewise, a
{{DynamoDBMetadataStore}} instance would access exactly one DynamoDB table. (I
also see this relationship is carried through into the latest HADOOP-13449
patch from [~liuml07].)
I think the 1:1 approach is overall the easiest implementation path that
supports use of multiple {{S3AFileSystem}} instances in the same process. I
suppose the {{MetadataStore}} implementations could be made flexible enough to
handle paths from multiple {{S3AFileSystem}} instances, but that seems to lead
to more complexity: the {{MetadataStore}} implementation would have to manage
mapping tables and multiple AWS SDK client instances.
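To illustrate the 1:1 relationship, here is a minimal sketch in which each
file system instance owns its own store. The class names
({{SketchFileSystem}}, {{SketchMetadataStore}}) and the table naming scheme are
hypothetical stand-ins invented for this example; they are not the
{{S3AFileSystem}} or {{MetadataStore}} APIs from the patch.

```java
import java.util.HashMap;
import java.util.Map;

public class OneStorePerFileSystem {

    /** Hypothetical stand-in for a MetadataStore backed by exactly one table. */
    public static final class SketchMetadataStore {
        private final String tableName;
        private final Map<String, Long> fileLengths = new HashMap<>();

        public SketchMetadataStore(String tableName) { this.tableName = tableName; }
        public String tableName() { return tableName; }
        public void put(String path, long length) { fileLengths.put(path, length); }
        public Long get(String path) { return fileLengths.get(path); }
    }

    /** Hypothetical stand-in for S3AFileSystem: one bucket, one owned store. */
    public static final class SketchFileSystem {
        private final String bucket;
        private final SketchMetadataStore store;

        public SketchFileSystem(String bucket) {
            this.bucket = bucket;
            // Table naming is an assumption made for this sketch only.
            this.store = new SketchMetadataStore("table-" + bucket);
        }
        public String bucket() { return bucket; }
        public SketchMetadataStore store() { return store; }
    }

    public static void main(String[] args) {
        // A DistCp-like scenario: two buckets accessed from one process.
        SketchFileSystem source = new SketchFileSystem("source-bucket");
        SketchFileSystem dest = new SketchFileSystem("dest-bucket");

        // Metadata recorded for the source does not leak into the destination.
        source.store().put("/data/part-0000", 128L);
        System.out.println("source sees: " + source.store().get("/data/part-0000"));
        System.out.println("dest sees:   " + dest.store().get("/data/part-0000"));
    }
}
```

With a process-wide singleton, both instances would share one store, and the
store itself would have to work out which bucket a given path belongs to.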
If the goal is to guard against costly repeated initialization, then I think
the {{FileSystem}} cache already has us covered. {{S3AFileSystem}} instances
can get reused via the cache, and assuming the 1:1 relationship, the
corresponding {{MetadataStore}} would get reused too.
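The reuse argument can be sketched as follows. This is a simplified,
hypothetical cache keyed only on scheme and authority; the real
{{FileSystem}} cache key also takes the user into account, which is omitted
here. All class and method names below are invented for illustration.

```java
import java.net.URI;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class FileSystemCacheSketch {

    // Counts how many times the (notionally costly) store setup has run.
    private static final AtomicInteger STORE_INITS = new AtomicInteger();

    /** Hypothetical store whose construction stands in for costly initialization. */
    public static final class SketchMetadataStore {
        public SketchMetadataStore() { STORE_INITS.incrementAndGet(); }
    }

    /** Hypothetical file system that creates its own store (1:1). */
    public static final class SketchFileSystem {
        private final URI uri;
        private final SketchMetadataStore store = new SketchMetadataStore();

        public SketchFileSystem(URI uri) { this.uri = uri; }
        public URI uri() { return uri; }
        public SketchMetadataStore store() { return store; }
    }

    private static final Map<String, SketchFileSystem> CACHE = new ConcurrentHashMap<>();

    /** Returns the cached instance for the URI's scheme and authority, creating it once. */
    public static SketchFileSystem get(URI uri) {
        String key = uri.getScheme() + "://" + uri.getAuthority();
        return CACHE.computeIfAbsent(key, k -> new SketchFileSystem(uri));
    }

    public static int storeInits() { return STORE_INITS.get(); }

    public static void main(String[] args) {
        SketchFileSystem first = get(URI.create("s3a://bucket-a/dir/file1"));
        SketchFileSystem second = get(URI.create("s3a://bucket-a/other/file2"));
        SketchFileSystem other = get(URI.create("s3a://bucket-b/dir/file1"));

        System.out.println("same bucket reuses instance: " + (first == second));
        System.out.println("same bucket reuses store:    " + (first.store() == second.store()));
        System.out.println("different bucket, new store: " + (first.store() != other.store()));
        System.out.println("store initializations:       " + storeInits());
    }
}
```

Because the store is created by the file system instance, caching the
file system is enough to avoid repeated store initialization; no separate
singleton is needed.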
> S3Guard: S3AFileSystem Integration with MetadataStore
> -----------------------------------------------------
>
> Key: HADOOP-13651
> URL: https://issues.apache.org/jira/browse/HADOOP-13651
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Reporter: Aaron Fabbri
> Assignee: Aaron Fabbri
> Attachments: HADOOP-13651-HADOOP-13345.001.patch,
> HADOOP-13651-HADOOP-13345.002.patch, HADOOP-13651-HADOOP-13345.003.patch
>
>
> Modify S3AFileSystem et al. to optionally use a MetadataStore for metadata
> consistency and caching.
> Implementation should have minimal overhead when no MetadataStore is
> configured.