[ https://issues.apache.org/jira/browse/HADOOP-16540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16919497#comment-16919497 ]
Steve Loughran commented on HADOOP-16540:
-----------------------------------------
* the cache key is (user, prefix, auth), not just prefix and auth; bear that in mind
* given your example use case of S3, I'd like to know a lot more about what you
are considering here and why
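
A minimal sketch of what the (user, prefix, auth) key implies, reading "prefix" as the URI scheme (an assumption). `FsCacheSketch` and its `Key` are hypothetical stand-ins written for illustration; they are not the actual Hadoop cache classes, and a plain `Object` stands in for a filesystem instance.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Objects;

/** Simplified, hypothetical model of a filesystem cache keyed on (user, scheme, authority). */
public class FsCacheSketch {

    /** Cache key: the path component of the URI is deliberately ignored. */
    static final class Key {
        private final String user;
        private final String scheme;
        private final String authority;

        Key(String user, URI uri) {
            this.user = user;
            this.scheme = uri.getScheme() == null
                ? "" : uri.getScheme().toLowerCase(Locale.ROOT);
            this.authority = uri.getAuthority() == null
                ? "" : uri.getAuthority().toLowerCase(Locale.ROOT);
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) {
                return true;
            }
            if (!(o instanceof Key)) {
                return false;
            }
            Key other = (Key) o;
            return user.equals(other.user)
                && scheme.equals(other.scheme)
                && authority.equals(other.authority);
        }

        @Override
        public int hashCode() {
            return Objects.hash(user, scheme, authority);
        }
    }

    // A plain Object stands in for an expensive filesystem instance.
    private final Map<Key, Object> instances = new HashMap<>();

    /** Returns the cached instance for (user, scheme, authority), creating it on first use. */
    Object get(String user, URI uri) {
        return instances.computeIfAbsent(new Key(user, uri), k -> new Object());
    }

    int size() {
        return instances.size();
    }

    public static void main(String[] args) {
        FsCacheSketch cache = new FsCacheSketch();
        Object a = cache.get("alice", URI.create("s3a://bucket/data/2019"));
        Object b = cache.get("alice", URI.create("s3a://bucket/logs"));
        Object c = cache.get("bob", URI.create("s3a://bucket/data/2019"));
        System.out.println(a == b);       // true: same user and bucket, different paths
        System.out.println(a == c);       // false: different user
        System.out.println(cache.size()); // 2
    }
}
```

In this model two paths in the same bucket share one instance for a given user, while a second user gets an instance of their own.
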
S3A FS instances are fairly expensive: thread and HTTP pools, DynamoDB pools,
AWS transfer managers... you don't want to have >1 per bucket if you can avoid
it. It may be better to support some tuning within the store, as HADOOP-16396
did for S3Guard authoritative mode.
That leaves different user credentials as the main justification, or similar
things like encryption keys to use on different paths. True? Or maybe seek
policies?
If so, it'll be fun trying to work out how to deal with operations which span
paths.
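
To illustrate the "tuning within the store" alternative, here is a rough sketch of one instance resolving a per-path setting by longest matching prefix. `PathTuningSketch`, its methods, and the example paths are all hypothetical; this is not how HADOOP-16396 or any S3A option is implemented, only one way the idea could be modelled. The values "normal", "random" and "sequential" are used purely as example seek-policy strings.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-path option lookup inside a single store instance. */
public class PathTuningSketch {

    private final Map<String, String> valueByPrefix = new HashMap<>();
    private final String defaultValue;

    PathTuningSketch(String defaultValue) {
        this.defaultValue = defaultValue;
    }

    /** Appends a trailing slash so "/warehouse" does not match "/warehousex". */
    private static String normalize(String path) {
        return path.endsWith("/") ? path : path + "/";
    }

    /** Registers a value for every path under the given prefix. */
    void set(String prefix, String value) {
        valueByPrefix.put(normalize(prefix), value);
    }

    /** Returns the value of the longest registered prefix containing the path. */
    String resolve(String path) {
        String candidate = normalize(path);
        String best = null;
        for (String prefix : valueByPrefix.keySet()) {
            if (candidate.startsWith(prefix)
                    && (best == null || prefix.length() > best.length())) {
                best = prefix;
            }
        }
        return best == null ? defaultValue : valueByPrefix.get(best);
    }

    public static void main(String[] args) {
        PathTuningSketch seekPolicy = new PathTuningSketch("normal");
        seekPolicy.set("/warehouse", "random");
        seekPolicy.set("/warehouse/logs", "sequential");
        System.out.println(seekPolicy.resolve("/warehouse/orc/part-0"));   // random
        System.out.println(seekPolicy.resolve("/warehouse/logs/app.log")); // sequential
        System.out.println(seekPolicy.resolve("/tmp/scratch"));            // normal
    }
}
```

This keeps a single set of pools per bucket, but an operation that spans two prefixes (a rename from `/warehouse/logs` to `/tmp`, say) still has to choose which setting applies.
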
All work has to be against hadoop trunk; you'll also need to make sure that it
works with delegation tokens for job submission, including S3A DTs. That is
non-trivial, as it is another place which uses (token identifier + FS URI) as
the map key. Only one DT per bucket is going to be collected or provided,
regardless of how many instances are in the cache. So please, get familiar with
that code before starting to do things with fairly major implications.
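
A small sketch of why only one DT per bucket survives, under the assumption that tokens are stored in a map keyed by a service name derived from the FS URI (scheme plus authority). `TokenCollectionSketch` is a hypothetical model with strings standing in for tokens; it is not the Hadoop credentials code.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical model of token collection keyed by a per-filesystem service name. */
public class TokenCollectionSketch {

    // Service name -> token; strings stand in for real delegation tokens.
    private final Map<String, String> tokensByService = new LinkedHashMap<>();

    /** Assumed service name: scheme and authority only, so the path plays no part. */
    static String serviceName(URI fsUri) {
        return fsUri.getScheme() + "://" + fsUri.getAuthority();
    }

    /**
     * Stores the token only if the service has none yet.
     * Returns true if the token was stored, false if an earlier one was kept.
     */
    boolean collect(URI fsUri, String token) {
        return tokensByService.putIfAbsent(serviceName(fsUri), token) == null;
    }

    String tokenFor(URI fsUri) {
        return tokensByService.get(serviceName(fsUri));
    }

    int size() {
        return tokensByService.size();
    }

    public static void main(String[] args) {
        TokenCollectionSketch credentials = new TokenCollectionSketch();
        URI data = URI.create("s3a://bucket/data");
        URI logs = URI.create("s3a://bucket/logs");
        System.out.println(credentials.collect(data, "token-A")); // true
        System.out.println(credentials.collect(logs, "token-B")); // false: bucket already has a token
        System.out.println(credentials.tokenFor(logs));           // token-A
        System.out.println(credentials.size());                   // 1
    }
}
```

Even if a custom cache held separate instances for `/data` and `/logs`, a job submitted with this kind of map would carry a single token for the bucket.
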
> Pluggable Filesystem Caching Support in FileSystem Class
> --------------------------------------------------------
>
> Key: HADOOP-16540
> URL: https://issues.apache.org/jira/browse/HADOOP-16540
> Project: Hadoop Common
> Issue Type: New Feature
> Components: fs
> Affects Versions: 3.3.0
> Reporter: Arun Ravi M V
> Priority: Major
>
> Provide an option to use a custom cache class in the FileSystem class.
> Currently, caching is enabled by default and uses the URI scheme and authority
> value to determine whether to create a new FS instance for the given URI or to
> fetch an already existing one from the cache.
> In the case of the AWS S3 FS implementation, the authority of an S3 path is
> the bucket name, i.e. the FileSystem object is cached at the bucket level.
> Providing custom caching logic would let the user cache it at some prefix
> level and give more flexibility.