[
https://issues.apache.org/jira/browse/HADOOP-19239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17869665#comment-17869665
]
Steve Loughran commented on HADOOP-19239:
-----------------------------------------
If expiry is the key goal, maybe this is the time to move to Caffeine first,
provided each entry can explicitly declare its own expiry.
Please look at all the code where we create new entries and have to handle
slow FS initialize() calls caused by network IO: we don't want to make that worse.
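One way to keep a slow initialize() from blocking every other lookup is to hold the lock only for the map lookup, create the instance with no lock held, and then re-check before inserting. The sketch below is a minimal, hypothetical pure-JDK illustration of that pattern. `CreateOutsideLockCache`, its `factory` parameter and `closeQuietly` are invented names, not Hadoop's API, and a generic `Closeable` stands in for `FileSystem`.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch, not Hadoop's code: the slow factory (standing in for
// FileSystem creation plus initialize()) runs without holding the cache lock.
final class CreateOutsideLockCache<K, V extends Closeable> {
  private final Map<K, V> map = new HashMap<>();
  private final Function<K, V> factory;

  CreateOutsideLockCache(Function<K, V> factory) {
    this.factory = factory;
  }

  V get(K key) {
    // Fast path: a short critical section covering only the lookup.
    synchronized (this) {
      V existing = map.get(key);
      if (existing != null) {
        return existing;
      }
    }
    // Slow path: may do network IO, so no lock is held here.
    V created = factory.apply(key);
    V winner;
    synchronized (this) {
      // Another thread may have inserted an instance for this key meanwhile;
      // putIfAbsent returns that instance, or null if ours was stored.
      winner = map.putIfAbsent(key, created);
    }
    if (winner != null) {
      closeQuietly(created); // lost the race: discard our own instance
      return winner;
    }
    return created;
  }

  private static void closeQuietly(Closeable c) {
    try {
      c.close();
    } catch (IOException ignored) {
      // a real implementation would log this
    }
  }
}
```

The trade-off is that two threads may both pay the creation cost for the same key; the loser closes its copy. A per-entry expiry could be layered on top by storing a deadline alongside each value.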
> Enhance FileSystem.Cache to honor security token and expiration
> ---------------------------------------------------------------
>
> Key: HADOOP-19239
> URL: https://issues.apache.org/jira/browse/HADOOP-19239
> Project: Hadoop Common
> Issue Type: Improvement
> Components: fs, fs/s3
> Affects Versions: 3.3.4
> Reporter: Xiang Li
> Assignee: Xiang Li
> Priority: Major
>
> We have an online service that uses Hadoop FileSystem to load files from
> cloud storage.
> The current cache in FileSystem is a
> [HashMap|https://github.com/apache/hadoop/blob/4525c7e35ea22d7a6350b8af10eb8d2ff68376e7/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileSystem.java#L3635C1-L3635C62],
> and its key is built from the scheme, the authority (like
> [user@host:port|https://en.wikipedia.org/wiki/Uniform_Resource_Identifier#Syntax]),
> the UGI and a unique long, all of which feed its [hash
> code|https://github.com/apache/hadoop/blob/4525c7e35ea22d7a6350b8af10eb8d2ff68376e7/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileSystem.java#L3891C1-L3894C8].
> Among those four fields, only "scheme" and "authority" can be controlled
> externally.
> This leads to cases like the following: a FileSystem entry in the cache was
> created with schemeA + authorityA, with read + write access and an
> expiration. Later, a request for a FileSystem arrives, still using schemeA +
> authorityA, but with less access (maybe read only), or after the cached entry
> has expired. The cached FileSystem is returned by mistake and no new
> FileSystem is created. This is not a security issue, but subsequent
> calls (for example, to read a file) are rejected with 403 by the remote storage.
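The collision can be reproduced with a simplified key. The sketch below is hypothetical: `SimpleKey` only mirrors the four fields described above (with a plain `user` string standing in for the UGI) and is not Hadoop's actual `FileSystem.Cache.Key`; `CollisionDemo`, the `s3a`/`bucket`/`alice` values and the token strings are invented for illustration. Because the caller's credential is not part of the key, a second caller with a different credential gets the first caller's entry.

```java
import java.util.Map;
import java.util.Objects;

// Hypothetical, simplified stand-in for FileSystem.Cache.Key: only scheme,
// authority, user (standing in for the UGI) and a unique counter decide
// equality. The credential the caller holds is not part of the key.
final class SimpleKey {
  final String scheme;
  final String authority;
  final String user;
  final long unique;

  SimpleKey(String scheme, String authority, String user, long unique) {
    this.scheme = scheme;
    this.authority = authority;
    this.user = user;
    this.unique = unique;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof SimpleKey)) return false;
    SimpleKey k = (SimpleKey) o;
    return scheme.equals(k.scheme) && authority.equals(k.authority)
        && user.equals(k.user) && unique == k.unique;
  }

  @Override
  public int hashCode() {
    return Objects.hash(scheme, authority, user, unique);
  }
}

final class CollisionDemo {
  // Returns the credential of the cached entry handed back to a caller who
  // presents callerToken for s3a://bucket as user "alice".
  static String tokenSeenBy(Map<SimpleKey, String> cache, String callerToken) {
    SimpleKey key = new SimpleKey("s3a", "bucket", "alice", 0L);
    // On a hit, computeIfAbsent returns the cached value and ignores callerToken.
    return cache.computeIfAbsent(key, k -> callerToken);
  }
}
```

Calling `tokenSeenBy` first with a read-write token and then with a read-only token returns the read-write token both times, which is the stale-entry behaviour described in the issue.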
> Our proposal:
> * Short term
> ** Add a new field to FileSystem.Cache.Key that takes part in hashCode() and
> equals(). This field could be specified when constructing a Key.
> ** Add a simple expiration mechanism to FileSystem.Cache
> *** Each cache entry is created with an expiration
> *** When getting a FileSystem, if the cache entry is hit but has already
> expired, close it, remove it from the cache, and return a newly created
> FileSystem.
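The two short-term items can be sketched together: a key with an extra field that takes part in `equals()`/`hashCode()`, and entries that each carry their own expiry time. This is a minimal, hypothetical illustration; `TokenAwareKey`, `ExpiringCache`, the `extra` field, the `ttlMillis` parameter and the injectable millisecond clock are all invented names, not existing Hadoop classes or configuration.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical key with the proposed extra field (for example a credential id)
// that takes part in equals() and hashCode().
final class TokenAwareKey {
  final String scheme;
  final String authority;
  final String extra;

  TokenAwareKey(String scheme, String authority, String extra) {
    this.scheme = scheme;
    this.authority = authority;
    this.extra = extra;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof TokenAwareKey)) return false;
    TokenAwareKey k = (TokenAwareKey) o;
    return scheme.equals(k.scheme) && authority.equals(k.authority) && extra.equals(k.extra);
  }

  @Override
  public int hashCode() {
    return Objects.hash(scheme, authority, extra);
  }
}

// Hypothetical cache: each entry stores its own expiry time; an expired hit
// is closed, removed, and replaced by a newly created instance.
final class ExpiringCache<V extends Closeable> {
  private static final class Entry<T> {
    final T value;
    final long expiresAtMillis;

    Entry(T value, long expiresAtMillis) {
      this.value = value;
      this.expiresAtMillis = expiresAtMillis;
    }
  }

  private final Map<TokenAwareKey, Entry<V>> map = new HashMap<>();
  private final LongSupplier clockMillis; // injectable so tests can move time

  ExpiringCache(LongSupplier clockMillis) {
    this.clockMillis = clockMillis;
  }

  // For brevity the factory runs while holding the lock.
  synchronized V get(TokenAwareKey key, Supplier<V> factory, long ttlMillis) {
    long now = clockMillis.getAsLong();
    Entry<V> entry = map.get(key);
    if (entry != null) {
      if (now < entry.expiresAtMillis) {
        return entry.value; // live hit
      }
      map.remove(key); // expired hit: evict and close
      closeQuietly(entry.value);
    }
    V created = factory.get();
    map.put(key, new Entry<>(created, now + ttlMillis));
    return created;
  }

  private static void closeQuietly(Closeable c) {
    try {
      c.close();
    } catch (IOException ignored) {
      // a real implementation would log this
    }
  }
}
```

Two caveats apply to this sketch. Creating the instance under the lock is exactly what a real change would need to avoid, since initialize() can be slow. Also, closing an expired instance that another caller still holds would break that caller.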
> * Long term
> ** Replace the internal HashMap with a more modern, full-featured cache
> framework, like [https://github.com/ben-manes/caffeine]
--
This message was sent by Atlassian Jira
(v8.20.10#820010)