[
https://issues.apache.org/jira/browse/HDFS-5810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13879059#comment-13879059
]
Colin Patrick McCabe commented on HDFS-5810:
--------------------------------------------
This patch unifies {{FileInputStreamCache}} and {{ClientMmapManager}} into a
single cache called {{ShortCircuitCache}}.

Along the way, I noticed that our current caches were being destroyed and
re-created constantly when using {{FileContext}}. Because {{FileContext}}
destroys and re-creates a {{DFSClient}} with each operation it does, the
{{DFSClient}} is not the right place for caches to live. Instead, they need to
have global scope, like {{PeerCache}} does currently. I created
{{ClientCacheContext}} for this purpose and moved {{ShortCircuitCache}},
{{PeerCache}}, and {{DomainSocketFactory}} into it.

I wanted to give threads the choice to use a different cache if they so
desired, so I created the {{dfs.client.cache.context}} configuration key. When looking up the
relevant {{ClientCacheContext}}, we look up the value of this key in a global
map. This allows clients to (for example) create separate {{FileSystem}} or
{{FileContext}} instances that don't share a socket cache, by setting different
values for this configuration key at the time of creation. It is also handy
for unit tests, to avoid cross-test contamination.
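
The lookup described above can be sketched roughly as follows. This is a minimal illustration, not the code in the patch: the {{Context}} class, the {{fromConf}} method, the use of a plain {{Map}} in place of a Hadoop {{Configuration}}, and the fallback name "default" are all assumptions made for the example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class CacheContextSketch {
    /** Hypothetical stand-in for ClientCacheContext; it would own the shared caches. */
    static final class Context {
        final String name;
        Context(String name) { this.name = name; }
    }

    // Process-wide map from context name to context. It outlives any single client.
    private static final Map<String, Context> CONTEXTS = new ConcurrentHashMap<>();

    /** Looks up, or lazily creates, the context named by the configuration value. */
    static Context fromConf(Map<String, String> conf) {
        // "default" is an assumed fallback name, used only in this sketch.
        String name = conf.getOrDefault("dfs.client.cache.context", "default");
        return CONTEXTS.computeIfAbsent(name, Context::new);
    }

    public static void main(String[] args) {
        Context a = fromConf(Map.of());
        Context b = fromConf(Map.of());
        Context c = fromConf(Map.of("dfs.client.cache.context", "test-1"));
        System.out.println(a == b); // true: the same named context is shared
        System.out.println(a == c); // false: a different name gives a separate context
    }
}
```

Two clients configured with the same value end up sharing one context, while a unit test can pick a unique value to get caches of its own.
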
With this change, two {{DFSInputStream}} instances reading the same local
replica via short-circuit reads now use the same file descriptor. This has an
obvious advantage in keeping down the number of open file descriptors we have.
It also has some more subtle advantages. For example, we no longer re-read the
first part of the block metadata header over and over, since we only have one
{{ShortCircuitReplica}} for that block in the cache. We also make only one
{{REQUEST_SHORT_CIRCUIT_FDS}} RPC to the DataNode, rather than two.
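
To illustrate the sharing, here is a small reference-counted cache sketch. The names ({{Replica}}, {{acquire}}, {{release}}, {{fdRequests}}) are hypothetical and the counter merely stands in for the request to the DataNode. Evicting an entry as soon as its reference count reaches zero is a simplification for this example; the real cache may keep unused replicas around for a while.

```java
import java.util.HashMap;
import java.util.Map;

public class SharedReplicaSketch {
    /** Hypothetical stand-in for a cached replica that holds one open descriptor. */
    static final class Replica {
        final long blockId;
        int refCount; // number of streams currently using this replica
        Replica(long blockId) { this.blockId = blockId; }
    }

    private final Map<Long, Replica> replicas = new HashMap<>();
    private int fdRequests; // how many times descriptors were "requested"

    /** Returns the cached replica for a block, loading it only on first use. */
    synchronized Replica acquire(long blockId) {
        Replica r = replicas.get(blockId);
        if (r == null) {
            fdRequests++; // stands in for the single descriptor request
            r = new Replica(blockId);
            replicas.put(blockId, r);
        }
        r.refCount++;
        return r;
    }

    /** Drops one reference; this sketch evicts the entry once nobody uses it. */
    synchronized void release(Replica r) {
        if (--r.refCount == 0) {
            replicas.remove(r.blockId);
        }
    }

    synchronized int fdRequests() { return fdRequests; }

    public static void main(String[] args) {
        SharedReplicaSketch cache = new SharedReplicaSketch();
        Replica first = cache.acquire(42L);  // first stream opens the block
        Replica second = cache.acquire(42L); // second stream reuses the entry
        System.out.println(first == second);    // true
        System.out.println(cache.fdRequests()); // 1
        cache.release(first);
        cache.release(second);
    }
}
```
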
{{BlockReaderFactory}} was previously a place for static methods to hang out.
Now it's a "real class" that takes care of the work of building a block reader.
It's nice to have better encapsulation of this functionality.

{{BlockReaderFactory#build}} only throws an {{IOException}} if there was a security
problem that requires that the stream do something (like refetch block tokens),
or if no block reader at all could be created.
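
The exception contract described above could look something like the following sketch. It is an assumption about the general shape, not the code in the patch: {{Strategy}}, {{SecurityIOException}}, and the use of a {{String}} in place of a real block reader are all hypothetical.

```java
import java.io.IOException;
import java.util.List;

public class ReaderBuildSketch {
    /** Hypothetical: a security failure the caller must act on (e.g. refetch a token). */
    static final class SecurityIOException extends IOException {
        SecurityIOException(String msg) { super(msg); }
    }

    /** Hypothetical strategy: returns a reader, or null if this path is unavailable. */
    interface Strategy {
        String tryCreate() throws IOException;
    }

    /** Tries each strategy in order and fails only in the two cases described. */
    static String build(List<Strategy> strategies) throws IOException {
        for (Strategy s : strategies) {
            try {
                String reader = s.tryCreate();
                if (reader != null) {
                    return reader;
                }
            } catch (SecurityIOException e) {
                throw e; // the stream has to react to this, so it is not swallowed
            } catch (IOException e) {
                // Ordinary failure: fall through to the next strategy.
            }
        }
        throw new IOException("no block reader could be created");
    }

    public static void main(String[] args) throws IOException {
        // The first path is unavailable, the second fails, the third succeeds.
        String reader = build(List.<Strategy>of(
                () -> null,
                () -> { throw new IOException("connection refused"); },
                () -> "remote"));
        System.out.println(reader); // remote
    }
}
```
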
> Unify mmap cache and short-circuit file descriptor cache
> --------------------------------------------------------
>
> Key: HDFS-5810
> URL: https://issues.apache.org/jira/browse/HDFS-5810
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: hdfs-client
> Affects Versions: 2.4.0
> Reporter: Colin Patrick McCabe
> Assignee: Colin Patrick McCabe
> Attachments: HDFS-5810.001.patch
>
>
> We should unify the client mmap cache and the client file descriptor cache.
> Since mmaps are granted corresponding to file descriptors in the cache
> (currently FileInputStreamCache), they have to be tracked together to do
> "smarter" things like HDFS-5182.