[ 
https://issues.apache.org/jira/browse/HDFS-5810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13879059#comment-13879059
 ] 

Colin Patrick McCabe commented on HDFS-5810:
--------------------------------------------

This patch unifies {{FileInputStreamCache}} and {{ClientMmapManager}} into a 
single cache called {{ShortCircuitCache}}.

Along the way, I noticed that our current caches were being destroyed and 
re-created constantly when using {{FileContext}}.  Because {{FileContext}} 
destroys and re-creates a {{DFSClient}} with each operation it does, the 
{{DFSClient}} is not the right place for caches to live.  Instead, they need to 
have global scope, like {{PeerCache}} does currently.  I created 
{{ClientCacheContext}} for this purpose and moved {{ShortCircuitCache}}, 
{{PeerCache}}, and {{DomainSocketFactory}} into it.

I wanted to give threads the choice to use a different cache if they so 
desired, so I created the {{dfs.client.cache.context}} configuration key.  When looking up the 
relevant {{ClientCacheContext}}, we look up the value of this key in a global 
map.  This allows clients to (for example) create separate {{FileSystem}} or 
{{FileContext}} instances that don't share a socket cache, by setting different 
values for this configuration key at the time of creation.  It is also handy 
for unit tests, to avoid cross-test contamination.
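
As a rough illustration of the lookup described above (not the code in the patch), a global map keyed by the configured context name could look like the sketch below.  The class name {{CacheContextSketch}}, the {{"default"}} fallback name, and the {{get}} method are assumptions made for this example only.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a per-context holder of client caches.
// The real class in the patch is ClientCacheContext; its API is not shown here.
final class CacheContextSketch {
    // Global map from context name to context, shared by every client in the JVM.
    private static final Map<String, CacheContextSketch> CONTEXTS =
        new ConcurrentHashMap<>();

    // Assumed fallback name used when the configuration key is unset.
    static final String DEFAULT_NAME = "default";

    private final String name;

    private CacheContextSketch(String name) {
        this.name = name;
    }

    // Returns the context registered under the given name, creating it on
    // first use.  Callers that pass the same name share one context.
    static CacheContextSketch get(String configuredName) {
        String key = (configuredName == null) ? DEFAULT_NAME : configuredName;
        return CONTEXTS.computeIfAbsent(key, CacheContextSketch::new);
    }

    String name() {
        return name;
    }
}
```

Two clients configured with the same context name get the same object back and therefore share caches, while a unit test that picks a unique name gets a context of its own.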

With this change, two {{DFSInputStream}} instances reading the same local 
replica via short-circuit reads now use the same file descriptor.  This has an 
obvious advantage in keeping down the number of open file descriptors we have.  
It also has some more subtle advantages.  For example, we no longer re-read the 
first part of the block metadata header over and over, since we only have one 
{{ShortCircuitReplica}} for that block in the cache.  We also make only one 
{{REQUEST_SHORT_CIRCUIT_FDS}} RPC to the DataNode, rather than two.
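
The sharing behaviour can be sketched with a small reference-counted cache.  This is a simplified model, not the patch's {{ShortCircuitCache}}: the names {{ReplicaSketch}} and {{ReplicaCacheSketch}} are hypothetical, the {{loader}} argument stands in for the request to the DataNode, and the sketch closes a replica as soon as its last reader releases it, which the real cache need not do.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for a cached replica (file descriptors plus metadata).
final class ReplicaSketch {
    final long blockId;
    int refCount;     // guarded by the owning cache's lock
    boolean closed;   // set once the last reference is released

    ReplicaSketch(long blockId) {
        this.blockId = blockId;
    }
}

// Hypothetical cache: one replica object per block, shared by all readers.
final class ReplicaCacheSketch {
    private final Map<Long, ReplicaSketch> replicas = new HashMap<>();
    int loads;        // how many times the (expensive) loader actually ran

    // Returns the cached replica for the block, invoking the loader only if
    // no reader currently holds one.  Each call takes one reference.
    synchronized ReplicaSketch fetchOrCreate(
            long blockId, Function<Long, ReplicaSketch> loader) {
        ReplicaSketch replica = replicas.get(blockId);
        if (replica == null) {
            replica = loader.apply(blockId);
            loads++;
            replicas.put(blockId, replica);
        }
        replica.refCount++;
        return replica;
    }

    // Drops one reference; the replica is closed when nobody uses it anymore.
    synchronized void release(ReplicaSketch replica) {
        if (--replica.refCount == 0) {
            replicas.remove(replica.blockId);
            replica.closed = true;
        }
    }
}
```

With two readers on the same block, the loader runs once and both readers hold the same object, which is the effect described above for file descriptors and the metadata header.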

{{BlockReaderFactory}} was previously a place for static methods to hang out.  
Now it's a "real class" that takes care of the work of building a block reader.  
It's nice to have better encapsulation of this functionality.  
{{BlockReaderFactory#build}} only throws an {{IOException}} if there was a security 
problem that requires that the stream do something (like refetch block tokens), 
or if no block reader at all could be created.
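
To make that exception contract concrete, here is a minimal sketch.  It assumes, purely for illustration, that the builder tries several reader strategies in order; the names {{SecurityProblemException}}, {{ReaderStrategy}}, and {{ReaderBuilderSketch}} are hypothetical, and a {{String}} stands in for the block reader.

```java
import java.io.IOException;

// Hypothetical marker for failures the stream must react to
// (for example by refetching a block token).
class SecurityProblemException extends IOException {
    SecurityProblemException(String message) {
        super(message);
    }
}

// Hypothetical strategy: returns a reader, or null if it does not apply.
interface ReaderStrategy {
    String tryCreate() throws IOException;
}

final class ReaderBuilderSketch {
    // Tries each strategy in order.  Ordinary failures are swallowed so the
    // next strategy gets a chance; security problems propagate immediately.
    static String build(ReaderStrategy... strategies) throws IOException {
        for (ReaderStrategy strategy : strategies) {
            try {
                String reader = strategy.tryCreate();
                if (reader != null) {
                    return reader;
                }
            } catch (SecurityProblemException e) {
                throw e;  // the caller has to do something about this one
            } catch (IOException e) {
                // Ordinary failure: fall through to the next strategy.
            }
        }
        throw new IOException("could not create any block reader");
    }
}
```

The caller then sees an exception in only two cases: a security problem it must handle, or the failure of every strategy.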

> Unify mmap cache and short-circuit file descriptor cache
> --------------------------------------------------------
>
>                 Key: HDFS-5810
>                 URL: https://issues.apache.org/jira/browse/HDFS-5810
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: hdfs-client
>    Affects Versions: 2.4.0
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>         Attachments: HDFS-5810.001.patch
>
>
> We should unify the client mmap cache and the client file descriptor cache.  
> Since mmaps are granted corresponding to file descriptors in the cache 
> (currently FileInputStreamCache), they have to be tracked together to do 
> "smarter" things like HDFS-5182.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
