[ https://issues.apache.org/jira/browse/HDFS-7693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329585#comment-14329585 ]
Colin Patrick McCabe commented on HDFS-7693:
--------------------------------------------
Thanks, Chris. Those are good questions. The motivation is to reduce NameNode (NN)
RPC traffic in a distributed database. Partly the idea is that if multiple queries
use the same streams, we'd like to keep them open rather than re-opening them
(which generates another getBlockLocations call on the NN).
I did consider putting caching into hdfsOpen / hdfsClose, but I feel they're
better off separate. Partly the nightmare of the FileSystem cache convinced me
that "hidden" caches are usually bad. Partly it's just that I'd like to be
able to evolve the cache in the future independently of hdfsOpen (like adding
new eviction policies, etc.).
I'm going to close this out since we decided to do the caching in an upper
(application) layer rather than at the libhdfs level. It makes more sense
there since the application (Impala) already has some APIs for cache
invalidation, something we don't really know about at the libhdfs level. The
cache is also somewhat logically separate from the rest of the libhdfs stuff,
so it would be nice to have separation of concerns.
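The comment describes the application-layer design only in prose. The idea is to keep stream handles open across queries and let the application, which knows when a file changed, drop stale entries. A minimal sketch of such a cache in C (the libhdfs language) might look like the code below. Everything in it is hypothetical: `handle_cache`, `cache_get`, `cache_invalidate`, the fixed capacity and the LRU eviction policy are illustrative choices. It is not the attached HDFS-7693 patch or Impala's code. The `fake_open`/`fake_close` stubs stand in for wrappers around libhdfs's `hdfsOpenFile()` and `hdfsCloseFile()` so the example runs without a cluster.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical callbacks; a real application would wrap libhdfs's
 * hdfsOpenFile() and hdfsCloseFile() here. */
typedef void *(*open_fn)(const char *path, void *ctx);
typedef void (*close_fn)(void *handle, void *ctx);

#define CACHE_CAPACITY 4   /* illustrative size */
#define MAX_PATH_LEN 256

typedef struct {
    char path[MAX_PATH_LEN];
    void *handle;            /* NULL marks a free slot */
    unsigned long last_use;  /* logical clock value used for LRU eviction */
} cache_entry;

typedef struct {
    cache_entry entries[CACHE_CAPACITY];
    unsigned long clock;
    open_fn open_cb;
    close_fn close_cb;
    void *ctx;
} handle_cache;

void cache_init(handle_cache *c, open_fn op, close_fn cl, void *ctx) {
    assert(op != NULL && cl != NULL);
    memset(c, 0, sizeof(*c));
    c->open_cb = op;
    c->close_cb = cl;
    c->ctx = ctx;
}

/* Return an open handle for path, calling the opener only on a miss. */
void *cache_get(handle_cache *c, const char *path) {
    cache_entry *victim = NULL;
    for (int i = 0; i < CACHE_CAPACITY; i++) {
        cache_entry *e = &c->entries[i];
        if (e->handle != NULL && strcmp(e->path, path) == 0) {
            e->last_use = ++c->clock;  /* hit: no new open at all */
            return e->handle;
        }
        /* Prefer a free slot; otherwise track the least recently used one. */
        if (victim == NULL || (victim->handle != NULL &&
                (e->handle == NULL || e->last_use < victim->last_use)))
            victim = e;
    }
    if (strlen(path) >= MAX_PATH_LEN)
        return NULL;
    void *h = c->open_cb(path, c->ctx);
    if (h == NULL)
        return NULL;
    if (victim->handle != NULL)
        c->close_cb(victim->handle, c->ctx);  /* evict the LRU entry */
    strcpy(victim->path, path);
    victim->handle = h;
    victim->last_use = ++c->clock;
    return h;
}

/* Invalidation hook the application calls when it knows path changed. */
void cache_invalidate(handle_cache *c, const char *path) {
    for (int i = 0; i < CACHE_CAPACITY; i++) {
        cache_entry *e = &c->entries[i];
        if (e->handle != NULL && strcmp(e->path, path) == 0) {
            c->close_cb(e->handle, c->ctx);
            e->handle = NULL;
        }
    }
}

/* Close every handle still held by the cache. */
void cache_destroy(handle_cache *c) {
    for (int i = 0; i < CACHE_CAPACITY; i++) {
        if (c->entries[i].handle != NULL) {
            c->close_cb(c->entries[i].handle, c->ctx);
            c->entries[i].handle = NULL;
        }
    }
}

/* Counting stubs so the example runs without an HDFS cluster. */
static int open_count = 0;
static int close_count = 0;

static void *fake_open(const char *path, void *ctx) {
    (void)ctx;
    char *h = malloc(strlen(path) + 1);  /* any non-NULL pointer serves as a handle */
    if (h != NULL) {
        strcpy(h, path);
        open_count++;
    }
    return h;
}

static void fake_close(void *handle, void *ctx) {
    (void)ctx;
    free(handle);
    close_count++;
}
```

With a capacity of 4, calling `cache_get` twice for the same path opens the file once. A fifth distinct path evicts and closes whichever handle was used least recently. Keeping eviction and invalidation in the application, rather than hiding them inside an open/close call, mirrors the separation-of-concerns argument above. The sketch is not thread-safe.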
> libhdfs: add hdfsFile cache
> ---------------------------
>
> Key: HDFS-7693
> URL: https://issues.apache.org/jira/browse/HDFS-7693
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: libhdfs
> Affects Versions: 2.7.0
> Reporter: Colin Patrick McCabe
> Assignee: Colin Patrick McCabe
> Attachments: HDFS-7693.001.patch
>
>
> Add an hdfsFile cache inside libhdfs.