[ https://issues.apache.org/jira/browse/HBASE-10689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13925386#comment-13925386 ]
Colin Patrick McCabe commented on HBASE-10689:
----------------------------------------------
[~stack], there are multiple kinds of caching in HDFS. The path-based caching
added in HDFS-4949 caches at the file level, so you are right that it is not
that useful for HBase. The advisory caching API is a little different. It
lets the application control how much readahead HDFS does and, to a limited
extent, how the page cache is used.
When HBase reads a 64 KB chunk, HDFS currently loads a 4 MB segment from
disk. The rest of that 4 MB is thrown away unless HBase uses it. HBase
could avoid this by calling DFSInputStream#setReadahead(65536). Unless
HBase is doing something smart with the rest of that 4 MB, it seems like this
might be a good idea?
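To put rough numbers on this, here is a small back-of-the-envelope sketch in plain Java. It does not call HDFS at all. The class and method names are made up for illustration, and it assumes "4MB" means 4 * 1024 * 1024 bytes and that the 64 KB chunk is counted as part of the segment HDFS loads. It also ignores anything the page cache might do with the extra data later.

```java
public class ReadaheadWaste {
    // Assumption: the 4 MB segment HDFS loads per read, taken as 4 MiB.
    static final long DEFAULT_READAHEAD_BYTES = 4L * 1024 * 1024;
    // The 64 KB chunk HBase asks for (65536 bytes, the value suggested for setReadahead).
    static final long CHUNK_BYTES = 64L * 1024;

    // Ratio of bytes loaded from disk to bytes the caller actually requested.
    // HDFS has to load at least the requested chunk, hence the max().
    static double amplification(long readaheadBytes, long chunkBytes) {
        return (double) Math.max(readaheadBytes, chunkBytes) / chunkBytes;
    }

    // Bytes loaded from disk that the caller did not ask for.
    static long unusedBytes(long readaheadBytes, long chunkBytes) {
        return Math.max(0L, readaheadBytes - chunkBytes);
    }

    public static void main(String[] args) {
        // Current behavior described above: 64 KB read, 4 MB loaded.
        System.out.println("default readahead: amplification="
                + amplification(DEFAULT_READAHEAD_BYTES, CHUNK_BYTES)
                + " unused=" + unusedBytes(DEFAULT_READAHEAD_BYTES, CHUNK_BYTES));
        // With readahead set to the chunk size, as suggested above.
        System.out.println("readahead=65536:   amplification="
                + amplification(CHUNK_BYTES, CHUNK_BYTES)
                + " unused=" + unusedBytes(CHUNK_BYTES, CHUNK_BYTES));
    }
}
```

Under those assumptions, each isolated 64 KB read loads 64 times the requested data and leaves 4128768 bytes unused. With readahead set to 65536 the ratio drops to 1. Whether that is a net win depends on how sequential the scan is, since sequential reads would consume the readahead data.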
> Explore advisory caching for MR over snapshot scans
> ---------------------------------------------------
>
> Key: HBASE-10689
> URL: https://issues.apache.org/jira/browse/HBASE-10689
> Project: HBase
> Issue Type: Improvement
> Components: mapreduce, Performance
> Reporter: Nick Dimiduk
>
> Per
> [comment|https://issues.apache.org/jira/browse/HBASE-10660?focusedCommentId=13921730&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13921730]
> on HBASE-10660, explore using the new HDFS advisory caching feature
> introduced in HDFS-4817 for TableSnapshotInputFormat.