[ https://issues.apache.org/jira/browse/HBASE-10689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13925386#comment-13925386 ]

Colin Patrick McCabe commented on HBASE-10689:
----------------------------------------------

[~stack], there are multiple kinds of caching in HDFS. The path-based caching
added in HDFS-4949 caches at the file level, so you are right that it is not
that useful for HBase. The advisory caching API is different: it lets the
application control how much readahead HDFS does, and gives it some say in how
the OS page cache is used.

When HBase reads a 64 KB chunk, HDFS currently loads a 4 MB segment off the
disk. The rest of that 4 MB is thrown away unless HBase uses it. HBase could
avoid this by calling DFSInputStream#setReadahead(65536). Unless HBase is doing
something smart with the rest of that 4 MB, this seems like it might be a good
idea.
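To put a rough number on the waste, here is a minimal Java sketch under a deliberately simple model, which is an assumption and not how HDFS actually accounts for readahead: each random 64 KB read pulls a full readahead window off the disk, and none of the surplus is used before it is evicted. The class and method names are hypothetical. In real code the knob is set on the open stream, e.g. in.setReadahead(65536L) on an FSDataInputStream. The parameter is a java.lang.Long, so a plain int literal needs the L suffix.

```java
public class ReadaheadWaste {
    // Default HDFS DataNode readahead (dfs.datanode.readahead.bytes): 4 MB.
    static final long DEFAULT_READAHEAD = 4L * 1024 * 1024;
    // Size of a single HBase read in this example: 64 KB.
    static final long HBASE_READ = 64L * 1024;

    // Simplified model (assumption): one random read of requestBytes pulls
    // max(requestBytes, readaheadBytes) off the disk, and nothing beyond
    // requestBytes is consumed before it is evicted.
    static long unusedBytes(long requestBytes, long readaheadBytes) {
        long fetched = Math.max(requestBytes, readaheadBytes);
        return fetched - requestBytes;
    }

    // Print how much of one read's disk I/O goes unused for a given window.
    static void report(String label, long readaheadBytes) {
        long fetched = Math.max(HBASE_READ, readaheadBytes);
        long unused = unusedBytes(HBASE_READ, readaheadBytes);
        System.out.printf("%s: %d of %d bytes fetched are unused (%.1f%%)%n",
                label, unused, fetched, 100.0 * unused / fetched);
    }

    public static void main(String[] args) {
        report("default 4 MB readahead", DEFAULT_READAHEAD);
        report("setReadahead(65536L)", HBASE_READ);
    }
}
```

Under this model, 63/64 of every 4 MB fetch goes unused, while a 64 KB readahead window fetches only what HBase asked for. Real workloads with sequential scans would reuse more of the window, which is exactly the "something smart" caveat above.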

> Explore advisory caching for MR over snapshot scans
> ---------------------------------------------------
>
>                 Key: HBASE-10689
>                 URL: https://issues.apache.org/jira/browse/HBASE-10689
>             Project: HBase
>          Issue Type: Improvement
>          Components: mapreduce, Performance
>            Reporter: Nick Dimiduk
>
> Per 
> [comment|https://issues.apache.org/jira/browse/HBASE-10660?focusedCommentId=13921730&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13921730]
>  on HBASE-10660, explore using the new HDFS advisory caching feature 
> introduced in HDFS-4817 for TableSnapshotInputFormat.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
