[ https://issues.apache.org/jira/browse/HBASE-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13054156#comment-13054156 ]

Jason Rutherglen commented on HBASE-4018:
-----------------------------------------

bq. fs cache will always be compressed

That's likely where the slowdown occurs.  I agree the values should be 
compressed; in many cases the extra RAM consumed by holding uncompressed 
blocks in heap dwarfs (or should) the CPU overhead of decompression.  Right 
now in HBase there's effectively a page fault when a block isn't in the 
cache: it loads from disk or network and decompresses into RAM, while 
[likely] also evicting existing pages/blocks.  That seems likely to be 
problematic.
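
Roughly what I mean, as a minimal sketch: a decompress-on-miss read path over 
a plain LRU map.  The names here (BlockCacheSketch, readBlock) are 
illustrative, not the actual HBase read path, and I'm assuming 
gzip-compressed block files just for the example.

{code:java}
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.GZIPInputStream;

public class BlockCacheSketch {
  private static final int MAX_BLOCKS = 1024;

  // Access-ordered LinkedHashMap gives simple LRU eviction.
  private final Map<String, byte[]> cache =
      new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
        @Override
        protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
          return size() > MAX_BLOCKS; // eviction may push out hot blocks
        }
      };

  public byte[] readBlock(String path) throws IOException {
    byte[] block = cache.get(path);
    if (block != null) {
      return block; // hit: already decompressed in heap
    }
    // Miss, i.e. the "page fault": load compressed bytes from disk or
    // network, decompress into RAM (CPU cost), possibly evicting blocks.
    try (InputStream in =
        new GZIPInputStream(Files.newInputStream(Paths.get(path)))) {
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      byte[] buf = new byte[8192];
      int n;
      while ((n = in.read(buf)) != -1) {
        out.write(buf, 0, n);
      }
      block = out.toByteArray();
    }
    cache.put(path, block);
    return block;
  }
}
{code}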

CPU should be cheaper than RAM, especially for HBase, which logically should 
be IO bound.  The same is true of search, e.g., posting lists are compressed 
using vints or PFOR instead of laying the raw ints out on disk; search then 
becomes CPU bound from iterating multiple posting lists.  HBase iterates only 
one effective "list", though its compression algorithm likely consumes far 
more CPU.  Perhaps that's easily offset with a less intensive compression 
algorithm.
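
For reference, the vint idea is simple enough to sketch standalone.  This is 
the generic varint scheme (7 payload bits per byte, continuation bit on all 
but the last), not Lucene's exact on-disk format, and the delta values are 
made up:

{code:java}
import java.io.ByteArrayOutputStream;

public class VIntSketch {
  // Encode an int as 1-5 bytes: 7 payload bits per byte,
  // high bit set on every byte except the last.
  static void writeVInt(ByteArrayOutputStream out, int v) {
    while ((v & ~0x7F) != 0) {
      out.write((v & 0x7F) | 0x80);
      v >>>= 7;
    }
    out.write(v);
  }

  public static void main(String[] args) {
    // Posting list stored as deltas between sorted doc ids: 3, 5, 6, 130.
    int[] deltas = {3, 2, 1, 124};
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    for (int d : deltas) {
      writeVInt(out, d);
    }
    // Four ints shrink from 16 raw bytes to 4 vint bytes here.
    System.out.println(out.size() + " bytes for " + deltas.length + " ints");
  }
}
{code}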

bq. What if some user uses the node, runs a package manager to update things, 
or uses scp to get things off the server? the fs cache is likely to get screwed.

The fs cache becoming invalid in the examples given would be few and far 
between.  More worrisome is the block/page-fault issue, which I'm assuming 
can happen frequently at the moment.  I guess one could always set the block 
cache to be quite small and make the block sizes on the small side as well, 
effectively shifting the problem back to the system I/O cache.
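
Concretely, shrinking both knobs would look something like the following.  A 
sketch against the HBase APIs as I understand them; the key name and the 
defaults noted in the comments are worth double-checking per version:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;

public class SmallCacheConfig {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();
    // Give the on-heap block cache only 5% of the heap
    // (default is around 20%).
    conf.setFloat("hfile.block.cache.size", 0.05f);

    // Smaller HFile blocks per column family (default 64KB), so each
    // "page fault" decompresses and evicts less at a time.
    HColumnDescriptor family = new HColumnDescriptor("cf");
    family.setBlocksize(16 * 1024);
  }
}
{code}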

I think we need to benchmark.  Also, running yet another process on an HBase 
node sounds scary.

> Attach memcached as secondary block cache to regionserver
> ---------------------------------------------------------
>
>                 Key: HBASE-4018
>                 URL: https://issues.apache.org/jira/browse/HBASE-4018
>             Project: HBase
>          Issue Type: Improvement
>          Components: regionserver
>            Reporter: Li Pi
>            Assignee: Li Pi
>
> Currently, block caches are limited by heap size, which is limited by garbage 
> collection times in Java.
> We can get around this by using memcached w/JNI as a secondary block cache. 
> This should be faster than the linux file system's caching, and allow us to 
> very quickly gain access to a high quality slab allocated cache.
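
A rough sketch of the lookup order the description implies (on-heap L1 first, 
memcached as L2), using the spymemcached client purely for illustration since 
the JNI client mentioned above would take its place; the class and method 
names are made up:

{code:java}
import java.io.IOException;
import java.net.InetSocketAddress;
import java.util.LinkedHashMap;
import java.util.Map;
import net.spy.memcached.MemcachedClient;

public class TwoTierBlockCache {
  // Simple access-ordered L1; unbounded here only for brevity.
  private final Map<String, byte[]> heapCache =
      new LinkedHashMap<String, byte[]>(16, 0.75f, true);
  private final MemcachedClient l2;

  public TwoTierBlockCache() throws IOException {
    this.l2 = new MemcachedClient(new InetSocketAddress("localhost", 11211));
  }

  public byte[] getBlock(String key) {
    byte[] block = heapCache.get(key);
    if (block != null) {
      return block;                 // L1 hit: no reload, no extra GC churn
    }
    block = (byte[]) l2.get(key);   // L2 hit: off-heap, slab allocated
    if (block != null) {
      heapCache.put(key, block);    // promote back into L1
      return block;
    }
    return null;                    // full miss: caller reads from HDFS
  }

  public void putBlock(String key, byte[] block) {
    heapCache.put(key, block);
    l2.set(key, 0, block);          // expiry 0 = no timeout
  }
}
{code}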

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
