[
https://issues.apache.org/jira/browse/HBASE-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13054156#comment-13054156
]
Jason Rutherglen commented on HBASE-4018:
-----------------------------------------
bq. fs cache will always be compressed
That's likely where the slowdown occurs. I agree the values should be
compressed; in many cases the RAM saved should dwarf the CPU overhead of
decompressing into heap space. Right now in HBase there's effectively a page
fault when a block isn't in the cache, i.e., it loads the block from disk or
the network and decompresses it into RAM while [likely] also evicting
existing blocks. That seems likely to be problematic.
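To make the "page fault" analogy concrete, here is a hypothetical sketch of that miss path: a bounded LRU cache where a miss triggers a disk/network read plus decompression and may evict a resident block. The class name, capacity, and block size are illustrative, not HBase's actual block cache implementation:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch (not HBase code): an LRU block cache whose miss path
// reads and decompresses a block, possibly evicting another cached block.
public class BlockCacheSketch {
    static final int CAPACITY = 2; // tiny on purpose, to show eviction

    // access-order LinkedHashMap gives us LRU eviction for free
    static final Map<String, byte[]> cache =
        new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, byte[]> e) {
                return size() > CAPACITY; // evict the least recently used block
            }
        };

    // Stand-in for reading a compressed block from disk and inflating it.
    static byte[] loadAndDecompress(String blockKey) {
        return new byte[64 * 1024]; // pretend 64 KB decompressed block
    }

    static byte[] getBlock(String key) {
        byte[] b = cache.get(key);
        if (b == null) {                 // the "page fault": cache miss
            b = loadAndDecompress(key);  // disk/network read + CPU to inflate
            cache.put(key, b);           // may evict another resident block
        }
        return b;
    }

    public static void main(String[] args) {
        getBlock("b1");
        getBlock("b2");
        getBlock("b3"); // capacity exceeded: "b1" is evicted
        System.out.println(cache.containsKey("b1"));
    }
}
```

The point is that every miss pays both an I/O cost and a decompression cost, and can push out a block that was still hot.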
CPU should be cheaper than RAM, especially for HBase, which logically should
be IO bound. This is also true of search, e.g., posting lists are compressed
using vint or PFOR instead of laying the raw ints out on disk. Search then
becomes CPU bound from iterating multiple posting lists. HBase iterates only
one effective "list", though its compression algorithm likely consumes far
more CPU. Perhaps that's easily offset with a less CPU-intensive compression
algorithm.
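As a concrete illustration of the posting-list compression mentioned above, here is a minimal sketch of vint (variable-length integer) encoding in Java. The class and method names are hypothetical, not taken from HBase or Lucene:

```java
import java.io.ByteArrayOutputStream;

// Sketch of vint encoding: each byte carries 7 bits of payload, and the
// high bit flags "more bytes follow", so small values take one byte.
public class VIntDemo {
    static void writeVInt(ByteArrayOutputStream out, int v) {
        while ((v & ~0x7F) != 0) {
            out.write((v & 0x7F) | 0x80); // low 7 bits, continuation bit set
            v >>>= 7;
        }
        out.write(v); // final byte, continuation bit clear
    }

    public static void main(String[] args) {
        // A posting list stores deltas between doc ids, e.g. ids
        // 1000, 1003, 1010 become deltas 1000, 3, 7.
        int[] deltas = {1000, 3, 7};
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int d : deltas) writeVInt(out, d);
        // 1000 needs two bytes; 3 and 7 need one each: 4 bytes vs 12 raw.
        System.out.println(out.size());
    }
}
```

Decoding such a stream is cheap per value, which is why it tends to shift the bottleneck from I/O toward CPU only once iteration volume is large.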
bq. What if some user uses the node, runs a package manager to update things,
or uses scp to get things off the server? the fs cache is likely to get screwed.
The fs cache becoming invalid in the examples given would be few and far
between. More worrisome is the block/page fault issue, which I'm assuming can
happen frequently at the moment. I guess one could always set the block cache
to be quite small and make the block sizes small as well, effectively
shifting the problem back to the OS IO cache.
I think we need to benchmark. Also, running yet another process on an HBase
node sounds scary.
> Attach memcached as secondary block cache to regionserver
> ---------------------------------------------------------
>
> Key: HBASE-4018
> URL: https://issues.apache.org/jira/browse/HBASE-4018
> Project: HBase
> Issue Type: Improvement
> Components: regionserver
> Reporter: Li Pi
> Assignee: Li Pi
>
> Currently, block caches are limited by heap size, which is limited by garbage
> collection times in Java.
> We can get around this by using memcached w/JNI as a secondary block cache.
> This should be faster than the Linux file system's caching, and allow us to
> very quickly gain access to a high quality slab allocated cache.
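The two-tier lookup the description proposes might be sketched as follows. This is a hypothetical outline: the memcached tier is stubbed with an in-memory map, and a real memcached client would sit behind the same get/put calls.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch: a small on-heap L1 cache backed by a large off-heap
// L2 tier (memcached in the proposal; stubbed here with a plain Map).
public class TwoTierCache {
    final Map<String, byte[]> l1 = new HashMap<>(); // on-heap, GC-bound
    final Map<String, byte[]> l2 = new HashMap<>(); // stand-in for memcached

    byte[] getBlock(String key) {
        byte[] b = l1.get(key);
        if (b != null) return b;       // L1 hit: cheapest path
        b = l2.get(key);
        if (b != null) {               // L2 hit: promote to L1
            l1.put(key, b);
            return b;
        }
        b = readFromDisk(key);         // miss in both tiers
        l2.put(key, b);                // populate the large tier
        return b;
    }

    // Stub for the actual HFile block read.
    byte[] readFromDisk(String key) {
        return new byte[8];
    }

    public static void main(String[] args) {
        TwoTierCache c = new TwoTierCache();
        c.getBlock("block-1");         // cold: disk read, lands in L2
        c.getBlock("block-1");         // warm: served from L2, promoted to L1
        System.out.println(c.l1.containsKey("block-1"));
    }
}
```

The L2 tier holds far more blocks than the Java heap safely can, which is the slab-allocation benefit the description refers to.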
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira