[ https://issues.apache.org/jira/browse/HBASE-1192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12671957#action_12671957 ]

Jonathan Gray commented on HBASE-1192:
--------------------------------------

My proposal is to build upon the work being done in HBASE-1186 and HBASE-1188 
to create our own LRU-style Map specialized for the block cache.
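To make the idea concrete, here's a minimal sketch of what such a map could look like (hypothetical names, not the actual HBASE-1186/1188 code): a LinkedHashMap in access order, bounded by heap bytes rather than entry count, so eviction decisions line up with the heap accounting we already do for Memcache.

```java
import java.nio.ByteBuffer;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Minimal sketch of a size-bounded, LRU-style block cache map. */
public class LruBlockCache {
    private final long maxHeapBytes;
    private long currentBytes = 0;

    // accessOrder=true makes iteration order least-recently-used first
    private final Map<String, ByteBuffer> map =
        new LinkedHashMap<String, ByteBuffer>(16, 0.75f, true);

    public LruBlockCache(long maxHeapBytes) {
        this.maxHeapBytes = maxHeapBytes;
    }

    public synchronized void cacheBlock(String blockName, ByteBuffer block) {
        ByteBuffer old = map.put(blockName, block);
        if (old != null) currentBytes -= old.capacity();
        currentBytes += block.capacity();
        // evict LRU blocks until we're back under the heap bound;
        // the block just added is MRU, so it won't be hit first
        Iterator<Map.Entry<String, ByteBuffer>> it = map.entrySet().iterator();
        while (currentBytes > maxHeapBytes && it.hasNext()) {
            Map.Entry<String, ByteBuffer> eldest = it.next();
            if (eldest.getKey().equals(blockName)) continue;
            currentBytes -= eldest.getValue().capacity();
            it.remove();
        }
    }

    public synchronized ByteBuffer getBlock(String blockName) {
        return map.get(blockName); // a hit promotes the block to MRU
    }

    public synchronized long heapSize() {
        return currentBytes;
    }
}
```

Because we do the size accounting ourselves, a flusher-style thread can ask the cache exactly how much heap it holds and evict by bytes, which softrefs can't give us.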

A few points as to why I think we should move away from SoftReferences and 
manage everything ourselves:

- The defined loose constraints and observed non-uniform behavior of 
SoftReferences
- We're already "managing" heap usage for Memcache.  Using softrefs for the 
block cache, we'd have something that's almost a black box, trying to use all 
available memory.  This could force the memcache to flush itself out because 
the RS is under heap pressure.  We won't have much control over fairness 
between memcaches, indexes, and the block cache if we use softrefs.  I propose 
we build something very similar to the MemcacheFlusher thread that would deal 
with fairness between the different elements of the RS that use significant 
heap (memcaches, indexes, block cache, cell cache, in-memory families, blooms, 
etc...).  As with the new file format, there are going to be more parameters 
in hbase 0.20 in order to optimize for your use case.  Like the file format, 
we'll have to come up with reasonable defaults and write more documentation 
about the effects of the different settings.  Do we want to divide up the 
total available heap on startup between the different memory consumers, or do 
we want to leave it wide open for memcaches/indexes/blocks until we're under 
heap pressure and then make a decision about how to flush or evict fairly?
- Ability to implement in-memory families as described in the bigtable paper 
very easily by adding priority into the eviction algorithm
- Full table scans can thrash the cache (for Streamy, we do this only for MR 
jobs, not user-facing stuff).  With our own structure, we can use a modified 
LRU algorithm that is resistant to table scans (I'm a fan of ARC, but there 
are license issues; it's fairly simple to implement something similar if you 
manually configure it... ARC is cool because it self-tunes).
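The last two points can be combined in one structure.  Here's a rough sketch (again, hypothetical names and a manually-configured stand-in for ARC, not a real implementation): blocks land in a "single-access" segment on first read and are promoted to "multi-access" on a second read, while in-memory families get their own top-priority segment.  Eviction drains the lowest-priority segment first, so a full table scan only churns single-access blocks and the hot set survives.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Sketch of a scan-resistant, priority-aware block cache (hypothetical).
 * Three segments, each its own access-order LRU, evicted in priority order:
 * single-access first, then multi-access, then in-memory families.
 */
public class ScanResistantCache {
    private final int maxBlocks;
    private final Map<String, byte[]> singleAccess = new LinkedHashMap<>(16, 0.75f, true);
    private final Map<String, byte[]> multiAccess  = new LinkedHashMap<>(16, 0.75f, true);
    private final Map<String, byte[]> inMemory     = new LinkedHashMap<>(16, 0.75f, true);

    public ScanResistantCache(int maxBlocks) {
        this.maxBlocks = maxBlocks;
    }

    public synchronized void cacheBlock(String name, byte[] block, boolean inMemoryFamily) {
        (inMemoryFamily ? inMemory : singleAccess).put(name, block);
        evictIfNeeded();
    }

    public synchronized byte[] getBlock(String name) {
        byte[] b = multiAccess.get(name);
        if (b != null) return b;
        b = inMemory.get(name);
        if (b != null) return b;
        b = singleAccess.remove(name);
        if (b != null) multiAccess.put(name, b); // second access: promote
        return b;
    }

    private void evictIfNeeded() {
        // drain lowest-priority segments first
        while (size() > maxBlocks) {
            if (!evictOne(singleAccess) && !evictOne(multiAccess) && !evictOne(inMemory)) {
                break;
            }
        }
    }

    private boolean evictOne(Map<String, byte[]> segment) {
        Iterator<String> it = segment.keySet().iterator();
        if (!it.hasNext()) return false;
        it.next();   // eldest (least recently used) entry in this segment
        it.remove();
        return true;
    }

    private int size() {
        return singleAccess.size() + multiAccess.size() + inMemory.size();
    }
}
```

Unlike ARC, the segment boundary here is whatever we configure it to be; ARC would tune the split between the segments automatically, which is the part we'd lose by avoiding its license.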

Those are my main points.  The primary reason not to go in this direction is 
simplicity.  However, given what we've learned from OOME hell over the past 
couple of releases, we must be (and already are) in the business of heap 
management.  The Streamy guys have done the research and development to do 
memory management in java about as well as it seems it can be done (based on 
other open source java caching apps), so I'm confident we can be correct, 
efficient, and accurate enough to prevent OOME issues and get optimal 
performance.

Erik will post his findings from his work experimenting with softref behavior.

> LRU-style map for the block cache
> ---------------------------------
>
>                 Key: HBASE-1192
>                 URL: https://issues.apache.org/jira/browse/HBASE-1192
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: regionserver
>            Reporter: Jonathan Gray
>            Priority: Blocker
>             Fix For: 0.20.0
>
>
> We need to decide what structure to use to back the block cache.  The primary 
> decision is whether to continue using SoftReferences or to build our own 
> structure.
