[
https://issues.apache.org/jira/browse/HBASE-1192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12671957#action_12671957
]
Jonathan Gray commented on HBASE-1192:
--------------------------------------
My proposal is to build upon the work being done in HBASE-1186 and HBASE-1188
to create our own LRU-style Map specialized for the block cache.
A few points as to why I think we should move away from SoftReferences and
manage everything ourselves:
- The loosely defined constraints and observed non-uniform behavior of
SoftReferences
- We're already "managing" heap usage for the Memcache. Using softrefs for the
block cache, we'd have something that is almost a black box and tries to use
all available memory. That could force the memcache to flush itself because the
RS is under heap pressure. We won't have much control over fairness between
memcaches, indexes, and the block cache if we use softrefs. I propose we build
something very similar to the MemcacheFlusher thread that would arbitrate
fairness between the different elements of the RS that use significant heap
(memcaches, indexes, block cache, cell cache, in-memory families, blooms,
etc...). As with the new file format, there are going to be more parameters in
hbase 0.20 in order to optimize for your use case. Like the file format, we'll
have to come up with reasonable defaults and write more documentation about the
effects of the different settings. Do we want to divide up the total available
heap on startup between the different memory consumers, or do we want to leave
it wide open for memcaches/indexes/blocks until we're under heap pressure and
then make a decision about how to flush or evict fairly?
- The ability to implement in-memory families, as described in the Bigtable
paper, very easily by adding priority to the eviction algorithm
- Full table scans can thrash the cache (at Streamy, we do this only for MR
jobs, not user-facing stuff). With our own structure, we can use a modified LRU
algorithm that is resistant to table scans (I'm a fan of ARC but there are
licensing issues; it's fairly simple to implement this if you manually
configure... ARC is cool because it self-tunes).
Those are my main points. The primary argument for not going in this direction
is simplicity. However, given what we've learned from OOME hell over the past
couple of releases, I think we must be (and already are) in the business of
heap management. The Streamy guys have done the research and development to do
memory management in Java about as well as it seems it can be done (based on
other open source Java caching apps), so I'm confident we can be correct,
efficient, and accurate enough to prevent OOME issues and get optimal
performance.
Erik will post his findings from his work experimenting with softref behavior.
> LRU-style map for the block cache
> ---------------------------------
>
> Key: HBASE-1192
> URL: https://issues.apache.org/jira/browse/HBASE-1192
> Project: Hadoop HBase
> Issue Type: New Feature
> Components: regionserver
> Reporter: Jonathan Gray
> Priority: Blocker
> Fix For: 0.20.0
>
>
> We need to decide what structure to use to back the block cache. The primary
> decision is whether to continue using SoftReferences or to build our own
> structure.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.