[
https://issues.apache.org/jira/browse/HBASE-70?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12565944#action_12565944
]
Jim Kellerman commented on HBASE-70:
------------------------------------
Stack and I discussed the possibility of moving the memcache back to the region
level instead of the store level, because it would make accounting easier.
However, this approach has some serious drawbacks:
- some of the methods that access both the memcache and the store files (such
as getKeys, etc.) are more efficient when everything is at the store level.
- we experienced a great deal of pain moving the memcache from the region level
to the store level in the first place, as it forced us to rewrite a lot of
the scanner code.
- the reason for moving the memcache from the region level to the store level
was that it greatly simplified the code and reduced contention. Before, when
the cache filled, we had no idea how much of it belonged to which family.
So what I would suggest, instead of moving memcache back up to the region
level, is to move the cache size management back up to the region level. Let
the region keep track of the total cache space in use, which store(s) have the
largest caches, etc. Contention is reduced to smaller data structures that
manage the accounting, instead of bigger structures like the caches themselves.
This way we achieve:
- Better control of the overall cache space in use
- Elimination of the need for radical modifications (moving the cache back to
the region level at this point would be much harder than when it was moved to
the store level in the first place, since so much more has been added)
Basically, we gain what we need (better memory management) with less contention
on the caches themselves, via a less risky (and less radical) change.
> [hbase] memory management
> -------------------------
>
> Key: HBASE-70
> URL: https://issues.apache.org/jira/browse/HBASE-70
> Project: Hadoop HBase
> Issue Type: Improvement
> Components: regionserver
> Reporter: stack
>
> Each Store has a Memcache of edits that is flushed on a fixed period (It used
> to be flushed when it grew beyond a limit). A Region can be made up of N
> Stores. A regionserver has no upper bound on the number of regions that can
> be deployed to it currently. Add to this that per mapfile, we have read the
> index into memory. We're also talking about adding caching of blocks and
> cells.
> We need a means of keeping an account of memory usage adjusting cache sizes
> and flush rates (or sizes) dynamically -- using References where possible --
> to accommodate deployment of added regions. If memory is strained, we should
> reject regions proffered by the master with a resource-constrained, or some
> such, message.
> The manual sizing we currently do ain't going to cut it for clusters of any
> decent size.
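The description above suggests rejecting regions proffered by the master when
memory is strained. A minimal sketch of that admission check, with all names
hypothetical (not actual HBase APIs), might look like:

```java
// Hypothetical sketch: before accepting a region from the master, the
// regionserver checks its heap headroom and answers with a
// resource-constrained status instead of deploying the region.
class RegionAdmission {
  enum Response { ACCEPTED, RESOURCE_CONSTRAINED }

  private final double maxHeapFraction; // e.g. 0.8: refuse above 80% heap use

  RegionAdmission(double maxHeapFraction) {
    this.maxHeapFraction = maxHeapFraction;
  }

  // usedHeap and maxHeap in bytes, e.g. from Runtime.getRuntime().
  Response offerRegion(long usedHeap, long maxHeap) {
    double used = (double) usedHeap / maxHeap;
    return used > maxHeapFraction
        ? Response.RESOURCE_CONSTRAINED
        : Response.ACCEPTED;
  }
}
```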