[hbase] Compactions should take no longer than period between memcache flushes
------------------------------------------------------------------------------
Key: HADOOP-1644
URL: https://issues.apache.org/jira/browse/HADOOP-1644
Project: Hadoop
Issue Type: Wish
Components: contrib/hbase
Reporter: stack
Assignee: stack
Priority: Minor
Currently, compactions take a long time. During compaction, updates are
carried by the HRegions' memcache (+ backing HLog). memcache is unable to
flush to disk until compaction completes.
Under sustained, substantial updates (rows that contain multiple columns, one
of which is a web page) by multiple concurrent clients (10 in this case), a
common hbase usage scenario, the memcache grows fast, often to orders of
magnitude in excess of the configured 'flush-to-disk' threshold.
This throws the whole system out of kilter. When the memcache flush does get
to run after compaction completes (assuming you have sufficient RAM and the
region server doesn't OOME), the resulting on-disk file will be way larger
than any other on-disk HStoreFile, bringing on a region split. But the
resulting split will produce regions that themselves need to be split
immediately because each half is beyond the configured limit, and so on...
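To make the failure mode above concrete, here is a rough back-of-the-envelope
sketch. It is not hbase code: the function names, the ingest rate, the
thresholds and the compaction duration are all hypothetical, and it assumes
the flushed file is as large as the buffered memcache and that a split halves
a region evenly.

```python
def memcache_size_after_compaction(ingest_bytes_per_sec, compaction_secs):
    """Bytes buffered in the memcache if no flush can run during a compaction."""
    return ingest_bytes_per_sec * compaction_secs


def split_rounds_needed(file_bytes, max_region_bytes):
    """Count successive halvings until each piece fits under the split limit."""
    rounds = 0
    size = file_bytes
    while size > max_region_bytes:
        size /= 2  # assumption: a split divides the data evenly
        rounds += 1
    return rounds


MB = 1024 * 1024

# All figures below are hypothetical, chosen only for illustration.
FLUSH_THRESHOLD = 16 * MB   # assumed 'flush-to-disk' threshold
MAX_REGION_SIZE = 64 * MB   # assumed configured region size limit
INGEST_RATE = 10 * MB       # assumed aggregate bytes/sec from 10 clients
COMPACTION_SECS = 120       # assumed duration of one compaction

buffered = memcache_size_after_compaction(INGEST_RATE, COMPACTION_SECS)
overshoot = buffered / FLUSH_THRESHOLD
rounds = split_rounds_needed(buffered, MAX_REGION_SIZE)

print(f"memcache at end of compaction: {buffered // MB} MB")
print(f"multiple of flush threshold: {overshoot:.0f}x")
print(f"rounds of splitting before regions fit: {rounds}")
```

With these assumed numbers the memcache ends the compaction at many times the
flush threshold, and the single oversized file needs several rounds of
splitting before every region is back under the limit. Shortening the
compaction shrinks both figures proportionally.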
In another issue yet to be posted, tuning and some pointed memcache flushes
make the above condition less extreme, but until compaction durations come
close to the period between memcache flushes, compactions will remain
disruptive. It's allowed that compactions may never be fast enough, as per the
bigtable paper (this is a 'wish' issue).