[ 
https://issues.apache.org/jira/browse/HADOOP-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12518867
 ] 

stack commented on HADOOP-1644:
-------------------------------

Jim, let me try your suggestion of not having compactions disable flushes.

Another thing I'd like to try: rather than flushing memory to a new file,
flush by merging with an existing file.  I'm thinking it will take the same
amount of elapsed time, but we'll have put off a full compaction by not
producing an additional file.
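
To make the merging-flush idea concrete, here is a minimal sketch, not the
actual HStore code.  It assumes both the memcache and an on-disk file can be
modelled as sorted maps from row key to value; the class and method names
(MergingFlushSketch, flushByMerging) are hypothetical, and real keys would
also carry column and timestamp so that older versions are kept rather than
overwritten.

```java
import java.util.SortedMap;
import java.util.TreeMap;

class MergingFlushSketch {
    /**
     * Hypothetical model of "flush by merging": instead of writing the
     * memcache out as a new file, combine it with the contents of one
     * existing file and write a single replacement file.
     */
    static SortedMap<String, String> flushByMerging(
            SortedMap<String, String> existingFile,
            SortedMap<String, String> memcache) {
        // Start from the existing file's sorted contents.
        SortedMap<String, String> merged = new TreeMap<>(existingFile);
        // Memcache entries are newer, so they win when keys collide
        // (a simplification: this sketch keeps only one version per key).
        merged.putAll(memcache);
        return merged;
    }

    public static void main(String[] args) {
        SortedMap<String, String> file = new TreeMap<>();
        file.put("row-a", "old-a");
        file.put("row-b", "old-b");

        SortedMap<String, String> memcache = new TreeMap<>();
        memcache.put("row-b", "new-b");
        memcache.put("row-c", "new-c");

        // One file in, one file out: the store's file count does not grow.
        System.out.println(flushByMerging(file, memcache));
    }
}
```

The point of the sketch is only the file count: a normal flush would leave
the store with one more file, whereas this leaves it with the same number.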

Another element to consider is that compactions are the means by which
HStoreFile references are cleaned up in a region (a region that still holds
references cannot be split), so compaction should do its best to clean up
reference files.
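
The relationship between references and splits can be sketched as follows.
This is a toy model under stated assumptions, not the real HStoreFile API:
StoreFile, canSplit and compact are hypothetical names, and compaction is
reduced to "rewrite everything into one plain file".

```java
import java.util.ArrayList;
import java.util.List;

class ReferenceCleanupSketch {
    /** Hypothetical stand-in for a store file in a region. */
    static final class StoreFile {
        final String name;
        /** True if this file only points into a parent region's file. */
        final boolean isReference;

        StoreFile(String name, boolean isReference) {
            this.name = name;
            this.isReference = isReference;
        }
    }

    /** A region holding any reference file cannot be split. */
    static boolean canSplit(List<StoreFile> files) {
        for (StoreFile f : files) {
            if (f.isReference) {
                return false;
            }
        }
        return true;
    }

    /**
     * Toy compaction: all inputs, references included, are rewritten into
     * a single plain file, so no references remain afterwards.
     */
    static List<StoreFile> compact(List<StoreFile> files) {
        List<StoreFile> result = new ArrayList<>();
        if (!files.isEmpty()) {
            result.add(new StoreFile("compacted", false));
        }
        return result;
    }

    public static void main(String[] args) {
        List<StoreFile> files = new ArrayList<>();
        files.add(new StoreFile("ref-to-parent", true));
        files.add(new StoreFile("flushed-1", false));

        System.out.println("before compaction: " + canSplit(files));
        System.out.println("after compaction:  " + canSplit(compact(files)));
    }
}
```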



> [hbase] Compactions should not block updates
> --------------------------------------------
>
>                 Key: HADOOP-1644
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1644
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: contrib/hbase
>    Affects Versions: 0.15.0
>            Reporter: stack
>            Assignee: stack
>             Fix For: 0.15.0
>
>
> Currently, compactions take a long time.  During compaction, updates are 
> carried by the HRegion's memcache (+ backing HLog).  memcache is unable to 
> flush to disk until compaction completes.
> Under sustained, substantial updates (rows that contain multiple columns, one
> of which is a web page) by multiple concurrent clients (10 in this case), a
> common hbase usage scenario, the memcache grows fast, often to orders of
> magnitude beyond the configured 'flush-to-disk' threshold.
> This throws the whole system out of kilter.  When the memcache flush does get
> to run after compaction completes (assuming you have sufficient RAM and the
> region server doesn't OOME), the resulting on-disk file is far larger than
> any other on-disk HStoreFile, bringing on a region split.  But the split
> produces regions that themselves need to be split immediately because each
> half is beyond the configured limit, and so on.
> In another issue yet to be posted, tuning and some pointed memcache flushes
> make the above condition less extreme, but until compaction durations come
> close to the memcache flush threshold, compactions will remain disruptive.
> It's allowed that compactions may never be fast enough, as per the bigtable
> paper (this is a 'wish' issue).
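
The cascading-split effect described in the issue can be put into rough
numbers.  The sketch below assumes each split halves a region's data evenly
and that a region splits whenever its size exceeds the configured limit;
both the halving model and the sizes used (a 256 MB limit, a 2048 MB
flushed file) are illustrative assumptions, not measurements from this
issue.

```java
class SplitCascadeSketch {
    /**
     * Counts how many successive rounds of splitting are needed before
     * every resulting region is at or below maxRegionSizeMb, assuming
     * each split divides the data exactly in half (hypothetical model).
     */
    static int splitRounds(long regionSizeMb, long maxRegionSizeMb) {
        if (maxRegionSizeMb <= 0) {
            throw new IllegalArgumentException("limit must be positive");
        }
        int rounds = 0;
        long size = regionSizeMb;
        while (size > maxRegionSizeMb) {
            size = size / 2; // each daughter region gets half the data
            rounds++;
        }
        return rounds;
    }

    public static void main(String[] args) {
        long limitMb = 256;    // assumed configured region size limit
        long flushedMb = 2048; // assumed size after one oversized flush
        int rounds = splitRounds(flushedMb, limitMb);
        System.out.println("split rounds: " + rounds);
        System.out.println("resulting regions: " + (1L << rounds));
    }
}
```

Under these assumptions a single flush that is eight times the limit forces
three back-to-back rounds of splits, which is the "and so on" in the
description above.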

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
