[ https://issues.apache.org/jira/browse/HADOOP-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12518867 ]
stack commented on HADOOP-1644:
-------------------------------

Let me try your suggestion, Jim, of not having compactions disable flushes.

Another thing I'd like to try: rather than flushing memory to a new file,
flush by merging with an existing file. I'm thinking it will take the same
amount of elapsed time, but we'll have put off a full compaction by not
producing an added file.

Another element to consider is that compactions are the means by which
HStoreFile references are cleaned up in a region (if a region holds
references, it cannot be split), so compaction should be doing its best to
clean up reference files.

> [hbase] Compactions should not block updates
> --------------------------------------------
>
>                 Key: HADOOP-1644
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1644
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: contrib/hbase
>    Affects Versions: 0.15.0
>            Reporter: stack
>            Assignee: stack
>             Fix For: 0.15.0
>
>
> Currently, compactions take a long time. During compaction, updates are
> carried by the HRegion's memcache (+ backing HLog). The memcache is unable
> to flush to disk until compaction completes.
> Under sustained, substantial updates (rows that contain multiple columns,
> one of which is a web page) by multiple concurrent clients (10 in this
> case), a common hbase usage scenario, the memcache grows fast and often to
> orders of magnitude in excess of the configured 'flush-to-disk' threshold.
> This throws the whole system out of kilter. When the memcache flush does
> get to run after compaction completes (assuming you have sufficient RAM and
> the region server doesn't OOME), the resulting on-disk file will be way
> larger than any other on-disk HStoreFile, bringing on a region split. But
> the resulting split will produce regions that themselves need to be
> immediately split because each half is beyond the configured limit, and so
> on...
> In another issue yet to be posted, tuning and some pointed memcache flushes
> make the above condition less extreme, but until compaction durations come
> close to the memcache flush threshold, compactions will remain disruptive.
> It's allowed that compactions may never be fast enough, as per the bigtable
> paper (this is a 'wish' issue).
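
To make the flush-by-merge idea from the comment concrete, here is a rough sketch, not HBase code. It assumes the memcache is a sorted map and the existing on-disk file can be read back as a sorted stream of entries; FlushByMergeSketch and mergeFlush are hypothetical names, and plain String keys and values stand in for the real key/value types. The output is the content of a single replacement file, so the flush does not add to the file count.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

/** Hypothetical sketch of flush-by-merge; none of these names are HBase classes. */
public class FlushByMergeSketch {

    /**
     * Merges the sorted in-memory entries with the sorted entries of one
     * existing file and returns the contents of a single replacement file.
     * Assumes both inputs use natural String ordering. On equal keys the
     * in-memory (newer) value wins and the older file entry is dropped.
     */
    public static List<Map.Entry<String, String>> mergeFlush(
            SortedMap<String, String> memcache,
            Iterator<Map.Entry<String, String>> existingFile) {
        List<Map.Entry<String, String>> out = new ArrayList<>();
        Iterator<Map.Entry<String, String>> mem = memcache.entrySet().iterator();
        Map.Entry<String, String> m = mem.hasNext() ? mem.next() : null;
        Map.Entry<String, String> f = existingFile.hasNext() ? existingFile.next() : null;

        while (m != null || f != null) {
            // An exhausted side always loses the comparison.
            int cmp = (m == null) ? 1 : (f == null) ? -1 : m.getKey().compareTo(f.getKey());
            if (cmp <= 0) {
                out.add(new SimpleEntry<>(m.getKey(), m.getValue()));
                if (cmp == 0) {
                    // Same key on both sides: skip the older on-disk entry.
                    f = existingFile.hasNext() ? existingFile.next() : null;
                }
                m = mem.hasNext() ? mem.next() : null;
            } else {
                out.add(new SimpleEntry<>(f.getKey(), f.getValue()));
                f = existingFile.hasNext() ? existingFile.next() : null;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        SortedMap<String, String> memcache = new TreeMap<>();
        memcache.put("row2", "updated");
        memcache.put("row4", "new");

        List<Map.Entry<String, String>> file = new ArrayList<>();
        file.add(new SimpleEntry<>("row1", "old"));
        file.add(new SimpleEntry<>("row2", "old"));
        file.add(new SimpleEntry<>("row3", "old"));

        for (Map.Entry<String, String> e : mergeFlush(memcache, file.iterator())) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```

The merge is a single sequential pass over both inputs, so the cost grows with the size of the existing file as well as the memcache; whether that really matches the elapsed time of a plain flush, as guessed in the comment, would need measuring.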