[jira] [Commented] (HBASE-14383) Compaction improvements

Enis Soztutar (JIRA) Thu, 17 Sep 2015 16:04:18 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-14383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14804682#comment-14804682
 ]


Enis Soztutar commented on HBASE-14383:
---------------------------------------

bq. flush policy ignores all files less than 15MB.
Where is this code? I could not find anything in the periodic or non-periodic 
flush requests that prevents flush requests. 
bq. maxlogs is really a function of heap available for the memstores and the 
HDFS block size used. Something like: maxlogs = memstore heap / (HDFS blocksize 
* 0.95)
This assumes that all memstores are getting updates. If a memstore stops 
getting updates, it will not be flushed for ~0.5 hour (expected) unless it is 
the biggest memstore left. 
bq. Can we just default it to that? Maybe with 10% padding.
Maybe we can instead set the limit to 2x or 3x of that. 
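
As a rough illustration of the formula quoted above, the following Java sketch computes a candidate maxlogs value from the memstore heap and the HDFS block size, with an optional multiplier such as the 2x suggested in this comment. The class and method names are hypothetical, and the 32 GB heap, 0.4 memstore fraction, and 128 MB block size are assumed example values, not settings taken from this issue.

```java
public class MaxLogsSketch {

    /**
     * maxlogs = multiplier * memstore heap / (HDFS block size * rollFraction).
     * All parameters are illustrative inputs, not HBase API calls.
     */
    public static int suggestedMaxLogs(long heapBytes, double memstoreFraction,
                                       long hdfsBlockBytes, double rollFraction,
                                       double multiplier) {
        double memstoreHeap = heapBytes * memstoreFraction; // heap available to memstores
        double walSize = hdfsBlockBytes * rollFraction;     // approximate size of one WAL file
        return (int) Math.floor(multiplier * memstoreHeap / walSize);
    }

    public static void main(String[] args) {
        long heap = 32L * 1024 * 1024 * 1024; // assumed 32 GB region server heap
        long block = 128L * 1024 * 1024;      // assumed 128 MB HDFS block size
        System.out.println(suggestedMaxLogs(heap, 0.4, block, 0.95, 1.0)); // prints 107
        System.out.println(suggestedMaxLogs(heap, 0.4, block, 0.95, 2.0)); // prints 215
    }
}
```

With these assumed numbers, the plain formula gives 107 WAL files and the 2x variant gives 215.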

> Compaction improvements
> -----------------------
>
>                 Key: HBASE-14383
>                 URL: https://issues.apache.org/jira/browse/HBASE-14383
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Vladimir Rodionov
>            Assignee: Vladimir Rodionov
>             Fix For: 2.0.0
>
>
> Compactions are still a major issue in many production environments. The 
> general recommendation is to disable region splitting and major compactions 
> to reduce unpredictable IO/CPU spikes, especially during peak times, and to 
> run them manually during off-peak times. This still does not resolve the 
> issues completely.
> h3. Flush storms
> * WAL rolls across the cluster can be highly correlated. They force memstore 
> flushes, which in turn trigger minor compactions that can be promoted to 
> major ones. These events are highly correlated in time if there is a 
> balanced write load on the regions in a table.
> * The same is true for memstore flushes triggered by the periodic memstore 
> flusher. 
> Both of the above may produce *flush storms*, which are as bad as 
> *compaction storms*. 
> What can be done here? We can spread these events over time by randomizing 
> (with jitter) several config options:
> # hbase.regionserver.optionalcacheflushinterval
> # hbase.regionserver.flush.per.changes
> # hbase.regionserver.maxlogs   
> h3. ExploringCompactionPolicy max compaction size
> One more optimization can be added to ExploringCompactionPolicy. To limit 
> the size of a compaction, there is a config parameter, 
> hbase.hstore.compaction.max.size. It would be nice to have two separate 
> limits: one for peak hours and one for off-peak hours.
> h3. ExploringCompactionPolicy selection evaluation algorithm
> Too simple? A selection with more files always wins; a selection of smaller 
> size wins if the number of files is the same. 
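
The selection rule described in the last section of the issue can be sketched as follows. This is a minimal illustration of the rule as worded there, not the actual ExploringCompactionPolicy source; the class and method names are hypothetical.

```java
public class SelectionComparisonSketch {

    /** Returns true if the candidate selection should replace the current best one. */
    public static boolean isBetter(int candidateFiles, long candidateBytes,
                                   int bestFiles, long bestBytes) {
        if (candidateFiles != bestFiles) {
            return candidateFiles > bestFiles; // more files always wins
        }
        return candidateBytes < bestBytes;     // same file count: smaller total size wins
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        // 4 files totalling 4096 MB beat 3 files totalling 30 MB, regardless of size.
        System.out.println(isBetter(4, 4096 * mb, 3, 30 * mb)); // prints true
        // With equal file counts, the smaller selection wins.
        System.out.println(isBetter(3, 20 * mb, 3, 30 * mb));   // prints true
    }
}
```

The first call shows why the rule may be too simple: a much larger selection wins purely because it contains one more file.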


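The jitter idea from the "Flush storms" section of the issue could look roughly like the Java sketch below: each region server perturbs a configured interval by a random fraction so that periodic flushes do not line up across the cluster. The class name, the method name, the 20% jitter fraction, and the one-hour base interval are assumptions for illustration only; nothing here is an existing HBase API.

```java
import java.util.Random;

public class FlushJitterSketch {

    /**
     * Returns baseMillis shifted by a uniformly random offset of at most
     * +/- jitterFraction * baseMillis.
     */
    public static long withJitter(long baseMillis, double jitterFraction, Random rng) {
        if (jitterFraction < 0.0 || jitterFraction > 1.0) {
            throw new IllegalArgumentException("jitterFraction must be in [0, 1]");
        }
        // Uniform in [-jitterFraction, +jitterFraction).
        double offset = (2.0 * rng.nextDouble() - 1.0) * jitterFraction;
        return baseMillis + Math.round(baseMillis * offset);
    }

    public static void main(String[] args) {
        long base = 3_600_000L; // assumed example interval of one hour, in milliseconds
        // Each simulated region server uses its own seed, so the intervals differ.
        for (int server = 0; server < 5; server++) {
            Random rng = new Random(server);
            System.out.println("server " + server + ": " + withJitter(base, 0.2, rng) + " ms");
        }
    }
}
```

The same helper could be applied to a count-based setting such as hbase.regionserver.flush.per.changes or hbase.regionserver.maxlogs by treating the base value as a count instead of a duration.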

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
