[ 
https://issues.apache.org/jira/browse/HBASE-14383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14746783#comment-14746783
 ] 

Vladimir Rodionov commented on HBASE-14383:
-------------------------------------------

[[email protected]]:
{quote}
I'd be for upping the max logs number (have seen cases where it ran away up to 
the thousands so some guard would be good)
{quote}

That is a very degenerate case. I have thought about this: it is possible to
have many CFs in a table and very small flush files. By default, the flush
policy ignores all files smaller than 15MB. Imagine that all the files in a
region's memstores selected for flushing are smaller than 15MB => there will be
no flush, and the number of WAL files will continue growing (indefinitely, by
the way).

We probably need *hbase.regionserver.maxlogs* as a safeguard against runaway
WALs during prolonged burst load, when the data ingested per RS in a PMF
(periodic memstore flusher) flush interval (1h) is much greater than the
overall memstore capacity. I agree we have to up the default value of
*hbase.regionserver.maxlogs*, but it should be set during RS init and not
statically. We have to make sure that the overall WAL capacity is not less than
the overall memstore capacity. Ideally it should be large enough to make the
event (max number exceeded) very rare.
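
A rough sketch of what setting the limit during RS init could look like: derive
it from the global memstore capacity and the WAL roll size, so that the total
WAL capacity is at least the memstore capacity. The function name, the 2x
headroom factor, the floor of 32, and the example heap and roll sizes are all
assumptions for illustration, not existing HBase behavior.

```python
import math

def compute_max_logs(heap_bytes, memstore_fraction, wal_roll_bytes,
                     headroom=2.0, floor=32):
    """Hypothetical init-time sizing for hbase.regionserver.maxlogs."""
    # Overall memstore capacity of the region server.
    memstore_capacity = heap_bytes * memstore_fraction
    # Number of WAL files needed to cover that capacity with some headroom.
    needed = math.ceil(memstore_capacity * headroom / wal_roll_bytes)
    # Never go below a fixed floor (assumed value).
    return max(floor, needed)

GB = 1024 ** 3
MB = 1024 ** 2
# Example: 32GB heap, 40% of it for memstores, WAL rolled every 128MB.
print(compute_max_logs(32 * GB, 0.4, 128 * MB))
```

With a small heap the floor applies, so the limit never drops below the
assumed minimum.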

MTTR depends not on the max number of WAL files but on the current load and
the PMF interval.

> Compaction improvements
> -----------------------
>
>                 Key: HBASE-14383
>                 URL: https://issues.apache.org/jira/browse/HBASE-14383
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Vladimir Rodionov
>            Assignee: Vladimir Rodionov
>             Fix For: 2.0.0
>
>
> Still a major issue in many production environments. The general
> recommendation (disabling region splitting and major compactions to reduce
> unpredictable IO/CPU spikes, especially during peak times, and running them
> manually during off-peak times) still does not resolve the issues completely.
> h3. Flush storms
> * rolling WAL events across cluster can be highly correlated, hence flushing 
> memstores, hence triggering minor compactions, that can be promoted to major 
> ones. These events are highly correlated in time if there is a balanced 
> write-load on the regions in a table.
> *  the same is true for memstore flushing due to periodic memstore flusher 
> operation. 
> Both of the above may produce *flush storms*, which are as bad as *compaction
> storms*.
> What can be done here? We can spread these events over time by randomizing
> (with jitter) several config options:
> # hbase.regionserver.optionalcacheflushinterval
> # hbase.regionserver.flush.per.changes
> # hbase.regionserver.maxlogs   
> h3. ExploringCompactionPolicy max compaction size
> One more optimization can be added to ExploringCompactionPolicy. To limit
> the size of a compaction, there is a config parameter one could use:
> hbase.hstore.compaction.max.size. It would be nice to have two separate
> limits: one for peak and one for off-peak hours.
> h3. ExploringCompactionPolicy selection evaluation algorithm
> Too simple? Selection with more files always wins, selection of smaller size 
> wins if number of files is the same. 
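
The jitter idea from the "Flush storms" section of the quoted description
could be sketched as follows. The helper name and the 20% jitter fraction are
assumptions for illustration; the 1h base value is the flush interval
mentioned in the comment above.

```python
import random

def with_jitter(base_value, jitter_fraction=0.2, rng=random):
    """Return base_value shifted by a random offset within +/- jitter_fraction."""
    delta = base_value * jitter_fraction
    return base_value + rng.uniform(-delta, delta)

# Each region server would draw its own value once at startup, so that
# periodic flushes on different servers drift apart instead of lining up.
flush_interval_ms = with_jitter(3_600_000)  # 1h base interval
print(flush_interval_ms)
```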

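The selection rule as stated in the quoted description (more files always
wins; on a tie, the smaller total size wins) can be written as a small
comparison. This sketches only that stated rule with a hypothetical function
name; it is not the actual ExploringCompactionPolicy code.

```python
def is_better_selection(candidate_sizes, best_sizes):
    """True if the candidate selection beats the current best under the stated rule."""
    # A selection with more files always wins.
    if len(candidate_sizes) != len(best_sizes):
        return len(candidate_sizes) > len(best_sizes)
    # With the same number of files, the smaller total size wins.
    return sum(candidate_sizes) < sum(best_sizes)

# Three small files beat two files; among equal counts, fewer bytes win.
print(is_better_selection([10, 10, 10], [5, 5]))
print(is_better_selection([10, 20], [10, 30]))
```
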


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
