[
https://issues.apache.org/jira/browse/HBASE-14383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14746783#comment-14746783
]
Vladimir Rodionov commented on HBASE-14383:
-------------------------------------------
[[email protected]]:
{quote}
I'd be for upping the max logs number (have seen cases where it ran away up to
the thousands so some guard would be good)
{quote}
That is a very degenerate case. I have thought about this: it is possible to
have many CFs in a table and very small flush files. By default, the flush
policy ignores all files smaller than 15MB. Imagine that all the files in a
region's memstores selected for flushing are smaller than 15MB => there will be
no flush, and the number of WALs will continue growing (indefinitely, by the way).
We probably need *hbase.regionserver.maxlogs* as a safeguard against runaway
WALs during a prolonged burst load, when the data ingested per RS in one PMF
(periodic memstore flusher) flush interval (1h) is much greater than the overall
memstore capacity. I agree we have to raise the default value of
*hbase.regionserver.maxlogs*, but it should be set during RS init and not
statically. We have to make sure that the overall WAL capacity is not less than
the overall memstore capacity. Ideally it should be large enough to make the
event (max number exceeded) very rare.
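A rough sketch of what computing the limit at RS init could look like, assuming the only requirement is that total WAL capacity covers total memstore capacity. The class, method and parameter names are hypothetical, and the numbers in main are example values, not HBase defaults.

```java
// Hypothetical helper, not HBase code: derives a WAL file limit from heap size.
final class MaxLogsEstimate {

    /**
     * Returns the smallest number of WAL files whose combined size is at least
     * the memstore capacity, but never less than the given floor.
     *
     * heapBytes        region server heap size in bytes (assumed input)
     * memstoreFraction share of the heap reserved for memstores, in (0, 1]
     * walRollBytes     size at which a WAL file is rolled, in bytes
     * floor            lower bound to keep for small heaps
     */
    static int estimateMaxLogs(long heapBytes, double memstoreFraction,
                               long walRollBytes, int floor) {
        if (heapBytes <= 0 || walRollBytes <= 0
                || memstoreFraction <= 0.0 || memstoreFraction > 1.0) {
            throw new IllegalArgumentException("invalid sizing parameters");
        }
        long memstoreBytes = (long) (heapBytes * memstoreFraction);
        // Ceiling division: a partially needed WAL file still counts as one file.
        long needed = (memstoreBytes + walRollBytes - 1) / walRollBytes;
        return (int) Math.max(floor, Math.min(needed, Integer.MAX_VALUE));
    }

    public static void main(String[] args) {
        // Example values only: 32 GiB heap, 40% for memstores, 128 MiB roll size.
        int maxLogs = estimateMaxLogs(32L << 30, 0.4, 128L << 20, 32);
        System.out.println("estimated maxlogs = " + maxLogs);
    }
}
```

The floor keeps small heaps from ending up with a tiny limit; whether the result should also be multiplied by some safety factor is a separate question.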
MTTR depends not on the max number of WAL files but on the current load and the
PMF interval.
> Compaction improvements
> -----------------------
>
> Key: HBASE-14383
> URL: https://issues.apache.org/jira/browse/HBASE-14383
> Project: HBase
> Issue Type: Improvement
> Reporter: Vladimir Rodionov
> Assignee: Vladimir Rodionov
> Fix For: 2.0.0
>
>
> Still a major issue in many production environments. The general recommendation
> (disabling region splitting and major compactions to reduce unpredictable
> IO/CPU spikes, especially during peak times, and running them manually during
> off-peak times) still does not resolve the issues completely.
> h3. Flush storms
> * WAL roll events across the cluster can be highly correlated. They trigger
> memstore flushes, which in turn trigger minor compactions that can be promoted
> to major ones. These events are highly correlated in time if there is a
> balanced write load on the regions in a table.
> * the same is true for memstore flushing due to the periodic memstore flusher
> operation.
> Both of the above may produce *flush storms*, which are as bad as *compaction
> storms*.
> What can be done here? We can spread these events over time by randomizing
> (with jitter) several config options:
> # hbase.regionserver.optionalcacheflushinterval
> # hbase.regionserver.flush.per.changes
> # hbase.regionserver.maxlogs
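A minimal sketch of the jitter idea, assuming each region server scales the configured value once by a random factor so that servers stop hitting the same threshold at the same moment. The class and method names are hypothetical, and the 20% jitter in main is only an example value.

```java
import java.util.Random;

// Hypothetical helper, not HBase code: randomizes a configured value per server.
final class JitteredSetting {

    /**
     * Returns base scaled by a uniformly random factor in [1 - jitter, 1 + jitter).
     * A jitter of 0.0 returns base unchanged.
     */
    static long withJitter(long base, double jitter, Random rng) {
        if (base < 0 || jitter < 0.0 || jitter >= 1.0) {
            throw new IllegalArgumentException("base must be >= 0 and jitter in [0, 1)");
        }
        // nextDouble() is in [0, 1), so (2 * r - 1) is in [-1, 1).
        double factor = 1.0 + jitter * (2.0 * rng.nextDouble() - 1.0);
        return Math.round(base * factor);
    }

    public static void main(String[] args) {
        Random rng = new Random();
        // Example: a 1h flush interval in milliseconds with +/-20% jitter.
        long intervalMs = withJitter(3_600_000L, 0.2, rng);
        System.out.println("effective flush interval (ms) = " + intervalMs);
    }
}
```

Drawing the factor once per server, rather than once per event, is a design choice assumed here; either way the goal is to decorrelate the events across the cluster.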
> h3. ExploringCompactionPolicy max compaction size
> One more optimization can be added to ExploringCompactionPolicy. To limit
> the size of a compaction, one can use the config parameter
> hbase.hstore.compaction.max.size. It would be nice to have two separate
> limits: one for peak and one for off-peak hours.
> h3. ExploringCompactionPolicy selection evaluation algorithm
> Too simple? A selection with more files always wins; the selection of smaller
> size wins if the number of files is the same.
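The rule as described in the issue can be sketched as a simple comparison. This is an illustration of the description only, not the actual ExploringCompactionPolicy code; the class and method names are hypothetical, and a selection is modeled as a plain list of file sizes.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the described rule, not HBase code.
final class SelectionRule {

    /**
     * Returns true if candidate should replace best:
     * more files always wins; with equal file counts, smaller total size wins.
     */
    static boolean isBetter(List<Long> best, List<Long> candidate) {
        if (candidate.size() != best.size()) {
            return candidate.size() > best.size();
        }
        return totalSize(candidate) < totalSize(best);
    }

    /** Sum of the file sizes in a selection. */
    static long totalSize(List<Long> fileSizes) {
        long sum = 0;
        for (long size : fileSizes) {
            sum += size;
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Long> twoSmall = Arrays.asList(10L, 10L);
        List<Long> threeLarge = Arrays.asList(1000L, 1000L, 1000L);
        // Three large files beat two small ones purely on file count.
        System.out.println(isBetter(twoSmall, threeLarge));
    }
}
```

The example in main shows why the rule may be too simple: total size is never considered unless the file counts are equal, so a much more expensive selection wins as soon as it contains one more file.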
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)