Todd Lipcon created KUDU-1567:
---------------------------------
Summary: Short default for log retention increases write amplification
Key: KUDU-1567
URL: https://issues.apache.org/jira/browse/KUDU-1567
Project: Kudu
Issue Type: Improvement
Components: perf, tserver
Affects Versions: 0.10.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Currently the maintenance manager prioritizes flushes over compactions if the
flush operations are retaining WAL segments. The goal is to prevent the
amount of in-memory data from growing so large that restarts become
incredibly slow. However, this prioritization has a somewhat unintuitive
negative effect on performance:
- with the default of retaining just two segments, flushes become highly
prioritized when the MRS only has ~128MB of data, regardless of the
"flush_threshold_mb" configuration
- this creates lots of overlapping rowsets in the case of random-write
applications
- because flushes are prioritized over compactions, compactions rarely run
- the frequent flushes, combined with the low priority of compactions, mean that
after a few days of constant inserts, we often end up with average "bloom
lookups per op" metrics of 50-100, which is quite slow even if the blooms fit
in cache.
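
The arithmetic behind the points above can be sketched roughly as follows. This is a back-of-the-envelope model, not Kudu code: the function names are hypothetical, the 64MB segment size is an assumption (it is what makes two retained segments come out to the ~128MB mentioned above), the flush_threshold_mb value of 1024 is only illustrative, and the model assumes WAL bytes roughly equal MRS bytes and that no compactions run at all.

```python
# Back-of-the-envelope model of the effect described above; NOT Kudu code.
# Assumptions (not stated in this issue): WAL segments are 64 MB, WAL bytes
# roughly equal MRS bytes, and no compactions run during the ingest.


def wal_flush_trigger_mb(segments_retained: int, segment_size_mb: int) -> int:
    """MRS size (MB) beyond which an unflushed MRS pins extra WAL segments."""
    return segments_retained * segment_size_mb


def effective_flush_size_mb(flush_threshold_mb: int,
                            segments_retained: int,
                            segment_size_mb: int) -> int:
    """Flush size actually observed: whichever limit is reached first."""
    return min(flush_threshold_mb,
               wal_flush_trigger_mb(segments_retained, segment_size_mb))


def rowsets_without_compaction(ingest_mb: int, flush_size_mb: int) -> int:
    """Rowsets produced by ingest_mb of inserts if nothing is compacted.

    With random keys each rowset spans most of the key space, so this is
    also a rough upper bound on bloom lookups per inserted row.
    """
    return ingest_mb // flush_size_mb


if __name__ == "__main__":
    flush_mb = effective_flush_size_mb(
        flush_threshold_mb=1024,  # illustrative value only
        segments_retained=2,      # "retaining just two segments"
        segment_size_mb=64,       # assumed segment size
    )
    print(flush_mb)                                          # 128
    print(rowsets_without_compaction(10 * 1024, flush_mb))   # 80
```

In this model the configured flush threshold never takes effect, because the WAL retention limit is always reached first, and 10GB of random inserts leaves about 80 overlapping rowsets. That is in the same range as the 50-100 bloom lookups per op reported above, although real tablets will differ since some compactions do run.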