[https://issues.apache.org/jira/browse/KUDU-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15505252#comment-15505252]
Todd Lipcon commented on KUDU-1567:
-----------------------------------
Posted some work on this here: https://gerrit.cloudera.org/#/c/4470/
> Short default for log retention increases write amplification
> -------------------------------------------------------------
>
> Key: KUDU-1567
> URL: https://issues.apache.org/jira/browse/KUDU-1567
> Project: Kudu
> Issue Type: Improvement
> Components: perf, tserver
> Affects Versions: 0.10.0
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
>
> Currently the maintenance manager prioritizes flushes over compactions if the
> flush operations are retaining WAL segments. The goal here is to prevent the
> amount of in-memory data from getting so large that restarts would be
> incredibly slow. However, it has a somewhat unintuitive negative effect on
> performance:
> - with the default of retaining just two segments, flushes become highly
> prioritized when the MRS only has ~128MB of data, regardless of the
> "flush_threshold_mb" configuration
> - this creates lots of overlapping rowsets in the case of random-write
> applications
> - because flushes are prioritized over compactions, compactions rarely run
> - the frequent flushes, combined with the low priority of compactions, mean
> that after a few days of constant inserts, we often end up with average "bloom
> lookups per op" metrics of 50-100, which is quite slow even if the blooms fit
> in cache.
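The interaction described above can be sketched as a back-of-envelope model. This is not Kudu code: the function name and the 64 MiB segment size are assumptions (chosen to be consistent with the ~128MB-at-2-segments figure in the description), and the real maintenance manager uses a scoring heuristic rather than a hard cutoff.

```python
SEGMENT_MB = 64  # assumed WAL segment size, consistent with 2 segments ~= 128MB


def flush_trigger_mb(retained_segments, flush_threshold_mb):
    """Approximate MRS size at which a flush becomes highly prioritized:
    whichever comes first, the WAL-retention cap or the flush threshold."""
    wal_cap_mb = retained_segments * SEGMENT_MB
    return min(wal_cap_mb, flush_threshold_mb)


# With the default of 2 retained segments, a large flush_threshold_mb is moot:
assert flush_trigger_mb(2, 1024) == 128

# Retaining more segments lets flush_threshold_mb actually govern flush size:
assert flush_trigger_mb(16, 1024) == 1024

# Fewer, larger flushes mean fewer overlapping rowsets for the same ingest,
# which is what drives down the "bloom lookups per op" metric:
ingested_mb = 64 * 1024  # 64 GiB of random-write inserts
print(ingested_mb // flush_trigger_mb(2, 1024))   # 512 flushed rowsets
print(ingested_mb // flush_trigger_mb(16, 1024))  # 64 flushed rowsets
```

This is why raising log retention (as the Gerrit change explores) trades a slower restart for substantially lower write amplification under random-write workloads.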
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)