[
https://issues.apache.org/jira/browse/KUDU-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15457066#comment-15457066
]
Todd Lipcon commented on KUDU-1567:
-----------------------------------
Another thought: it would be good to change the retention behavior to support
the following:
- on an actively written tablet, don't worry about going up to 10-20 log
segments. If someone restarts in the middle of a heavy write workload, it's
reasonable to expect those tablets to recover slowly.
- when the tablet has flushed due to time reasons and no longer needs all of
those log segments, we should delete them rather than adhering to some
arbitrary "min segments"
In other words, the user configuration should set a target size (a soft upper
bound) for the logs that need to be replayed, not a lower bound that keeps
logs around for no good reason.
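A minimal sketch of that policy (hypothetical names, not Kudu's actual
implementation): segments no longer anchored by un-flushed data are always
eligible for deletion (no "min segments" floor), and the target size only acts
as a soft upper bound feeding flush prioritization.

```python
# Hypothetical sketch of the proposed retention policy. "min_anchored_index"
# stands for the lowest WAL op index still needed to replay un-flushed data;
# segment dicts with "max_index"/"size_bytes" are illustrative, not Kudu's API.

def gc_candidates(segments, min_anchored_index):
    """Segments safe to delete: everything whose highest op index falls
    below the lowest index still anchored by un-flushed data. Note there
    is deliberately no arbitrary 'min segments' floor."""
    return [s for s in segments if s["max_index"] < min_anchored_index]

def flush_urgency(segments, min_anchored_index, target_bytes):
    """Soft upper bound: bytes of WAL that would have to be replayed on
    restart, as a fraction of the target. A value above 1.0 means flushes
    should be prioritized; an actively written tablet may legitimately
    exceed it for a while."""
    retained = sum(s["size_bytes"] for s in segments
                   if s["max_index"] >= min_anchored_index)
    return retained / target_bytes
```

Under this scheme a quiescent tablet that has flushed everything ends up with
an empty retained set, so stale segments are deleted instead of being held to
satisfy a fixed minimum count.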
> Short default for log retention increases write amplification
> -------------------------------------------------------------
>
> Key: KUDU-1567
> URL: https://issues.apache.org/jira/browse/KUDU-1567
> Project: Kudu
> Issue Type: Improvement
> Components: perf, tserver
> Affects Versions: 0.10.0
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
>
> Currently the maintenance manager prioritizes flushes over compactions if the
> flush operations are retaining WAL segments. The goal here is to prevent the
> amount of in-memory data from getting so large that restarts would be
> incredibly slow. However, it has a somewhat unintuitive negative effect on
> performance:
> - with the default of retaining just two segments, flushes become highly
> prioritized when the MRS only has ~128MB of data, regardless of the
> "flush_threshold_mb" configuration
> - this creates lots of overlapping rowsets in the case of random-write
> applications
> - because flushes are prioritized over compactions, compactions rarely run
> - the frequent flushes, combined with the low priority of compactions, mean
> that after a few days of constant inserts, we often end up with average "bloom
> lookups per op" metrics of 50-100, which is quite slow even if the blooms fit
> in cache.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)