Todd Lipcon created KUDU-1567:
---------------------------------

             Summary: Short default for log retention increases write amplification
                 Key: KUDU-1567
                 URL: https://issues.apache.org/jira/browse/KUDU-1567
             Project: Kudu
          Issue Type: Improvement
          Components: perf, tserver
    Affects Versions: 0.10.0
            Reporter: Todd Lipcon
            Assignee: Todd Lipcon


Currently, the maintenance manager prioritizes flushes over compactions if the 
data waiting to be flushed is causing WAL segments to be retained. The goal is 
to keep the amount of in-memory data from growing so large that restarts would 
be incredibly slow. However, this policy has a somewhat unintuitive negative 
effect on performance:
- with the default of retaining just two segments, flushes become highly 
prioritized once the MRS (MemRowSet) holds only ~128MB of data, regardless of 
the "flush_threshold_mb" configuration
- this creates lots of overlapping rowsets in the case of random-write 
applications
- because flushes are prioritized over compactions, compactions rarely run
- the frequent flushes, combined with the low priority of compactions, mean 
that after a few days of constant inserts, we often end up with average "bloom 
lookups per op" metrics of 50-100, which is quite slow even if the blooms fit 
in cache.
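
To make the arithmetic above concrete, here is a rough back-of-the-envelope 
sketch in Python. It is a toy model, not Kudu code: the function names are 
made up for illustration, the 64MB segment size is an assumption inferred from 
"two segments ~= 128MB", and it treats WAL bytes and MRS bytes as roughly 
equal. It also assumes purely random writes, so every flushed rowset overlaps 
the whole key range and an insert may have to consult one bloom filter per 
rowset.

```python
def effective_flush_trigger_mb(flush_threshold_mb, segments_to_retain, segment_size_mb):
    """MRS size (MB) at which a flush becomes high priority in this toy model.

    A flush is prioritized either when the MRS reaches the configured
    threshold or when it starts pinning more than the retained WAL
    segments, whichever happens first.
    """
    wal_trigger_mb = segments_to_retain * segment_size_mb
    return min(flush_threshold_mb, wal_trigger_mb)


def rowsets_without_compaction(inserted_mb, flush_size_mb):
    """Number of overlapping rowsets produced if compactions never get to run."""
    return inserted_mb // flush_size_mb


if __name__ == "__main__":
    # Assumed values: 2 retained segments of 64MB each (hypothetical sizes).
    for threshold_mb in (256, 1024, 4096):
        trigger = effective_flush_trigger_mb(threshold_mb, 2, 64)
        print(f"flush_threshold_mb={threshold_mb}: flush prioritized at ~{trigger}MB")

    # With random writes, each rowset adds roughly one bloom lookup per insert.
    print(rowsets_without_compaction(12800, 128), "overlapping rowsets after 12800MB")
```

In this model, raising "flush_threshold_mb" above 128 changes nothing, because 
the WAL retention limit always trips first. The rowset count is an upper bound 
under the stated assumptions; real numbers depend on how often compactions do 
get scheduled.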



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
