[ https://issues.apache.org/jira/browse/KUDU-3180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17172751#comment-17172751 ]
Alexey Serbin commented on KUDU-3180:
-------------------------------------

[~awong] put together a great summary of the recent discussion. I just want to add my two cents, in the hope that it's useful. It looks at the problem from a generic perspective, i.e. ignoring the compaction/flush implementation details :)

During our recent discussion of this issue with [~awong] and [~granthenke], one observation was that using {{memory_size * time_since_last_flush}} as the simplest proxy for the cost function makes it easier (at least for me) to reason about an alternative policy that takes into account both the size and the age of the datasets to flush.

The idea is that when flushes/compactions are starved relative to the rate of incoming updates, heavier chunks of data are more likely to be picked up for flushing/compacting than smaller ones, even if the smaller ones have been around somewhat longer. However, very old tiny chunks of data still get picked up eventually, even if heavy updates arrive all the time, because their cost keeps growing with age. So one model to consider is picking the datasets with the highest values of the cost function among those that cross a pre-set threshold.

As for compactions vs. flushes, maybe it's possible to use a similar cost function, but scaled by a coefficient below 1 (0.x something) to reflect the notion that occupying disk storage is cheaper than occupying the same amount of RAM for the same time interval.
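To make the model in the comment above concrete, here is a minimal sketch in Python. All names ({{Candidate}}, {{cost}}, {{pick_next}}), the units, the threshold, and the 0.5 disk coefficient are assumptions made up for illustration. This is not Kudu's maintenance manager code or its actual scoring.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Hypothetical weights: the comment only says "0.x something" for the
# disk-side coefficient, so 0.5 is an arbitrary illustrative value.
RAM_WEIGHT = 1.0
DISK_WEIGHT = 0.5


@dataclass
class Candidate:
    name: str
    kind: str        # "flush" (data anchored in RAM) or "compaction" (data on disk)
    size_mb: float   # amount of data the op would flush or compact
    age_secs: float  # time since the last flush/compaction of this data


def cost(c: Candidate) -> float:
    """Proxy cost: size * age, discounted for data that lives on disk."""
    weight = RAM_WEIGHT if c.kind == "flush" else DISK_WEIGHT
    return weight * c.size_mb * c.age_secs


def pick_next(candidates: Iterable[Candidate], threshold: float) -> Optional[Candidate]:
    """Return the highest-cost candidate among those above the threshold."""
    eligible = [c for c in candidates if cost(c) > threshold]
    return max(eligible, key=cost, default=None)


big_mrs = Candidate("big_mrs", "flush", size_mb=512, age_secs=60)
small_mrs = Candidate("small_mrs", "flush", size_mb=4, age_secs=1800)
tiny_old = Candidate("tiny_old", "flush", size_mb=1, age_secs=86400)
rowsets = Candidate("rowsets", "compaction", size_mb=1024, age_secs=100)
```

With these made-up numbers, {{pick_next([big_mrs, small_mrs], threshold=1000)}} prefers the big, young MRS over the small one that has waited 30 minutes, while a tiny chunk that has waited a full day outranks both. The compaction candidate holds twice the data of {{big_mrs}} for longer, yet the disk discount roughly halves its score.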
> kudu don't always prefer to flush MRS/DMS that anchor more memory
> -----------------------------------------------------------------
>
>                 Key: KUDU-3180
>                 URL: https://issues.apache.org/jira/browse/KUDU-3180
>             Project: Kudu
>          Issue Type: Bug
>            Reporter: YifanZhang
>            Priority: Major
>         Attachments: image-2020-08-04-20-26-53-749.png,
>                      image-2020-08-04-20-28-00-665.png
>
>
> The current time-based flush policy always gives a flush op a high score if we
> haven't flushed the tablet in a long time, which may lead to starvation of
> ops that could free more memory.
> We set -flush_threshold_mb=32, -flush_threshold_secs=1800 in a cluster, and
> found that some small MRS/DMS flushes have a higher perf score than big MRS/DMS
> flushes and compactions, which does not seem reasonable.
> !image-2020-08-04-20-26-53-749.png|width=1424,height=317!!image-2020-08-04-20-28-00-665.png|width=1414,height=327!

--
This message was sent by Atlassian Jira
(v8.3.4#803005)