[ https://issues.apache.org/jira/browse/KUDU-3180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17171714#comment-17171714 ]
Andrew Wong edited comment on KUDU-3180 at 8/5/20, 7:36 PM: ------------------------------------------------------------ I've been discussing this problem with [~aserbin] and [~granthenke], and one thing that stands out about the issue here is that it isn't obvious which quantifiable values we should optimize for. I think there are a few things to care about:
* Insert/update performance
* Memory used by mem-stores
* Space anchored by WALs
* To some extent, write amplification and the size of the output disk-stores

These values don't explicitly trade off against one another, which makes it difficult to determine the right heuristic for when to flush mem-stores. Some different solutions we've been discussing are:
* Defining a cost function based on the time since the last flush AND the memory used. This might be an improvement over today's policy, which uses a simple branching heuristic that picks based on the time since the last flush OR the memory used.
* Always using the WAL bytes anchored to determine what to flush. This somewhat accounts for both the time since the last flush and the memory used, in the sense that older mem-stores will tend to anchor more WAL bytes, and larger mem-stores will also tend to anchor more WAL bytes. It has the added benefit of keeping the "space anchored by WALs" value in mind, so we don't end up with something like KUDU-3002.
* Adapting the policy to the current amount of disk space / memory used, to pick the "right" values to trade off. E.g. if we are running low on WAL disk space, prioritize based on WAL bytes anchored; if we are running low on memory, prioritize based on memory used; etc.

Before exploring the solution space further, it'd be better to define the problem at hand more clearly. [~zhangyifan27], which values look off to you? What tradeoffs would you prefer to make in filing this Jira? Would something as simple as lowering {{-flush_threshold_mb}} or increasing {{-flush_threshold_secs}} help you?
> kudu don't always prefer to flush MRS/DMS that anchor more memory
> -----------------------------------------------------------------
>
>                 Key: KUDU-3180
>                 URL: https://issues.apache.org/jira/browse/KUDU-3180
>             Project: Kudu
>          Issue Type: Bug
>            Reporter: YifanZhang
>            Priority: Major
>         Attachments: image-2020-08-04-20-26-53-749.png, image-2020-08-04-20-28-00-665.png
>
>
> The current time-based flush policy always gives a flush op a high score if we haven't flushed the tablet in a long time, which may lead to starvation of ops that could free more memory.
> We set -flush_threshold_mb=32 and -flush_threshold_secs=1800 in a cluster, and found that some small MRS/DMS flushes have a higher perf score than big MRS/DMS flushes and compactions, which does not seem reasonable.
> !image-2020-08-04-20-26-53-749.png|width=1424,height=317!
> !image-2020-08-04-20-28-00-665.png|width=1414,height=327!

-- This message was sent by Atlassian Jira (v8.3.4#803005)