[ https://issues.apache.org/jira/browse/KUDU-3180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17171714#comment-17171714 ]

Andrew Wong edited comment on KUDU-3180 at 8/5/20, 7:36 PM:
------------------------------------------------------------

I've been discussing this problem with [~aserbin] and [~granthenke], and one 
thing that stands out is that it isn't obvious which quantifiable values we 
should optimize for. I think there are a few things to care about:
 * Insert/update performance
 * Memory used by mem-stores
 * Space anchored by WALs
 * To some extent, write amplification and size of output disk-stores

These values don't explicitly trade off against one another, which makes it 
difficult to pick the right heuristic for when to flush mem-stores. Some of 
the solutions we've been discussing (see the sketch after this list) are:
 * Defining a cost function based on both the time since the last flush AND 
the memory used. This might improve on today's policy, which uses a simple 
branching heuristic to pick based on time since the last flush OR memory used.
 * Always using the WAL bytes anchored to decide what to flush. This somewhat 
takes both the time since the last flush and the memory used into account, in 
the sense that older mem-stores tend to anchor more WAL bytes, and so do 
larger ones. It has the added benefit of keeping the "space anchored by WALs" 
value in mind, so we don't end up with something like KUDU-3002.
 * Updating the policy based on the current amount of disk/memory used to 
pick the "right" values to trade off: e.g., if we are running low on WAL disk 
space, prioritize by WAL bytes anchored; if we are running low on memory, 
prioritize by memory used; and so on.
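
For what it's worth, here's a minimal sketch of how the second and third ideas 
could combine: WAL bytes anchored as the base score, re-weighted toward 
whichever resource is under pressure. It's C++ since that's what Kudu is 
written in, but everything in it ({{MemStoreStats}}, {{FlushScore}}, the 0.8 
pressure thresholds, the weights) is hypothetical, not the maintenance 
manager's actual code:

{code}
// Hypothetical sketch of a pressure-aware, WAL-anchored flush scorer.
// NOT Kudu's maintenance-manager code; names and weights are made up.
#include <algorithm>
#include <cstdint>
#include <iostream>
#include <vector>

struct MemStoreStats {
  const char* name;
  int64_t mem_bytes;           // memory held by the mem-store
  int64_t wal_bytes_anchored;  // WAL bytes it prevents from being GCed
};

struct ResourcePressure {
  double mem_used_pct;       // process memory used / memory limit
  double wal_disk_used_pct;  // WAL disk used / capacity
};

// WAL bytes anchored is the base signal: it already correlates with both
// the age and the size of a mem-store. Resource pressure then re-weights
// the score toward whichever resource is close to its limit.
double FlushScore(const MemStoreStats& ms, const ResourcePressure& rp) {
  double score = static_cast<double>(ms.wal_bytes_anchored);
  if (rp.wal_disk_used_pct > 0.8) {
    score *= 2.0;  // avoid KUDU-3002-style WAL disk exhaustion
  }
  if (rp.mem_used_pct > 0.8) {
    score += 2.0 * static_cast<double>(ms.mem_bytes);  // memory pressure
  }
  return score;
}

int main() {
  ResourcePressure rp{0.85, 0.30};  // memory-constrained, WAL disk fine
  std::vector<MemStoreStats> stores = {
      {"small-but-old DMS", 1 << 20, 8 << 20},
      {"large-but-young MRS", 256 << 20, 64 << 20},
  };
  // Pick the mem-store whose flush is currently most valuable.
  std::sort(stores.begin(), stores.end(),
            [&rp](const MemStoreStats& a, const MemStoreStats& b) {
              return FlushScore(a, rp) > FlushScore(b, rp);
            });
  std::cout << "flush first: " << stores[0].name << "\n";
}
{code}

The point is just that a single base signal (WAL bytes anchored) already 
covers both age and size, and the pressure-based re-weighting handles the 
"running low on X" cases without a hard branch.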

Before exploring the solution space further, it'd be better to define the 
problem at hand more clearly. [~zhangyifan27], which values look off to you? 
What tradeoffs did you have in mind when filing this jira? Would something as 
simple as lowering {{-flush_threshold_mb}} or increasing 
{{-flush_threshold_secs}} help you?
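
On the flag question, a toy model of the current time-OR-memory branching 
(again hypothetical, not the real maintenance-manager scoring) shows the 
starvation this jira describes: a tiny-but-old mem-store outranks a 
large-but-young one once it crosses {{-flush_threshold_secs}}:

{code}
// Toy model of a time-OR-memory branching heuristic; NOT Kudu's actual
// scoring. Flag values match the reporter's cluster settings.
#include <cstdint>
#include <iostream>

constexpr int64_t kFlushThresholdMB = 32;      // -flush_threshold_mb
constexpr int64_t kFlushThresholdSecs = 1800;  // -flush_threshold_secs

double ToyScore(int64_t mem_mb, int64_t age_secs) {
  if (age_secs > kFlushThresholdSecs) {
    // Time branch: score grows with age alone, regardless of size.
    return 100.0 * age_secs / kFlushThresholdSecs;
  }
  // Memory branch: size matters only until the age threshold trips.
  return mem_mb > kFlushThresholdMB ? static_cast<double>(mem_mb) : 0.0;
}

int main() {
  // A 1 MB mem-store that is 2 hours old beats a 256 MB one that is
  // 10 minutes old, starving the flush that would free far more memory.
  std::cout << "1 MB, 2h old:    " << ToyScore(1, 7200) << "\n";   // 400
  std::cout << "256 MB, 10m old: " << ToyScore(256, 600) << "\n";  // 256
}
{code}

Under a policy of that shape, increasing {{-flush_threshold_secs}} delays the 
point where the time branch dominates, which is why tuning the flags alone 
might be enough.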



> Kudu doesn't always prefer to flush the MRS/DMS that anchors more memory
> ------------------------------------------------------------------------
>
>                 Key: KUDU-3180
>                 URL: https://issues.apache.org/jira/browse/KUDU-3180
>             Project: Kudu
>          Issue Type: Bug
>            Reporter: YifanZhang
>            Priority: Major
>         Attachments: image-2020-08-04-20-26-53-749.png, 
> image-2020-08-04-20-28-00-665.png
>
>
> The current time-based flush policy always gives a flush op a high score if 
> the tablet hasn't been flushed in a long time, which may lead to starvation 
> of ops that could free more memory.
> We set -flush_threshold_mb=32 and -flush_threshold_secs=1800 in a cluster, 
> and found that some small MRS/DMS flushes had a higher perf score than big 
> MRS/DMS flushes and compactions, which does not seem reasonable.
> !image-2020-08-04-20-26-53-749.png|width=1424,height=317!
> !image-2020-08-04-20-28-00-665.png|width=1414,height=327!


