[ https://issues.apache.org/jira/browse/KUDU-3002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Wong updated KUDU-3002:
------------------------------
    Description: 
When under memory pressure, we'll aggressively perform the maintenance 
operation that frees the most memory. Right now, the only ops that register 
memory are MRS and DMS flushes.

In practice, this means a couple things:
 * In most cases, we'll prioritize flushing MRSs way ahead of flushing DMSs, 
since updates are spread across many DMSs and will therefore tend to be small, 
whereas any non-trivial insert workload will pile up in a single MRS for an 
entire tablet.
 * We'll only flush a single DMS at a time to free memory. Because of this, and 
because we'll likely prioritize MRS flushes over DMS flushes, we may end up 
with a ton of tiny DMSs in a tablet that we'll never flush. This can end up 
bloating the WALs because each DMS may be anchoring some WAL segments.

A couple thoughts on small things we can do to improve this:
 * Register the DMS sizes as RAM anchored by a compaction. This would mean that 
we can schedule compactions to flush DMSs en masse. This would still mean that 
we could end up always prioritizing MRS flushes, depending on how quickly we're 
inserting.
 * We currently register the amount of disk space a LogGC op would free up. We could 
do something similar, but register how many log anchors an op could release. 
This would be a bit trickier, since the log anchors aren't solely determined by 
the mem-stores (e.g. we'll anchor segments to catch up slow followers).
 * Introduce a new op (or change the flush DMS op) that would flush as many 
DMSs as we can for a given tablet.

Between these, the first seems like it'd be an easy win.

  was:
When under memory pressure, we'll aggressively perform the maintenance 
operation that frees the most memory. Right now, the only ops that register 
memory are MRS and DMS flushes.

In practice, this means a couple things:
 * In most cases, we'll prioritize flushing MRSs way ahead of flushing DMSs, 
since updates are spread across many DMSs and will therefore tend to be small, 
whereas any non-trivial insert workload will pile up in a single MRS for an 
entire tablet.
 * We'll only flush a single DMS at a time to free memory. Because of this, and 
because we'll likely prioritize MRS flushes over DMS flushes, we may end up 
with a ton of tiny DMSs in a tablet that we'll never flush. This can end up 
bloating the WALs because each DMS may be anchoring some WAL segments.

A couple thoughts on small things we can do to improve this:
 * Register the DMS sizes as RAM anchored by a compaction. This would mean that 
we can schedule compactions to flush DMSs en masse. This would still mean that 
we could end up always prioritizing MRS flushes, depending on how quickly we're 
inserting.
 * We currently register the amount of disk space a LogGC op would free up. We could 
do something similar, but register how many log anchors an op could release. 
This would be a bit trickier, since the log anchors aren't solely determined by 
the mem-stores (e.g. we'll anchor segments to catch up slow followers).

Between the two, the first seems like it'd be an easy win.


> consider compactions as a mechanism to flush many DMSs
> ------------------------------------------------------
>
>                 Key: KUDU-3002
>                 URL: https://issues.apache.org/jira/browse/KUDU-3002
>             Project: Kudu
>          Issue Type: Improvement
>          Components: perf, tablet
>            Reporter: Andrew Wong
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
