[
https://issues.apache.org/jira/browse/CASSANDRA-13299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16134733#comment-16134733
]
ZhaoYang edited comment on CASSANDRA-13299 at 9/28/17 5:49 AM:
---------------------------------------------------------------
[trunk|https://github.com/jasonstack/cassandra/commits/CASSANDRA-13299-trunk]
[dtest|https://github.com/jasonstack/cassandra-dtest/commits/CASSANDRA-13299]
{code}
Changes:
1. Throttle by the number of base unfiltereds; the default is 100.
2. A pair of open/close range tombstone markers can have any number of
unshadowed rows in between. In the patch, when a batch reaches its limit while
a range tombstone marker is still open, a corresponding close marker is
generated. This avoids handling range tombstone markers separately from rows,
which would cost one extra read-before-write for each pair of markers, and it
also reduces the impact of a large range tombstone. (See the sketch after this
block.)
3. The partition deletion is only applied to the first mutation, to avoid
reading the entire partition more than once.
{code}
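To make the batching behaviour concrete, here is a minimal, self-contained sketch of the idea in plain Java. The Unfiltered/Kind types and the reuse of the open bound for the synthetic close marker are schematic stand-ins, not Cassandra's actual classes (the real patch works on UnfilteredRowIterator and computes proper close/open bounds), and the first-batch-only partition deletion is omitted for brevity:
{code}
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Schematic model of the batching idea above -- NOT Cassandra's real API.
// An "unfiltered" is either a row or a range tombstone (RT) marker.
public class ThrottledBatcher
{
    enum Kind { ROW, RT_OPEN, RT_CLOSE }

    static final class Unfiltered
    {
        final Kind kind;
        final String bound; // clustering/bound label, for display only
        Unfiltered(Kind kind, String bound) { this.kind = kind; this.bound = bound; }
        @Override public String toString() { return kind + "(" + bound + ")"; }
    }

    /**
     * Splits a partition's unfiltereds into batches of roughly `throttle`
     * items. When a batch fills while a range tombstone is still open, a
     * synthetic close marker ends the batch and the next batch re-opens the
     * same range, so every batch can be applied on its own.
     */
    static List<List<Unfiltered>> batches(Iterator<Unfiltered> partition, int throttle)
    {
        List<List<Unfiltered>> out = new ArrayList<>();
        List<Unfiltered> current = new ArrayList<>();
        Unfiltered openMarker = null; // RT still open when the last batch ended

        while (partition.hasNext())
        {
            if (current.isEmpty() && openMarker != null)
                current.add(openMarker); // re-open the carried-over range

            Unfiltered u = partition.next();
            current.add(u);
            if (u.kind == Kind.RT_OPEN)  openMarker = u;
            if (u.kind == Kind.RT_CLOSE) openMarker = null;

            if (current.size() >= throttle)
            {
                if (openMarker != null) // close the still-open range...
                    current.add(new Unfiltered(Kind.RT_CLOSE, openMarker.bound));
                out.add(current);       // ...so this batch stands alone
                current = new ArrayList<>();
            }
        }
        if (!current.isEmpty())
            out.add(current);
        return out;
    }

    public static void main(String[] args)
    {
        List<Unfiltered> stream = List.of(
            new Unfiltered(Kind.ROW, "a"),
            new Unfiltered(Kind.RT_OPEN, "b"),
            new Unfiltered(Kind.ROW, "c"),
            new Unfiltered(Kind.ROW, "d"),
            new Unfiltered(Kind.ROW, "e"),
            new Unfiltered(Kind.RT_CLOSE, "f"),
            new Unfiltered(Kind.ROW, "g"));
        // Three batches; the b..f tombstone is closed and re-opened at
        // each batch boundary so no batch depends on another.
        batches(stream.iterator(), 3).forEach(System.out::println);
    }
}
{code}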
Note:
A single partition deletion or range deletion can still cause a huge number of
view rows to be removed, so the view mutation may fail to apply due to a write
timeout (WTE) or max_mutation_size; that can be resolved separately in
CASSANDRA-12783. Here I only address the issue of holding the entire partition
in memory when repairing a base table with an MV.
Cherry-picked CASSANDRA-13787 to make the dtests pass.
> Potential OOMs and lock contention in write path streams
> --------------------------------------------------------
>
> Key: CASSANDRA-13299
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13299
> Project: Cassandra
> Issue Type: Improvement
> Components: Materialized Views
> Reporter: Benjamin Roth
> Assignee: ZhaoYang
> Fix For: 4.x
>
>
> I see a potential OOM when a stream (e.g. repair) goes through the write
> path, as it does with MVs.
> StreamReceiveTask gets a bunch of SSTableReaders. These produce row
> iterators, which in turn produce mutations. So every partition creates a
> single mutation, which in the case of (very) big partitions can result in
> (very) big mutations. Those are created on heap and stay there until they
> have finished processing.
> I don't think it is necessary to create a single mutation for each partition.
> Why don't we implement a PartitionUpdateGeneratorIterator that takes an
> UnfilteredRowIterator and a max size and spits out PartitionUpdates to be
> used to create and apply mutations?
> The max size should be something like min(reasonable_absolute_max_size,
> max_mutation_size, commitlog_segment_size / 2), where
> reasonable_absolute_max_size could be something like 16 MB.
> A mutation shouldn't be too large, as it also affects MV partition locking.
> The longer an MV partition is locked during a stream, the higher the chances
> that WTEs occur during streams.
> I could also imagine that a max number of updates per mutation, regardless of
> size in bytes, could make sense to avoid lock contention.
> I'd love to get feedback and suggestions, incl. naming suggestions.
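For illustration only, the cap proposed above could be computed along these lines. The class and method names here are hypothetical, not from any patch, and the values in the comments assume stock Cassandra defaults (commitlog_segment_size_in_mb: 32, with max_mutation_size defaulting to half a segment):
{code}
// Hypothetical sketch of the proposed size cap -- the names mirror the
// description above; none of this is Cassandra's actual API.
public final class MutationSizeCap
{
    // "reasonable_absolute_max_size could be like 16M"
    static final long REASONABLE_ABSOLUTE_MAX_SIZE = 16L << 20;

    /** Upper bound, in bytes, for a single mutation generated from a stream. */
    static long maxMutationBytes(long maxMutationSize, long commitlogSegmentSize)
    {
        return Math.min(REASONABLE_ABSOLUTE_MAX_SIZE,
                        Math.min(maxMutationSize, commitlogSegmentSize / 2));
    }

    public static void main(String[] args)
    {
        // Assuming defaults of max_mutation_size = 16 MB and a 32 MB
        // commitlog segment, the cap comes out to 16 MB (16777216 bytes).
        System.out.println(maxMutationBytes(16L << 20, 32L << 20));
    }
}
{code}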