[ 
https://issues.apache.org/jira/browse/CASSANDRA-13747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16117564#comment-16117564
 ] 

Aleksey Yeschenko commented on CASSANDRA-13747:
-----------------------------------------------

Stack trace for the triggered assertion:

{code}
java.lang.AssertionError
        at 
org.apache.cassandra.service.DataResolver$ShortReadProtection$ShortReadRowProtection.moreContents(DataResolver.java:449)
        at 
org.apache.cassandra.service.DataResolver$ShortReadProtection$ShortReadRowProtection.moreContents(DataResolver.java:412)
        at 
org.apache.cassandra.db.transform.BaseIterator.tryGetMoreContents(BaseIterator.java:111)
        at 
org.apache.cassandra.db.transform.BaseIterator.hasMoreContents(BaseIterator.java:101)
        at org.apache.cassandra.db.transform.BaseRows.hasNext(BaseRows.java:155)
        at 
org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:369)
        at 
org.apache.cassandra.utils.MergeIterator$ManyToOne.advance(MergeIterator.java:189)
        at 
org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:158)
        at 
org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)
        at 
org.apache.cassandra.db.rows.UnfilteredRowIterators$UnfilteredRowMergeIterator.computeNext(UnfilteredRowIterators.java:509)
        at 
org.apache.cassandra.db.rows.UnfilteredRowIterators$UnfilteredRowMergeIterator.computeNext(UnfilteredRowIterators.java:369)
        at 
org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)
        at org.apache.cassandra.db.transform.BaseRows.hasNext(BaseRows.java:129)
        at 
org.apache.cassandra.db.transform.FilteredRows.isEmpty(FilteredRows.java:50)
        at org.apache.cassandra.db.transform.Filter.closeIfEmpty(Filter.java:73)
        at 
org.apache.cassandra.db.transform.Filter.applyToPartition(Filter.java:43)
        at 
org.apache.cassandra.db.transform.Filter.applyToPartition(Filter.java:26)
        at 
org.apache.cassandra.db.transform.BasePartitions.hasNext(BasePartitions.java:96)
        at 
org.apache.cassandra.service.StorageProxy$SingleRangeResponse.computeNext(StorageProxy.java:2200)
        at 
org.apache.cassandra.service.StorageProxy$SingleRangeResponse.computeNext(StorageProxy.java:2172)
        at 
org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)
        at 
org.apache.cassandra.db.transform.BasePartitions.hasNext(BasePartitions.java:92)
        at 
org.apache.cassandra.service.StorageProxy$RangeCommandIterator.computeNext(StorageProxy.java:2243)
        at 
org.apache.cassandra.service.StorageProxy$RangeCommandIterator.computeNext(StorageProxy.java:2210)
        at 
org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)
        at 
org.apache.cassandra.db.transform.BasePartitions.hasNext(BasePartitions.java:92)
        at 
org.apache.cassandra.cql3.statements.SelectStatement.process(SelectStatement.java:707)
{code}

> Fix short read protection
> -------------------------
>
>                 Key: CASSANDRA-13747
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13747
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Coordination
>            Reporter: Aleksey Yeschenko
>            Assignee: Aleksey Yeschenko
>             Fix For: 3.0.x, 3.11.x
>
>
> {{ShortReadRowProtection.moreContents()}} expects that by the time we get to 
> that method, the global post-reconciliation counter was already applied to 
> the current partition. However, sometimes it won’t get applied, and the 
> global counter continues counting with {{rowInCurrentPartition}} value not 
> reset from previous partition, which in the most obvious case would trigger 
> the assertion we are observing - {{assert 
> !postReconciliationCounter.isDoneForPartition();}}. In other cases it’s 
> possible because of this lack of reset to query a node for too few extra 
> rows, causing unnecessary SRP data requests.
> Why is the counter not always applied to the current partition?
> The merged {{PartitionIterator}} returned from {{DataResolver.resolve()}} has 
> two transformations applied to it, in the following order:
> {{Filter}} - to purge non-live data from partitions, and to discard empty 
> partitions altogether (except for Thrift)
> {{Counter}}, to count and stop iteration
> Problem is, {{Filter}} ’s {{applyToPartition()}} code that discards empty 
> partitions ({{closeIfEmpty()}} method) would sometimes consume the iterator, 
> triggering short read protection *before* {{Counter}} ’s 
> {{applyToPartition()}} gets called and resets its {{rowInCurrentPartition}} 
> sub-counter.
> We should not be consuming iterators until all transformations are applied to 
> them. For transformations it means that they cannot consume iterators unless 
> they are the last transformation on the stack.
> The linked branch fixes the problem by splitting {{Filter}} into two 
> transformations. The original - {{Filter}} - that does filtering within 
> partitions - and a separate {{EmptyPartitionsDiscarder}}, that discards empty 
> partitions from {{PartitionIterators}}. Thus {{DataResolve.resolve()}}, when 
> constructing its {{PartitionIterator}}, now does merge first, then applies 
> {{Filter}}, then {{Counter}}, and only then, as its last (third) 
> transformation - the {{EmptyPartitionsDiscarder}}. Being the last one 
> applied, it’s legal for it to consume the iterator, and triggering 
> {{moreContents()}} is now no longer a problem.
> Fixes: [3.0|https://github.com/iamaleksey/cassandra/commits/13747-3.0], 
> [3.11|https://github.com/iamaleksey/cassandra/commits/13747-3.11], 
> [4.0|https://github.com/iamaleksey/cassandra/commits/13747-4.0]. dtest 
> [here|https://github.com/iamaleksey/cassandra-dtest/commits/13747].



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to