[ 
https://issues.apache.org/jira/browse/CASSANDRA-8038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14179628#comment-14179628
 ] 

Takenori Sato commented on CASSANDRA-8038:
------------------------------------------

{quote}
Several places.
One of them is RowDataResolver#resolve() -> resolveSuperset() -> 
filter.collateColumns() -> ...
{quote}

Thanks for pointing that out.

I have looked through RowDataResolver carefully.

It resolves a single version from the multiple versions collected from replica 
nodes. For filtering it uses an instance of IdentityQueryFilter, which is a 
subclass of SliceQueryFilter, so it follows the same path as a slice query, 
which resolves a single version from the multiple versions held in sstables. 
That path is fixed by this patch.

The problem I noticed is that obsolete data could be returned to a client, as 
in Case 1-2 below.

{code:title=scenario|borderStyle=solid}
RF=3
Node X
Node Y
Node Z
Coordinator Node C 

Case 1: ignoreTombstonesForReadRepair=true for X, Y, Z

Case 1-1
X[col1:tombstone] ==> [] ==> C
Y[col1:tombstone] ==> [] ==> C
Z[col1:tombstone] ==> [] ==> C
C ==> [] ===> Client

Case 1-2
X[col1:tombstone] ==> [] ==> C
Y[col1:old] ==> [col1:old] ==> C
Z[col1:tombstone] ==> [] ==> C
C ==> [col1:old] ===> Client
C ==> [col1:old] ===> X, Z(stored locally, but tombstone wins)

Case 1-3
X[col1:new] ==> [new] ==> C
Y[col1:tombstone] ==> [] ==> C
Z[col1:new] ==> [new] ==> C
C ==> [new] ===> Client
C ==> [new] ===> Y

Case 2: ignoreTombstonesForReadRepair=true for X, Y, and false for Z

Case 2-1
X[col1:tombstone] ==> [] ==> C
Y[col1:tombstone] ==> [] ==> C
Z[col1:tombstone] ==> [tombstone] ==> C
C ==> [] ===> Client

Case 2-2
X[col1:tombstone] ==> [] ==> C
Y[col1:old] ==> [col1:old] ==> C
Z[col1:tombstone] ==> [tombstone] ==> C
C ==> [] ===> Client
C ==> [tombstone] ===> X, Y

Case 2-3
X[col1:new] ==> [new] ==> C
Y[col1:tombstone] ==> [] ==> C
Z[col1:new] ==> [new] ==> C
C ==> [new] ===> Client
C ==> [new] ===> Y

Case 3: ignoreTombstonesForReadRepair=true for X, Z, and false for Y

Case 3-1
X[col1:tombstone] ==> [] ==> C
Y[col1:tombstone] ==> [tombstone] ==> C
Z[col1:tombstone] ==> [] ==> C
C ==> [] ===> Client

Case 3-2
X[col1:tombstone] ==> [] ==> C
Y[col1:old] ==> [col1:old] ==> C
Z[col1:tombstone] ==> [] ==> C
C ==> [] ===> Client
C ==> [tombstone] ===> X, Y

Case 3-3
X[col1:new] ==> [new] ==> C
Y[col1:tombstone] ==> [tombstone] ==> C
Z[col1:new] ==> [new] ==> C
C ==> [new] ===> Client
C ==> [new] ===> Y

Case 4: ignoreTombstonesForReadRepair=true for Y, Z, and false for X

Case 4-1
X[col1:tombstone] ==> [tombstone] ==> C
Y[col1:tombstone] ==> [] ==> C
Z[col1:tombstone] ==> [] ==> C
C ==> [] ===> Client

Case 4-2
X[col1:tombstone] ==> [tombstone] ==> C
Y[col1:old] ==> [col1:old] ==> C
Z[col1:tombstone] ==> [] ==> C
C ==> [] ===> Client
C ==> [tombstone] ===> Y, Z

Case 4-3
X[col1:new] ==> [new] ==> C
Y[col1:tombstone] ==> [] ==> C
Z[col1:new] ==> [new] ==> C
C ==> [new] ===> Client
C ==> [new] ===> Y
{code}
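
To make the difference between Case 1-2 and Case 2-2 easier to check, here is a minimal toy model in Java. It is not Cassandra's RowDataResolver: the class and method names (TombstoneFilterSketch, replicaResponse, resolve, clientResult) are hypothetical, the timestamps are made up, and the only rule modelled is "highest timestamp wins" among the versions the coordinator actually receives.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Toy model only; none of these names exist in Cassandra.
class TombstoneFilterSketch {

    /** One version of col1 as stored on a replica: a live value or a tombstone. */
    static final class Cell {
        final String value;      // null for a tombstone
        final long timestamp;    // assumed write timestamp
        final boolean tombstone;

        private Cell(String value, long timestamp, boolean tombstone) {
            this.value = value;
            this.timestamp = timestamp;
            this.tombstone = tombstone;
        }

        static Cell live(String value, long timestamp) { return new Cell(value, timestamp, false); }
        static Cell deleted(long timestamp) { return new Cell(null, timestamp, true); }
    }

    /** What a replica sends to the coordinator; a tombstone is dropped when the flag is true. */
    static Optional<Cell> replicaResponse(Cell stored, boolean ignoreTombstones) {
        if (stored.tombstone && ignoreTombstones) {
            return Optional.empty();
        }
        return Optional.of(stored);
    }

    /** Coordinator-side resolution: the received version with the highest timestamp wins. */
    static Optional<Cell> resolve(List<Optional<Cell>> responses) {
        Cell winner = null;
        for (Optional<Cell> response : responses) {
            if (response.isPresent()
                    && (winner == null || response.get().timestamp > winner.timestamp)) {
                winner = response.get();
            }
        }
        return Optional.ofNullable(winner);
    }

    /** What the client sees: the live value, or "[]" when nothing or a tombstone won. */
    static String clientResult(Cell x, boolean ignoreX, Cell y, boolean ignoreY,
                               Cell z, boolean ignoreZ) {
        Optional<Cell> resolved = resolve(Arrays.asList(
                replicaResponse(x, ignoreX),
                replicaResponse(y, ignoreY),
                replicaResponse(z, ignoreZ)));
        return resolved.filter(c -> !c.tombstone)
                       .map(c -> "[col1:" + c.value + "]")
                       .orElse("[]");
    }

    public static void main(String[] args) {
        Cell old = Cell.live("old", 1);      // written before the delete
        Cell tombstone = Cell.deleted(2);    // the delete is newer than "old"

        // Case 1-2: every replica drops tombstones, so only "old" reaches C.
        System.out.println("Case 1-2: "
                + clientResult(tombstone, true, old, true, tombstone, true));

        // Case 2-2: Z still sends its tombstone, which beats "old" at C.
        System.out.println("Case 2-2: "
                + clientResult(tombstone, true, old, true, tombstone, false));
    }
}
```

Under these assumptions the model prints [col1:old] for Case 1-2 and [] for Case 2-2: as soon as one replica that holds the tombstone still sends it, the coordinator can suppress the obsolete value.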

> A new config option to ignore column tombstones for RR or not
> -------------------------------------------------------------
>
>                 Key: CASSANDRA-8038
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8038
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Takenori Sato
>             Fix For: 2.1.2
>
>         Attachments: CASSANDRA-8038-v2.txt, CASSANDRA-8038-v3.txt, 
> CASSANDRA-8038.txt
>
>
> CASSANDRA-6117 addressed the death of Cassandra by column tombstones; its 
> fix was to raise an error when a read scans more tombstones than a 
> threshold. I think that is an emergency measure rather than a fix.
> We have had this issue for a long time. So I wondered, in the first place, 
> whether it is really necessary to collect non-gc-able tombstones, which can 
> cause concurrent mode failures and, eventually, OOM.
> Actually, I was surprised that Cassandra takes them into consideration. I 
> would rather raise the threshold and tell Cassandra to ignore tombstones in 
> the digest calculation for RR, because a repair runs regularly.
> I guess some people feel as I do, but not all. So what about adding a new 
> configuration option that controls whether Cassandra ignores column 
> tombstones for RR?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
