[jira] [Commented] (CASSANDRA-14532) Partition level deletions past GCGS are not propagated/merged on read

Kurt Greaves (JIRA) Thu, 21 Jun 2018 16:26:47 -0700


    [ https://issues.apache.org/jira/browse/CASSANDRA-14532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16519843#comment-16519843 ]


Kurt Greaves commented on CASSANDRA-14532:
------------------------------------------

bq. That's not really a bug in that this is working as designed. The very 
reason for gc grace is to be a long enough time that we can guarantee any data 
(including tombstones) has been propagated to all replicas, and that's why you 
must run repair within the gc grace window (otherwise other mechanisms don't 
truly guarantee that). So we should not have to propagate anything past gcgs 
and doing so is at best an inefficiency.

Yeah, we've already come to that conclusion on the ML. I haven't updated here, 
but I think this ticket really becomes a case of documenting the behaviour a 
bit more explicitly. It's pretty hard to reason through, and given that I had 
already reasoned through it in the past and forgotten, and that it has happened 
to other people, this seems a good case for documentation. I'd like to put a 
better explanation in the code but, more importantly, in the docs somewhere. 
I'll look at that in the near future, but regarding docs, we don't really have 
much (anything?) on tombstones at the moment, and it would be terribly out of 
place to document this without a whole lot of general background knowledge on 
tombstones. Anyway, I'll get around to writing that up as well at some point...

bq. That would be a real bug, though - worth opening a JIRA there.

bq. Possibly. But remember that being post GCGS is not the only condition for 
tombstones to be purged: compaction also needs to be able to prove the 
tombstone cannot shadow something in another, non-compacted, sstable. So the 
sentence is not precise enough to say this is a bug for sure either.
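
The two purge conditions in the quote above can be sketched as a small model. This is a simplified illustration, not Cassandra's actual compaction code: the class, the function and its parameters are hypothetical names, and the overlap check is reduced to comparing the tombstone's write timestamp against the minimum write timestamp of the partition's data in each SSTable outside the compaction.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class PartitionTombstone:
    # Hypothetical model of a partition deletion, not a Cassandra class.
    marked_for_delete_at: int  # write timestamp of the deletion
    local_deletion_time: int   # wall-clock seconds when the deletion was written


def is_purgeable(tombstone: PartitionTombstone,
                 now_seconds: int,
                 gc_grace_seconds: int,
                 overlapping_min_timestamps: Iterable[int]) -> bool:
    """Return True if compaction may drop the tombstone in this model.

    overlapping_min_timestamps holds, for each SSTable that is NOT part of
    the compaction but may contain the partition, the minimum write
    timestamp of that partition's data (an assumption of this sketch).
    """
    # Condition 1: the tombstone is older than the gc grace window.
    past_gcgs = tombstone.local_deletion_time < now_seconds - gc_grace_seconds
    # Condition 2: everything in the other SSTables is newer than the
    # deletion, so the tombstone cannot be shadowing any of it.
    shadows_nothing = all(tombstone.marked_for_delete_at < ts
                          for ts in overlapping_min_timestamps)
    return past_gcgs and shadows_nothing
```

In this model a tombstone that sits alone, with no overlapping SSTables, is purgeable as soon as it is past GCGS, which is why the compaction case discussed in this thread looks surprising.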

So I'll get exact reproduction steps sorted, but I left out some details in my 
original comment which make the issue significantly more minor. In this case 
what we see is that you can create a single SSTable containing only a partition 
deletion, with no shadowed data in any other SSTable, and when you compact, the 
tombstone won't be removed. However, as soon as you insert more data that 
shadows the tombstone and compact it together with that SSTable, the tombstone 
is removed (there may be more to it than this; I have to test again).

Either way, we figured that's not a major bug in the scheme of things, but I'll 
get a test case up and running so I can investigate a bit more.


> Partition level deletions past GCGS are not propagated/merged on read
> ---------------------------------------------------------------------
>
>                 Key: CASSANDRA-14532
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14532
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Kurt Greaves
>            Assignee: Kurt Greaves
>            Priority: Major
>
> So as [~jay.zhuang] mentioned on the mailing list 
> [here|http://mail-archives.us.apache.org/mod_mbox/cassandra-dev/201806.mbox/<CAAXszS0%3DmCu5ptDccki_coxRwwF0ZFrTYs_EJLpMTDjNT3tFSA%40mail.gmail.com>],
>  it appears that partition deletions that have passed GCGS are not 
> propagated/merged properly on read, and also not repaired via read repair.
> Steps to reproduce:
> {code}
> create keyspace test WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};
> create table test.test (id int PRIMARY KEY , data text) WITH gc_grace_seconds = 10;
> CONSISTENCY ALL;
> INSERT INTO test.test (id, data) values (1, 'test');
> ccm node2 stop
> CONSISTENCY QUORUM;
> DELETE from test.test where id = 1; // wait 10 seconds so HH doesn't propagate tombstone when starting node2
> select * from test.test where id = 1 ;
>  id | data
> ----+------
> (0 rows)
> ccm node2 start
> CONSISTENCY ALL;
> select * from test.test where id = 1 ;
>  id | data
> ----+------
>   1 | test
> alter table test.test WITH gc_grace_seconds = 100000; // GC
> select * from test.test where id = 1 ;
>  id | data
> ----+------
> (0 rows)
> {code}
> We've also found a seemingly related issue in compaction: when compacting an 
> SSTable that contains a partition deletion past GCGS, the partition deletion 
> is not removed. Likely the same code is causing both bugs.
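
The read behaviour in the steps above can be mimicked with a toy model. It is only a sketch under stated assumptions: each replica first merges its own row and tombstone, then leaves any tombstone older than gc_grace_seconds out of its response, and the coordinator merges whatever it receives. All names here (Replica, replica_response, coordinator_sees_row) are hypothetical and do not correspond to Cassandra classes.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Replica:
    row_timestamp: Optional[int] = None         # write timestamp of the row, if present
    tombstone_timestamp: Optional[int] = None   # write timestamp of the partition deletion
    tombstone_local_time: Optional[int] = None  # seconds at which the deletion was written


def replica_response(r: Replica, now: int, gcgs: int) -> Tuple[Optional[int], Optional[int]]:
    """Return (row_ts, tombstone_ts) as this model's replica would send them."""
    tomb = r.tombstone_timestamp
    # Local merge: the deletion shadows a row written at or before it.
    row = r.row_timestamp
    if row is not None and tomb is not None and row <= tomb:
        row = None
    # Assumption: a tombstone past GCGS is dropped from the response.
    if tomb is not None and r.tombstone_local_time < now - gcgs:
        tomb = None
    return row, tomb


def coordinator_sees_row(replicas: List[Replica], now: int, gcgs: int) -> bool:
    """Merge all responses; the row is visible if no received tombstone covers it."""
    responses = [replica_response(r, now, gcgs) for r in replicas]
    newest_tomb = max((t for _, t in responses if t is not None), default=None)
    rows = [row for row, _ in responses if row is not None]
    return any(newest_tomb is None or row > newest_tomb for row in rows)


# node1 and node3 saw the delete; node2 was down and only has the row.
node1 = Replica(row_timestamp=1, tombstone_timestamp=2, tombstone_local_time=100)
node2 = Replica(row_timestamp=1)
node3 = Replica(row_timestamp=1, tombstone_timestamp=2, tombstone_local_time=100)
cluster = [node1, node2, node3]
```

With gc_grace_seconds = 10 and a read 20 seconds after the delete, the model returns the row from node2, matching the `1 | test` result; with gc_grace_seconds = 100000 the tombstone is sent again and the row is hidden.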



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
