[ https://issues.apache.org/jira/browse/CASSANDRA-14532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16519843#comment-16519843 ]
Kurt Greaves commented on CASSANDRA-14532:
------------------------------------------

bq. That's not really a bug in that this is working as designed. The very reason for gc grace is to be a long enough time that we can guarantee any data (including tombstones) has been propagated to all replicas, and that's why you must run repair within the gc grace window (otherwise other mechanisms don't truly guarantee that). So we should not have to propagate anything past GCGS (gc_grace_seconds), and doing so is at best an inefficiency.

Yeah, we've come to that conclusion on the ML already. I haven't updated here, but I think this ticket really becomes a case of documenting the behaviour more explicitly. It's pretty hard to reason through: I've already reasoned through it in the past and forgotten, and it has happened to other people too, which makes this a good case for documentation. I'd like to put a better explanation in the code but, more importantly, in the docs somewhere. I'll look at that in the near future. Regarding docs, though, we don't really have much (anything?) on tombstones at the moment, and it would be terribly out of place to document this without a whole lot of general/background knowledge on tombstones. Anyway, I'll get around to writing that up as well at some point...

bq. That would be a real bug, though - worth opening a JIRA there.

bq. Possibly. But remember that being past GCGS is not the only condition for tombstones to be purged: compaction also needs to be able to prove the tombstone cannot shadow something in another, non-compacted, sstable. So the sentence is not precise enough to say this is a bug for sure either.

So I'll get exact reproduction steps sorted, but I left out some details in my original comment which make this significantly less serious. In this case, what we see is that you can create a single SSTable containing only a partition deletion, with no shadowed data in any other SSTable, and when you compact it the tombstone won't be removed.
However, as soon as you insert more data that shadows the tombstone and compact it together with that SSTable, the tombstone will be removed (there may be more to it than this; I have to test again). Either way, we figured that's not a major bug in the scheme of things, but I'll get a test case up and running so we can investigate a bit more.

> Partition level deletions past GCGS are not propagated/merged on read
> ---------------------------------------------------------------------
>
>                 Key: CASSANDRA-14532
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14532
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Kurt Greaves
>            Assignee: Kurt Greaves
>            Priority: Major
>
> So as [~jay.zhuang] mentioned on the mailing list [here|http://mail-archives.us.apache.org/mod_mbox/cassandra-dev/201806.mbox/<CAAXszS0%3DmCu5ptDccki_coxRwwF0ZFrTYs_EJLpMTDjNT3tFSA%40mail.gmail.com>], it appears that partition deletions that have passed GCGS are not propagated/merged properly on read, and are also not repaired via read repair.
> Steps to reproduce:
> {code}
> create keyspace test WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};
> create table test.test (id int PRIMARY KEY, data text) WITH gc_grace_seconds = 10;
> CONSISTENCY ALL;
> INSERT INTO test.test (id, data) values (1, 'test');
> ccm node2 stop
> CONSISTENCY QUORUM;
> DELETE from test.test where id = 1;
> // wait 10 seconds so hinted handoff doesn't propagate the tombstone when node2 starts
> select * from test.test where id = 1 ;
>  id | data
> ----+------
> (0 rows)
> ccm node2 start
> CONSISTENCY ALL;
> select * from test.test where id = 1 ;
>  id | data
> ----+------
>   1 | test
> alter table test.test WITH gc_grace_seconds = 100000; // raise GCGS so the tombstone is no longer purgeable
> select * from test.test where id = 1 ;
>  id | data
> ----+------
> (0 rows)
> {code}
> We've also found a seemingly related issue in compaction: when compacting an SSTable which contains the partition deletion post GCGS, the partition deletion isn't removed.
> Likely the same code is causing both bugs.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
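The comment thread above notes that being past GCGS is not the only condition for purging a tombstone: compaction must also be able to prove the tombstone cannot shadow data in an sstable that is not part of the compaction. The following is a minimal, hypothetical Python model of those two conditions, not Cassandra's actual code. The names PartitionDeletion, SSTableInfo and is_purgeable are made up for illustration, and the model assumes each non-compacted sstable can report whether it may contain the partition and the smallest write timestamp it holds. Cassandra's real logic differs in detail.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class PartitionDeletion:
    """Hypothetical model of a partition-level tombstone."""
    marked_for_delete_at: int  # write timestamp of the DELETE (microseconds)
    local_deletion_time: int   # server time the tombstone was created (seconds)


@dataclass(frozen=True)
class SSTableInfo:
    """Hypothetical summary of an sstable that is NOT in the current compaction."""
    contains_key: bool   # may this sstable hold data for the partition?
    min_timestamp: int   # smallest write timestamp stored in the sstable


def is_purgeable(tombstone: PartitionDeletion,
                 now_seconds: int,
                 gc_grace_seconds: int,
                 overlapping: Iterable[SSTableInfo]) -> bool:
    """Return True if compaction may drop the tombstone (model only)."""
    # Condition 1: the tombstone must be older than gc_grace_seconds.
    gc_before = now_seconds - gc_grace_seconds
    if tombstone.local_deletion_time >= gc_before:
        return False
    # Condition 2: no non-compacted sstable may hold data the tombstone could
    # shadow, i.e. data for this key written at or before the deletion.
    for sstable in overlapping:
        if sstable.contains_key and sstable.min_timestamp <= tombstone.marked_for_delete_at:
            return False
    return True


if __name__ == "__main__":
    deletion = PartitionDeletion(marked_for_delete_at=1_000, local_deletion_time=100)
    # Past GCGS and no other sstable holds the key: purgeable.
    print(is_purgeable(deletion, now_seconds=200, gc_grace_seconds=10, overlapping=[]))  # prints True
    # Past GCGS, but older data for the key lives in another sstable: must be kept.
    print(is_purgeable(deletion, 200, 10, [SSTableInfo(contains_key=True, min_timestamp=500)]))
```

Under this model, the single-SSTable case described in the comment (a partition deletion past GCGS with no data for that key in any other sstable) would be purgeable, which is why the observed compaction behaviour looked suspicious.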