[jira] [Updated] (CASSANDRA-7872) ensure compacted obsolete sstables are not open on node restart and nodetool refresh, even on sstable reference miscounting or deletion tasks are failed.

Oleg Anastasyev (JIRA) Mon, 03 Nov 2014 09:31:59 -0800

     [ 
https://issues.apache.org/jira/browse/CASSANDRA-7872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Oleg Anastasyev updated CASSANDRA-7872:
---------------------------------------
    Attachment: 7872-v2.0-robustness.txt
                7872-v2.0-bugdetector.txt
                7872-v2.0-NoPhQ.txt

OK, I split it into 3 patches. They can be applied in order:

7872-v2.0-NoPhQ.txt - everything except the Phantom Queues, i.e. it has the 
Compacted marker and the removal of obsolete sstables on startup and on 
nodetool refresh.

7872-v2.0-bugdetector.txt - adds a Phantom Queue to detect bugs in the 
refcount (or any other) sstable tracking algorithm. I added an additional 
10-second pause before checking whether the file still exists on disk and 
alerting about a bug found in the refcount algorithm.

7872-v2.0-robustness.txt - I left the original sstable removal code here. I'd 
rather not call a resilient action on a known, detected failure "papering over 
a bug", so this one could be of use to people who want stuck sstable files 
removed from disk with no restart necessary. It can be applied on top of 
*-bugdetector.txt
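
For readers who have not seen the phantom queue approach, here is a rough 
sketch of how such a bug detector could work. It is not the code from the 
attached patches: SSTableLeakDetector, track, drainCollected and stillOnDisk 
are hypothetical names, and the reader is modelled as a plain Object. The idea 
is that once a reader has been garbage collected, its data file should already 
be gone; a file that still exists points at a refcount bug.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch; not the code from the attached patches.
final class SSTableLeakDetector {
    private final ReferenceQueue<Object> queue = new ReferenceQueue<>();
    // Keeping each PhantomReference as a map key keeps the reference itself
    // strongly reachable until it has been drained from the queue.
    private final Map<Reference<?>, String> tracked = new ConcurrentHashMap<>();

    /** Start watching a reader; dataFile is the file it is expected to delete. */
    void track(Object reader, String dataFile) {
        tracked.put(new PhantomReference<>(reader, queue), dataFile);
    }

    /** Data files whose reader has been garbage collected since the last call. */
    List<String> drainCollected() {
        List<String> files = new ArrayList<>();
        Reference<?> ref;
        while ((ref = queue.poll()) != null) {
            String file = tracked.remove(ref);
            if (file != null) {
                files.add(file);
            }
        }
        return files;
    }

    /** Of the given files, those still present on disk (suspected leaks). */
    static List<String> stillOnDisk(List<String> files) {
        List<String> leaked = new ArrayList<>();
        for (String f : files) {
            if (Files.exists(Paths.get(f))) {
                leaked.add(f);
            }
        }
        return leaked;
    }
}
```

A background thread would call drainCollected() periodically, wait for a grace 
period (10 seconds in the bugdetector patch) so that a pending deletion task 
can finish, and only then pass the result to stillOnDisk() and log an alert 
for anything it returns.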

> ensure compacted obsolete sstables are not open on node restart and nodetool 
> refresh, even on sstable reference miscounting or deletion tasks are failed.
> ---------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-7872
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7872
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Oleg Anastasyev
>            Assignee: Oleg Anastasyev
>             Fix For: 2.0.12
>
>         Attachments: 7872-v2.0-NoPhQ.txt, 7872-v2.0-bugdetector.txt, 
> 7872-v2.0-robustness.txt, EnsureNoObsoleteSSTables-7872-v2.0.txt
>
>
> Since CASSANDRA-4436, compacted sstables are no longer marked with a 
> COMPACTED_MARKER file. Instead, after they are compacted, DataTracker calls 
> SSTableReader.markObsolete(), but the actual deletion happens later, in 
> SSTableReader.releaseReference().
> This reference counting is very fragile: it is very easy to introduce a 
> rare, hard-to-catch bug that keeps the reference count from ever reaching 0 
> (CASSANDRA-6503, for example).
> This means that, very rarely, obsolete sstable files are not removed from 
> disk (although Cassandra no longer uses them to read data).
> If more than gc grace time passes while such a file remains on disk, and 
> the operator then either runs nodetool refresh or simply restarts the node, 
> these obsolete files are discovered and opened for reading by the node. 
> Deleted data is thus resurrected and quickly spread by RR to the whole 
> cluster.
> Because the consequences are very serious (even a single obsolete sstable 
> file left on disk could render your data useless), this patch makes sure no 
> obsolete sstable file can be opened for reading, by:
> 1. Removing sstables on CFS init by analyzing sstable generations (an 
> sstable is removed if another sstable lists it as an ancestor).
> 2. Reimplementing the COMPACTED_MARKER file for sstables. This marker is 
> created as soon as markObsolete is called. This is necessary because 
> generation info can be lost (when sstables compact to none).
> 3. To remove sstables sooner than a restart, reimplementing the good old GC 
> phantom reference queue as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
