[ https://issues.apache.org/jira/browse/CASSANDRA-9591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14585740#comment-14585740 ]

Stefania commented on CASSANDRA-9591:
-------------------------------------

The patch looks OK so far. I've pushed some minor changes in a separate commit 
on [this branch|https://github.com/stef1927/cassandra/commits/9591-2.0]. They're 
pretty straightforward, but do let me know if you have any concerns.

I have also started porting the patch to 2.1 on [this branch|https://github.com/stef1927/cassandra/commits/9591-2.1]. 
Unfortunately, the SSTableReader code in 2.1 has diverged quite a bit, so the 
patch probably doesn't work on 2.1 yet.

Both branches will be picked up by our continuous integration server, and the 
results will (eventually) be available 
[here|http://cassci.datastax.com/view/Dev/view/stef1927]. I will check tomorrow 
for any broken tests.

I've added a very basic unit test. However, we need more tests, probably in 
_scrub_test.py_ of the [dtests|https://github.com/riptano/cassandra-dtest]. There 
we should test both standalone scrub and nodetool scrub, the latter with an 
index. I don't mind writing more tests, as this area is a bit lacking (we do 
have some tests for scrubbing secondary indexes, but they only run on >= 2.2). 
However, do let me know if you want to write them yourself.

Once all the tests, new and existing, pass, I will try to find a committer. 
Thanks for submitting the patch!


> Scrub (recover) sstables even when -Index.db is missing
> -------------------------------------------------------
>
>                 Key: CASSANDRA-9591
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9591
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: mck
>            Assignee: mck
>              Labels: sstablescrub
>             Fix For: 2.0.x
>
>         Attachments: 9591-2.0.txt
>
>
> Today SSTableReader needs at least three files to load an sstable:
>  - -Data.db
>  - -CompressionInfo.db 
>  - -Index.db
> But during the scrub process the -Index.db file isn't actually necessary, 
> unless there's corruption in the -Data.db file and we want to be able to skip 
> over corrupted rows. Since there is still a fair chance that nothing is wrong 
> with the -Data.db file and only the -Index.db file is missing, this patch 
> addresses that situation.
> So this patch makes it possible for the StandaloneScrubber (sstablescrub) to 
> recover sstables despite missing -Index.db files.
> Missing -Index.db files can result from a catastrophic incident in which data 
> directories have been lost and/or corrupted, or wiped with no healthy backup. 
> I'm aware that one normally relies on replicas or snapshots to avoid such 
> situations, but such catastrophic incidents do occur in the wild.
> I have not tested this patch against normal C* operations or the other (more 
> critical) ways SSTableReader is used. I'll happily do that and add the needed 
> unit tests if people see merit in accepting the patch.
> Otherwise the patch can live on the issue, in case anyone else needs it. 
> There's also a Cassandra distribution with the patch applied 
> [here|https://github.com/michaelsembwever/cassandra/releases/download/2.0.15-recover-sstables-without-indexdb/apache-cassandra-2.0.15-recover-sstables-without-indexdb.tar.gz]
> to make life a little easier for anyone who finds themselves in such a bad 
> situation.
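The reasoning in the description (the index is only needed when a corrupt row has to be skipped) can be illustrated with a toy model. This is a minimal sketch under loud assumptions: the row format (4-byte length, payload, 1-byte checksum), the class ScrubSketch and its methods are all hypothetical and do not reflect Cassandra's real -Data.db/-Index.db formats, SSTableReader or Scrubber. Passing a null index stands in for a missing -Index.db.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical toy model, NOT Cassandra's on-disk format or API.
public class ScrubSketch {

    // Toy checksum: sum of payload bytes, truncated to one byte.
    static byte checksum(byte[] payload) {
        int sum = 0;
        for (byte b : payload) sum += b;
        return (byte) sum;
    }

    // Writes rows as [int length][payload][checksum] and records each
    // row's start offset in offsetsOut (our stand-in for -Index.db).
    static byte[] encodeRows(List<String> rows, int[] offsetsOut) {
        ByteBuffer buf = ByteBuffer.allocate(1024);
        for (int i = 0; i < rows.size(); i++) {
            offsetsOut[i] = buf.position();
            byte[] payload = rows.get(i).getBytes(StandardCharsets.UTF_8);
            buf.putInt(payload.length);
            buf.put(payload);
            buf.put(checksum(payload));
        }
        return Arrays.copyOf(buf.array(), buf.position());
    }

    // Reads one row at pos; returns the offset just past it, or -1 if corrupt.
    static int readRow(byte[] data, int pos, List<String> out) {
        if (pos + 4 > data.length) return -1;
        int len = ByteBuffer.wrap(data, pos, 4).getInt();
        if (len < 0 || len > data.length - pos - 5) return -1;
        byte[] payload = Arrays.copyOfRange(data, pos + 4, pos + 4 + len);
        if (data[pos + 4 + len] != checksum(payload)) return -1;
        out.add(new String(payload, StandardCharsets.UTF_8));
        return pos + 5 + len;
    }

    // Sequential scan of the data; index == null models a missing -Index.db.
    static List<String> scrub(byte[] data, int[] index) {
        List<String> recovered = new ArrayList<>();
        int pos = 0;
        while (pos < data.length) {
            int next = readRow(data, pos, recovered);
            if (next >= 0) {
                pos = next;
                continue;
            }
            // Corrupt row: without an index there is no known next row start.
            if (index == null) break;
            int skipTo = -1;
            for (int offset : index) {
                if (offset > pos) { skipTo = offset; break; }
            }
            if (skipTo < 0) break;
            pos = skipTo;
        }
        return recovered;
    }

    public static void main(String[] args) {
        List<String> rows = List.of("alpha", "beta", "gamma");
        int[] index = new int[rows.size()];
        byte[] data = encodeRows(rows, index);

        // Healthy data: all rows are recovered even without the index.
        System.out.println(scrub(data, null));
        // Corrupt the first payload byte of "beta".
        data[index[1] + 4] ^= 0x7f;
        System.out.println(scrub(data, null));  // stops at the corrupt row
        System.out.println(scrub(data, index)); // skips it using the index
    }
}
```

In this model, healthy data yields every row with or without the index, while a corrupt row ends the scan unless the index supplies the offset of the next row, which is the situation the patch targets.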



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
