Re: Possibly losing data with corrupted SSTables

sankalp kohli Wed, 12 Feb 2014 18:17:09 -0800

You might want to look at this JIRA I filed today:
CASSANDRA-6696 <https://issues.apache.org/jira/browse/CASSANDRA-6696>


Removing the SSTables and running repair is fine, as long as you are OK with deleted data reappearing.


On Wed, Feb 12, 2014 at 9:20 AM, Francisco Nogueira Calmon Sobral <
fsob...@igcorp.com.br> wrote:

> Hi, Rahul.
>
> I've removed the corrupted sstables and 'nodetool repair' ran successfully
> for the column family. I'm not sure whether or not we've lost data.
>
> Best regards,
> Francisco Sobral
>
>
> On Jan 30, 2014, at 3:58 PM, Rahul Menon <ra...@apigee.com> wrote:
>
> Yes, you should delete all files matching <cfname>-ib-<num>-<extension>.db
>
> Run a repair after deletion
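
To make the "delete all files for that sstable" step concrete, here is a minimal sketch that collects every component file sharing one version and generation number. The helper name sstable_components and the regular expression are assumptions inferred from the file listings quoted in this thread, not anything shipped with Cassandra.

```python
import re
from pathlib import Path

# Matches names such as Sessions-Users-ib-2516-Data.db or Sessions-Users-ic-2933-TOC.txt.
# The pattern is inferred from the listings in this thread (an assumption).
COMPONENT_RE = re.compile(
    r"^(?P<ks>[^-]+)-(?P<cf>[^-]+)-(?P<version>[a-z]{2})-(?P<gen>\d+)"
    r"-(?P<component>[A-Za-z]+)\.(?:db|txt)$"
)

def sstable_components(data_dir, version, generation):
    """Return all files in data_dir that belong to one SSTable (same version and generation)."""
    found = []
    for path in sorted(Path(data_dir).iterdir()):
        m = COMPONENT_RE.match(path.name)
        if m and m.group("version") == version and int(m.group("gen")) == generation:
            found.append(path)
    return found
```

For the sstable discussed here, the call would be sstable_components("/mnt/cassandra/data/Sessions/Users", "ib", 2516). Moving the returned files to a separate directory instead of deleting them outright keeps a copy around in case they are needed later.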
>
>
> On Thu, Jan 30, 2014 at 10:17 PM, Francisco Nogueira Calmon Sobral <
> fsob...@igcorp.com.br> wrote:
>
>> Ok. I'll try this idea with one sstable. But, should I delete all the
>> files associated with it? I mean, there is a difference in the number of
>> files between the BAD sstable and a GOOD one, as I've already shown:
>>
>> BAD
>> ------
>> -rw-r--r-- 8 cassandra cassandra 991M Nov  8 15:11
>> Sessions-Users-ib-2516-Data.db
>> -rw-r--r-- 8 cassandra cassandra 703M Nov  8 15:11
>> Sessions-Users-ib-2516-Index.db
>> -rw-r--r-- 8 cassandra cassandra 5.3M Nov 13 11:42
>> Sessions-Users-ib-2516-Summary.db
>>
>> GOOD
>> ---------
>> -rw-r--r-- 1 cassandra cassandra  22K Jan 15 10:50
>> Sessions-Users-ic-2933-CompressionInfo.db
>> -rw-r--r-- 1 cassandra cassandra 106M Jan 15 10:50
>> Sessions-Users-ic-2933-Data.db
>> -rw-r--r-- 1 cassandra cassandra 2.2M Jan 15 10:50
>> Sessions-Users-ic-2933-Filter.db
>> -rw-r--r-- 1 cassandra cassandra  76M Jan 15 10:50
>> Sessions-Users-ic-2933-Index.db
>> -rw-r--r-- 1 cassandra cassandra 4.3K Jan 15 10:50
>> Sessions-Users-ic-2933-Statistics.db
>> -rw-r--r-- 1 cassandra cassandra 574K Jan 15 10:50
>> Sessions-Users-ic-2933-Summary.db
>> -rw-r--r-- 1 cassandra cassandra   79 Jan 15 10:50
>> Sessions-Users-ic-2933-TOC.txt
>>
>> Should I delete those 3 files? Should I run nodetool refresh after the
>> operation?
>>
>> Best regards,
>> Francisco.
>>
>> On Jan 30, 2014, at 2:02 PM, Rahul Menon <ra...@apigee.com> wrote:
>>
>> > Looks like the sstables are corrupt. I don't believe there is a method
>> to recover those sstables. I would delete them and run a repair to ensure
>> data consistency.
>> >
>> > Rahul
>> >
>> >
>> > On Wed, Jan 29, 2014 at 11:29 PM, Francisco Nogueira Calmon Sobral <
>> fsob...@igcorp.com.br> wrote:
>> > Hi, Rahul.
>> >
>> > I've run nodetool upgradesstables only on the problematic CF. It threw
>> the following exception:
>> >
>> > Error occurred while upgrading the sstables for keyspace Sessions
>> > java.util.concurrent.ExecutionException:
>> org.apache.cassandra.io.sstable.CorruptSSTableException:
>> java.io.IOException: dataSize of 3622081913630118729 starting at 32906
>> would be larger than file
>> /mnt/cassandra/data/Sessions/Users/Sessions-Users-ib-2516-Data.db length
>> 1038893416
>> >         at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>> >         at java.util.concurrent.FutureTask.get(FutureTask.java:188)
>> >         at
>> org.apache.cassandra.db.compaction.CompactionManager.performAllSSTableOperation(CompactionManager.java:271)
>> >         at
>> org.apache.cassandra.db.compaction.CompactionManager.performSSTableRewrite(CompactionManager.java:287)
>> >         at
>> org.apache.cassandra.db.ColumnFamilyStore.sstablesRewrite(ColumnFamilyStore.java:977)
>> >         at
>> org.apache.cassandra.service.StorageService.upgradeSSTables(StorageService.java:2191)
>> > ... ...
>> > Caused by: org.apache.cassandra.io.sstable.CorruptSSTableException:
>> java.io.IOException: dataSize of 3622081913630118729 starting at 32906
>> would be larger than file
>> /mnt/cassandra/data/Sessions/Users/Sessions-Users-ib-2516-Data.db length
>> 1038893416
>> >         at
>> org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:167)
>> >         at
>> org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:83)
>> >         at
>> org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:69)
>> >         at
>> org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator.next(SSTableScanner.java:180)
>> >         at
>> org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator.next(SSTableScanner.java:155)
>> >         at
>> org.apache.cassandra.io.sstable.SSTableScanner.next(SSTableScanner.java:142)
>> >         at
>> org.apache.cassandra.io.sstable.SSTableScanner.next(SSTableScanner.java:38)
>> >         at
>> org.apache.cassandra.utils.MergeIterator$OneToOne.computeNext(MergeIterator.java:202)
>> >         at
>> com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
>> >         at
>> com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
>> >         at
>> org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:134)
>> >         at
>> org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
>> >         at
>> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
>> >         at
>> org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:58)
>> >         at
>> org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:60)
>> >         at
>> org.apache.cassandra.db.compaction.CompactionManager$4.perform(CompactionManager.java:301)
>> >         at
>> org.apache.cassandra.db.compaction.CompactionManager$2.call(CompactionManager.java:250)
>> >         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>> >         ... 3 more
>> > Caused by: java.io.IOException: dataSize of 3622081913630118729
>> starting at 32906 would be larger than file
>> /mnt/cassandra/data/Sessions/Users/Sessions-Users-ib-2516-Data.db length
>> 1038893416
>> >         at
>> org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:123)
>> >         ... 20 more
>> >
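
The exception message in the trace above boils down to a bounds check: the size a row declares for itself must fit between its starting offset and the end of the file. A sketch of that check using the numbers from the trace; the function name row_fits is hypothetical and this is not Cassandra's actual code.

```python
def row_fits(data_size, start, file_length):
    """True if a row of data_size bytes starting at offset start fits within the file."""
    return data_size >= 0 and start + data_size <= file_length

# Values taken from the exception message in this thread.
print(row_fits(3622081913630118729, 32906, 1038893416))  # False
```

A declared size that is many orders of magnitude larger than the file itself suggests the bytes at that offset are not a real row header.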
>> >
>> > Regards,
>> > Francisco
>> >
>> >
>> > On Jan 29, 2014, at 3:38 PM, Rahul Menon <ra...@apigee.com> wrote:
>> >
>> > > Francisco,
>> > >
>> > > the sstables with *-ib-* are from a previous
>> version of c*. The *-ib-* naming convention started at c* 1.2.1, but from
>> 1.2.10 onwards I'm sure it uses the *-ic-* convention. You could try running
>> nodetool upgradesstables, which should ideally upgrade the sstables from
>> *-ib-* to *-ic-*.
>> > >
>> > > Rahul
>> > >
>> > > On Wed, Jan 29, 2014 at 12:55 AM, Francisco Nogueira Calmon Sobral <
>> fsob...@igcorp.com.br> wrote:
>> > > Dear experts,
>> > >
>> > > We are facing an annoying problem in our cluster.
>> > >
>> > > We have 9 amazon extra large linux nodes, running Cassandra 1.2.11.
>> > >
>> > > The short story is that after moving the data from one cluster to
>> another, we've been unable to run 'nodetool repair'. It gets stuck due to a
>> CorruptSSTableException on some nodes and CFs. After looking at some
>> problematic CFs, we observed that some of their sstables were owned by root
>> instead of cassandra. Also, their names are different from the
>> 'good' ones, as we can see below:
>> > >
>> > > BAD
>> > > ------
>> > > -rw-r--r-- 8 cassandra cassandra 991M Nov  8 15:11
>> Sessions-Users-ib-2516-Data.db
>> > > -rw-r--r-- 8 cassandra cassandra 703M Nov  8 15:11
>> Sessions-Users-ib-2516-Index.db
>> > > -rw-r--r-- 8 cassandra cassandra 5.3M Nov 13 11:42
>> Sessions-Users-ib-2516-Summary.db
>> > >
>> > > GOOD
>> > > ---------
>> > > -rw-r--r-- 1 cassandra cassandra  22K Jan 15 10:50
>> Sessions-Users-ic-2933-CompressionInfo.db
>> > > -rw-r--r-- 1 cassandra cassandra 106M Jan 15 10:50
>> Sessions-Users-ic-2933-Data.db
>> > > -rw-r--r-- 1 cassandra cassandra 2.2M Jan 15 10:50
>> Sessions-Users-ic-2933-Filter.db
>> > > -rw-r--r-- 1 cassandra cassandra  76M Jan 15 10:50
>> Sessions-Users-ic-2933-Index.db
>> > > -rw-r--r-- 1 cassandra cassandra 4.3K Jan 15 10:50
>> Sessions-Users-ic-2933-Statistics.db
>> > > -rw-r--r-- 1 cassandra cassandra 574K Jan 15 10:50
>> Sessions-Users-ic-2933-Summary.db
>> > > -rw-r--r-- 1 cassandra cassandra   79 Jan 15 10:50
>> Sessions-Users-ic-2933-TOC.txt
>> > >
>> > >
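
The ownership problem described above (sstables owned by root rather than cassandra) can be checked for in bulk. A minimal sketch, assuming a Unix host; the helper name files_not_owned_by is hypothetical.

```python
from pathlib import Path

def files_not_owned_by(data_dir, expected_uid):
    """Return files under data_dir whose owner uid differs from expected_uid."""
    return sorted(
        p for p in Path(data_dir).rglob("*")
        if p.is_file() and p.stat().st_uid != expected_uid
    )
```

The expected uid can be looked up with pwd.getpwnam("cassandra").pw_uid from the standard library, assuming the service account is named cassandra as in the listings above.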
>> > > We changed the permissions back to 'cassandra' and ran 'nodetool
>> scrub' in this problematic CF, but it has been running for at least two
>> weeks (it is not frozen) and keeps logging many WARNs while working with
>> the above mentioned SSTable:
>> > >
>> > > WARN [CompactionExecutor:15] 2014-01-28 17:01:22,571
>> OutputHandler.java (line 57) Non-fatal error reading row (stacktrace
>> follows)
>> > > java.io.IOError: java.io.IOException: Impossible row size
>> 3618452438597849419
>> > >         at
>> org.apache.cassandra.db.compaction.Scrubber.scrub(Scrubber.java:171)
>> > >         at
>> org.apache.cassandra.db.compaction.CompactionManager.scrubOne(CompactionManager.java:526)
>> > >         at
>> org.apache.cassandra.db.compaction.CompactionManager.doScrub(CompactionManager.java:515)
>> > >         at
>> org.apache.cassandra.db.compaction.CompactionManager.access$400(CompactionManager.java:70)
>> > >         at
>> org.apache.cassandra.db.compaction.CompactionManager$3.perform(CompactionManager.java:280)
>> > >         at
>> org.apache.cassandra.db.compaction.CompactionManager$2.call(CompactionManager.java:250)
>> > >         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>> > >         at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> > >         at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> > >         at java.lang.Thread.run(Thread.java:744)
>> > > Caused by: java.io.IOException: Impossible row size
>> 3618452438597849419
>> > >         ... 10 more
>> > >
>> > >
>> > > 1) I do not think that deleting all data of one node and running
>> 'nodetool rebuild' will work, since we observed that this problem occurs in
>> all nodes. So we may not be able to restore all the data. What can be done
>> in this case?
>> > >
>> > > 2) Why are some sstables owned by 'root'? Is this problem
>> caused by our manual migration of data? (see long story below)
>> > >
>> > >
>> > > How did we run into this?
>> > >
>> > > The long story is that we've tried to move our cluster with
>> sstableloader, but it was unable to load all the data correctly. Our
>> solution was to put ALL cluster data into EACH new node and run 'nodetool
>> refresh'. I performed this task for each node and each column family
>> sequentially. Sometimes I had to rename some sstables, because they came
>> from different nodes with the same name. I don't remember if I ran
>> 'nodetool repair'  or even 'nodetool cleanup' in each node. Apparently, the
>> process was successful, and (almost) all the data was moved.
>> > >
>> > > Unfortunately, 3 months after we moved, I am unable to perform
>> read operations on some keys of some CFs. I think that some of these keys
>> belong to the above mentioned sstables.
>> > >
>> > > Any insights are welcome.
>> > >
>> > > Best regards,
>> > > Francisco Sobral
>> > >
>> >
>> >
>>
>>
>
>
