You might want to look at this JIRA I filed today: CASSANDRA-6696 <https://issues.apache.org/jira/browse/CASSANDRA-6696>.
You are good as long as you are fine with deleted data reappearing.

On Wed, Feb 12, 2014 at 9:20 AM, Francisco Nogueira Calmon Sobral <fsob...@igcorp.com.br> wrote:

> Hi, Rahul.
>
> I've removed the corrupted sstables and 'nodetool repair' ran successfully
> for the column family. I'm not sure whether or not we've lost data.
>
> Best regards,
> Francisco Sobral
>
>
> On Jan 30, 2014, at 3:58 PM, Rahul Menon <ra...@apigee.com> wrote:
>
> Yes, you should delete all files related to <cfname>-ib-<num>-<extension>.db
>
> Run a repair after deletion.
>
>
> On Thu, Jan 30, 2014 at 10:17 PM, Francisco Nogueira Calmon Sobral <fsob...@igcorp.com.br> wrote:
>
>> Ok. I'll try this idea with one sstable. But should I delete all the
>> files associated with it? I mean, there is a difference in the number of
>> files between the BAD sstable and a GOOD one, as I've already shown:
>>
>> BAD
>> ------
>> -rw-r--r-- 8 cassandra cassandra 991M Nov 8 15:11 Sessions-Users-ib-2516-Data.db
>> -rw-r--r-- 8 cassandra cassandra 703M Nov 8 15:11 Sessions-Users-ib-2516-Index.db
>> -rw-r--r-- 8 cassandra cassandra 5.3M Nov 13 11:42 Sessions-Users-ib-2516-Summary.db
>>
>> GOOD
>> ---------
>> -rw-r--r-- 1 cassandra cassandra 22K Jan 15 10:50 Sessions-Users-ic-2933-CompressionInfo.db
>> -rw-r--r-- 1 cassandra cassandra 106M Jan 15 10:50 Sessions-Users-ic-2933-Data.db
>> -rw-r--r-- 1 cassandra cassandra 2.2M Jan 15 10:50 Sessions-Users-ic-2933-Filter.db
>> -rw-r--r-- 1 cassandra cassandra 76M Jan 15 10:50 Sessions-Users-ic-2933-Index.db
>> -rw-r--r-- 1 cassandra cassandra 4.3K Jan 15 10:50 Sessions-Users-ic-2933-Statistics.db
>> -rw-r--r-- 1 cassandra cassandra 574K Jan 15 10:50 Sessions-Users-ic-2933-Summary.db
>> -rw-r--r-- 1 cassandra cassandra 79 Jan 15 10:50 Sessions-Users-ic-2933-TOC.txt
>>
>> Should I delete those 3 files? Should I run nodetool refresh after the
>> operation?
>>
>> Best regards,
>> Francisco.
>>
>> On Jan 30, 2014, at 2:02 PM, Rahul Menon <ra...@apigee.com> wrote:
>>
>> > Looks like the sstables are corrupt. I don't believe there is a method
>> > to recover those sstables. I would delete them and run a repair to ensure
>> > data consistency.
>> >
>> > Rahul
>> >
>> >
>> > On Wed, Jan 29, 2014 at 11:29 PM, Francisco Nogueira Calmon Sobral <fsob...@igcorp.com.br> wrote:
>> > Hi, Rahul.
>> >
>> > I've run nodetool upgradesstables only on the problematic CF. It threw
>> > the following exception:
>> >
>> > Error occurred while upgrading the sstables for keyspace Sessions
>> > java.util.concurrent.ExecutionException: org.apache.cassandra.io.sstable.CorruptSSTableException: java.io.IOException: dataSize of 3622081913630118729 starting at 32906 would be larger than file /mnt/cassandra/data/Sessions/Users/Sessions-Users-ib-2516-Data.db length 1038893416
>> >     at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>> >     at java.util.concurrent.FutureTask.get(FutureTask.java:188)
>> >     at org.apache.cassandra.db.compaction.CompactionManager.performAllSSTableOperation(CompactionManager.java:271)
>> >     at org.apache.cassandra.db.compaction.CompactionManager.performSSTableRewrite(CompactionManager.java:287)
>> >     at org.apache.cassandra.db.ColumnFamilyStore.sstablesRewrite(ColumnFamilyStore.java:977)
>> >     at org.apache.cassandra.service.StorageService.upgradeSSTables(StorageService.java:2191)
>> >     ... ...
>> > Caused by: org.apache.cassandra.io.sstable.CorruptSSTableException: java.io.IOException: dataSize of 3622081913630118729 starting at 32906 would be larger than file /mnt/cassandra/data/Sessions/Users/Sessions-Users-ib-2516-Data.db length 1038893416
>> >     at org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:167)
>> >     at org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:83)
>> >     at org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:69)
>> >     at org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator.next(SSTableScanner.java:180)
>> >     at org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator.next(SSTableScanner.java:155)
>> >     at org.apache.cassandra.io.sstable.SSTableScanner.next(SSTableScanner.java:142)
>> >     at org.apache.cassandra.io.sstable.SSTableScanner.next(SSTableScanner.java:38)
>> >     at org.apache.cassandra.utils.MergeIterator$OneToOne.computeNext(MergeIterator.java:202)
>> >     at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
>> >     at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
>> >     at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:134)
>> >     at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
>> >     at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
>> >     at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:58)
>> >     at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:60)
>> >     at org.apache.cassandra.db.compaction.CompactionManager$4.perform(CompactionManager.java:301)
>> >     at org.apache.cassandra.db.compaction.CompactionManager$2.call(CompactionManager.java:250)
>> >     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>> >     ... 3 more
>> > Caused by: java.io.IOException: dataSize of 3622081913630118729 starting at 32906 would be larger than file /mnt/cassandra/data/Sessions/Users/Sessions-Users-ib-2516-Data.db length 1038893416
>> >     at org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:123)
>> >     ... 20 more
>> >
>> >
>> > Regards,
>> > Francisco
>> >
>> >
>> > On Jan 29, 2014, at 3:38 PM, Rahul Menon <ra...@apigee.com> wrote:
>> >
>> > > Francisco,
>> > >
>> > > The sstables named *-ib-* come from a previous version of C*. The
>> > > *-ib-* naming convention started at C* 1.2.1, but from 1.2.10 onwards
>> > > I'm sure it uses the *-ic-* convention. You could try running
>> > > nodetool upgradesstables, which should ideally upgrade the *-ib-*
>> > > sstables to *-ic-*.
>> > >
>> > > Rahul
>> > >
>> > > On Wed, Jan 29, 2014 at 12:55 AM, Francisco Nogueira Calmon Sobral <fsob...@igcorp.com.br> wrote:
>> > > Dear experts,
>> > >
>> > > We are facing an annoying problem in our cluster.
>> > >
>> > > We have 9 Amazon extra large Linux nodes, running Cassandra 1.2.11.
>> > >
>> > > The short story is that after moving the data from one cluster to
>> > > another, we've been unable to run 'nodetool repair'. It gets stuck due to a
>> > > CorruptSSTableException in some nodes and CFs. After looking at some
>> > > problematic CFs, we observed that some of them have root permissions,
>> > > instead of cassandra permissions.
>> > > Also, their names are different from the
>> > > 'good' ones, as we can see below:
>> > >
>> > > BAD
>> > > ------
>> > > -rw-r--r-- 8 cassandra cassandra 991M Nov 8 15:11 Sessions-Users-ib-2516-Data.db
>> > > -rw-r--r-- 8 cassandra cassandra 703M Nov 8 15:11 Sessions-Users-ib-2516-Index.db
>> > > -rw-r--r-- 8 cassandra cassandra 5.3M Nov 13 11:42 Sessions-Users-ib-2516-Summary.db
>> > >
>> > > GOOD
>> > > ---------
>> > > -rw-r--r-- 1 cassandra cassandra 22K Jan 15 10:50 Sessions-Users-ic-2933-CompressionInfo.db
>> > > -rw-r--r-- 1 cassandra cassandra 106M Jan 15 10:50 Sessions-Users-ic-2933-Data.db
>> > > -rw-r--r-- 1 cassandra cassandra 2.2M Jan 15 10:50 Sessions-Users-ic-2933-Filter.db
>> > > -rw-r--r-- 1 cassandra cassandra 76M Jan 15 10:50 Sessions-Users-ic-2933-Index.db
>> > > -rw-r--r-- 1 cassandra cassandra 4.3K Jan 15 10:50 Sessions-Users-ic-2933-Statistics.db
>> > > -rw-r--r-- 1 cassandra cassandra 574K Jan 15 10:50 Sessions-Users-ic-2933-Summary.db
>> > > -rw-r--r-- 1 cassandra cassandra 79 Jan 15 10:50 Sessions-Users-ic-2933-TOC.txt
>> > >
>> > >
>> > > We changed the permissions back to 'cassandra' and ran 'nodetool
>> > > scrub' on this problematic CF, but it has been running for at least two
>> > > weeks (it is not frozen) and keeps logging many WARNs while working with
>> > > the above-mentioned SSTable:
>> > >
>> > > WARN [CompactionExecutor:15] 2014-01-28 17:01:22,571 OutputHandler.java (line 57) Non-fatal error reading row (stacktrace follows)
>> > > java.io.IOError: java.io.IOException: Impossible row size 3618452438597849419
>> > >     at org.apache.cassandra.db.compaction.Scrubber.scrub(Scrubber.java:171)
>> > >     at org.apache.cassandra.db.compaction.CompactionManager.scrubOne(CompactionManager.java:526)
>> > >     at org.apache.cassandra.db.compaction.CompactionManager.doScrub(CompactionManager.java:515)
>> > >     at org.apache.cassandra.db.compaction.CompactionManager.access$400(CompactionManager.java:70)
>> > >     at org.apache.cassandra.db.compaction.CompactionManager$3.perform(CompactionManager.java:280)
>> > >     at org.apache.cassandra.db.compaction.CompactionManager$2.call(CompactionManager.java:250)
>> > >     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>> > >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> > >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> > >     at java.lang.Thread.run(Thread.java:744)
>> > > Caused by: java.io.IOException: Impossible row size 3618452438597849419
>> > >     ... 10 more
>> > >
>> > >
>> > > 1) I do not think that deleting all data of one node and running
>> > > 'nodetool rebuild' will work, since we observed that this problem occurs on
>> > > all nodes. So we may not be able to restore all the data. What can be done
>> > > in this case?
>> > >
>> > > 2) Why are the permissions of some sstables 'root'? Is this problem
>> > > caused by our manual migration of data? (see long story below)
>> > >
>> > >
>> > > How did we run into this?
>> > >
>> > > The long story is that we tried to move our cluster with
>> > > sstableloader, but it was unable to load all the data correctly. Our
>> > > solution was to put ALL cluster data into EACH new node and run 'nodetool
>> > > refresh'. I performed this task for each node and each column family
>> > > sequentially. Sometimes I had to rename some sstables, because they came
>> > > from different nodes with the same name. I don't remember if I ran
>> > > 'nodetool repair' or even 'nodetool cleanup' on each node. Apparently, the
>> > > process was successful, and (almost) all the data was moved.
>> > >
>> > > Unfortunately, 3 months after we moved, I am unable to perform
>> > > read operations on some keys of some CFs. I think that some of these keys
>> > > belong to the above-mentioned sstables.
>> > >
>> > > Any insights are welcome.
>> > >
>> > > Best regards,
>> > > Francisco Sobral
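
For anyone following the "delete all files of the bad sstable, then repair" advice in the thread, here is a minimal sketch of how the component files could be collected from a directory listing before anything is removed. It is not part of Cassandra or nodetool. The function names and the regular expression are my own, and the pattern assumes the 1.2-era naming seen in the listings above (<keyspace>-<cf>-<version>-<generation>-<component>.db or .txt); other versions name their files differently.

```python
import re
from collections import defaultdict

# Assumed pattern for 1.2-era component names, e.g. Sessions-Users-ib-2516-Data.db:
# <keyspace>-<cf>-<version>-<generation>-<component>.<db|txt>
SSTABLE_NAME = re.compile(
    r"^(?P<keyspace>[^-]+)-(?P<cf>[^-]+)-(?P<version>[a-z]{2})-"
    r"(?P<generation>\d+)-(?P<component>[A-Za-z]+)\.(?:db|txt)$"
)


def group_by_sstable(filenames):
    """Group component file names by (keyspace, cf, version, generation)."""
    groups = defaultdict(list)
    for name in filenames:
        match = SSTABLE_NAME.match(name)
        if match is None:
            # Not a component name this sketch recognises; leave it alone.
            continue
        key = (
            match.group("keyspace"),
            match.group("cf"),
            match.group("version"),
            int(match.group("generation")),
        )
        groups[key].append(name)
    return {key: sorted(names) for key, names in groups.items()}


def components_with_version(filenames, version):
    """Return every component file whose sstable carries the given version tag."""
    selected = []
    for key, names in sorted(group_by_sstable(filenames).items()):
        if key[2] == version:
            selected.extend(names)
    return selected


if __name__ == "__main__":
    # File names taken from the BAD/GOOD listings in the thread.
    listing = [
        "Sessions-Users-ib-2516-Data.db",
        "Sessions-Users-ib-2516-Index.db",
        "Sessions-Users-ib-2516-Summary.db",
        "Sessions-Users-ic-2933-Data.db",
        "Sessions-Users-ic-2933-TOC.txt",
    ]
    for name in components_with_version(listing, "ib"):
        print(name)
```

The sketch only reports names and deliberately deletes nothing. Passing it the result of os.listdir() on the column family directory would list the *-ib-* candidates, which can then be checked against the CorruptSSTableException messages before removing them and running 'nodetool repair'.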