Some more details from logs per IRC discussions (I've edited down to just entries for the one CF, several are corrupt).
Older references to troublesome CF INFO [main] 2009-09-15 18:32:14,784 RecoveryManager.java (line 57) Replaying /var/lib/cassandra/commitlog/CommitLog-1252520035020.log, /var/lib/cassandra/commitlog/CommitLog-1252521410192.log, /var/lib/cassandra/commitlog/CommitLog-1252522551113.log INFO [main] 2009-09-15 18:32:21,684 Memtable.java (line 186) Flushing Memtable(at)@218311925 INFO [main] 2009-09-15 18:32:22,585 Memtable.java (line 220) Completed flushing Memtable(at)@218311925 INFO [main] 2009-09-15 18:32:30,467 Memtable.java (line 186) Flushing Memtable(at)@218311925 INFO [main] 2009-09-15 18:32:31,007 Memtable.java (line 220) Completed flushing Memtable(at)@218311925 INFO [main] 2009-09-15 18:32:42,815 Memtable.java (line 186) Flushing Memtable(at)@218311925 INFO [main] 2009-09-15 18:32:43,613 Memtable.java (line 220) Completed flushing Memtable(at)@218311925 INFO [main] 2009-09-15 18:36:24,866 RecoveryManager.java (line 57) Replaying /var/lib/cassandra/commitlog/CommitLog-1253039564240.log INFO [main] 2009-09-15 18:36:24,926 Memtable.java (line 186) Flushing Memtable(at)@2118052890 INFO [MINOR-COMPACTION-POOL:1] 2009-09-15 18:36:25,111 ColumnFamilyStore.java (line 1013) Compacting [/var/lib/cassandra/data/xt/at-1-Data.db,/var/lib/cassandra/data/xt/at-2-Data.db,/var/lib/cassandra/data/xt/at-3-Data.db,/var/lib/cassandra/data/xt/at-4-Data.db] INFO [main] 2009-09-15 18:36:25,291 Memtable.java (line 220) Completed flushing Memtable(at)@2118052890 INFO [main] 2009-09-15 18:36:24,866 RecoveryManager.java (line 57) Replaying /var/lib/cassandra/commitlog/CommitLog-1253039564240.log INFO [main] 2009-09-15 18:36:24,926 Memtable.java (line 186) Flushing Memtable(at)@2118052890 INFO [MINOR-COMPACTION-POOL:1] 2009-09-15 18:36:25,111 ColumnFamilyStore.java (line 1013) Compacting [/var/lib/cassandra/data/xt/at-1-Data.db,/var/lib/cassandra/data/xt/at-2-Data.db,/var/lib/cassandra/data/xt/at-3-Data.db,/var/lib/cassandra/data/xt/at-4-Data.db] INFO [main] 2009-09-15 18:36:25,291 Memtable.java (line 220) Completed flushing Memtable(at)@2118052890 INFO [main] 2009-09-15 18:36:25,526 SystemTable.java (line 118) Saved Token found: 21467257700115401363439435209518481790 INFO [main] 2009-09-29 22:59:37,886 Memtable.java (line 186) Flushing Memtable(at)@725519388 INFO [main] 2009-09-29 22:59:38,454 Memtable.java (line 220) Completed flushing Memtable(at)@725519388 INFO [main] 2009-09-29 22:59:40,651 Memtable.java (line 186) Flushing Memtable(at)@725519388 INFO [main] 2009-09-29 22:59:40,950 Memtable.java (line 220) Completed flushing Memtable(at)@725519388 INFO [main] 2009-09-29 22:59:53,669 Memtable.java (line 186) Flushing Memtable(at)@725519388 INFO [main] 2009-09-29 22:59:53,969 Memtable.java (line 220) Completed flushing Memtable(at)@725519388 Restart that caused the problems here INFO [main] 2009-09-29 22:59:55,641 CassandraDaemon.java (line 142) Cassandra starting up... INFO [MINOR-COMPACTION-POOL:1] 2009-09-29 22:59:59,678 ColumnFamilyStore.java (line 1013) Compacting [/var/lib/cassandra/data/xt/at-5-Data.db,/var/lib/cassandra/data/xt/at-6-Data.db,/var/lib/cassandra/data/xt/at-7-Data.db,/var/lib/cassandra/data/xt/at-8-Data.db] INFO [MINOR-COMPACTION-POOL:1] 2009-09-29 23:00:00,840 ColumnFamilyStore.java (line 1155) Compacted to /var/lib/cassandra/data/xt/at-9-Data.db. 0/476199 bytes for 3010/2221 keys read/written. Time: 1162ms. ERROR [pool-1-thread-124] 2009-09-29 23:00:49,888 Cassandra.java (line 713) Internal error processing multiget_slice java.lang.RuntimeException: java.io.EOFException at org.apache.cassandra.service.CassandraServer.readColumnFamily(CassandraServer.java:103) at org.apache.cassandra.service.CassandraServer.getSlice(CassandraServer.java:180) at org.apache.cassandra.service.CassandraServer.multigetSliceInternal(CassandraServer.java:249) at org.apache.cassandra.service.CassandraServer.multiget_slice(CassandraServer.java:226) at org.apache.cassandra.service.Cassandra$Processor$multiget_slice.process(Cassandra.java:709) at org.apache.cassandra.service.Cassandra$Processor.process(Cassandra.ja va:609) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619) Caused by: java.io.EOFException at java.io.RandomAccessFile.readInt(RandomAccessFile.java:725) at org.apache.cassandra.io.IndexHelper.deserializeIndex(IndexHelper.java:95) at org.apache.cassandra.db.filter.SSTableSliceIterator$ColumnGroupReader.<init>(SSTableSliceIterator.java:118) at org.apache.cassandra.db.filter.SSTableSliceIterator.<init>(SSTableSliceIterator.java:56) at org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(SliceQueryFilter.java:64) at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1390) at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1325) at org.apache.cassandra.db.Table.getRow(Table.java:590) at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:59) at org.apache.cassandra.service.StorageProxy.weakReadLocal(StorageProxy. java:518) at org.apache.cassandra.service.StorageProxy.readProtocol(StorageProxy.j ava:310) at org.apache.cassandra.service.CassandraServer.readColumnFamily(CassandraServer.java:99) ... 9 more INFO [main] 2009-09-29 23:29:26,476 Memtable.java (line 186) Flushing Memtable(at)@688788755 INFO [main] 2009-09-29 23:29:26,800 Memtable.java (line 220) Completed flushing Memtable(at)@688788755 INFO [main] 2009-09-30 17:43:00,338 Memtable.java (line 186) Flushing Memtable(at)@1903646637 INFO [main] 2009-09-30 17:43:03,066 Memtable.java (line 220) Completed flushing Memtable(at)@1903646637 Turned on DEBUG logging DEBUG [main] 2009-09-30 17:50:16,408 Table.java (line 101) adding at as 13 DEBUG [main] 2009-09-30 17:50:17,588 SSTableReader.java (line 100) INDEX LOAD TIME for /var/lib/cassandra/data/xt/at-9-Data.db: 0 ms. DEBUG [main] 2009-09-30 17:50:17,588 SSTableReader.java (line 100) INDEX LOAD TIME for /var/lib/cassandra/data/xt/at-10-Data.db: 0 ms. DEBUG [main] 2009-09-30 17:50:17,588 SSTableReader.java (line 100) INDEX LOAD TIME for /var/lib/cassandra/data/xt/at-11-Data.db: 0 ms. ERROR [MINOR-COMPACTION-POOL:1] 2009-09-30 17:50:17,588 DebuggableThreadPoolExecutor.java (line 103) Error in executor futuretask java.util.concurrent.ExecutionException: java.io.EOFException at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222) at java.util.concurrent.FutureTask.get(FutureTask.java:83) at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor.logFutureExceptions(DebuggableThreadPoolExecutor.java:95) at org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor.afterExecute(DebuggableScheduledThreadPoolExecutor.java:50) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:888) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619) Caused by: java.io.EOFException at java.io.RandomAccessFile.readInt(RandomAccessFile.java:725) at org.apache.cassandra.io.IndexHelper.skipIndex(IndexHelper.java:77) at org.apache.cassandra.io.IndexHelper.skipBloomFilterAndIndex(IndexHelper.java:46) at org.apache.cassandra.io.IteratingRow.<init>(IteratingRow.java:47) at org.apache.cassandra.io.FileStruct.advance(FileStruct.java:124) at org.apache.cassandra.db.ColumnFamilyStore.initializePriorityQueue(ColumnFamilyStore.java:623) at org.apache.cassandra.db.ColumnFamilyStore.doFileCompaction(ColumnFamilyStore.java:1030) at org.apache.cassandra.db.ColumnFamilyStore.doCompaction(ColumnFamilyStore.java:689) at org.apache.cassandra.db.MinorCompactionManager$1.call(MinorCompactionManager.java:165) at org.apache.cassandra.db.MinorCompactionManager$1.call(MinorCompactionManager.java:162) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:207) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) ... 2 more DEBUG [main] 2009-09-30 17:50:17,648 CommitLog.java (line 301) Reading mutation at 5151 DEBUG [main] 2009-09-30 17:50:17,648 CommitLog.java (line 322) replaying mutation for xt.992: {ColumnFamily(at [score,])} DEBUG [main] 2009-09-30 17:50:17,648 CommitLog.java (line 301) Reading mutation at 5291 DEBUG [main] 2009-09-30 17:50:17,648 CommitLog.java (line 322) replaying mutation for xt.992: {ColumnFamily(at [score,])} DEBUG [main] 2009-09-30 17:50:17,648 CommitLog.java (line 301) Reading mutation at 5291 DEBUG [main] 2009-09-30 17:50:17,648 CommitLog.java (line 322) replaying mutation for xt.992: {ColumnFamily(at [score,])} DEBUG [main] 2009-09-30 17:50:17,648 CommitLog.java (line 301) Reading mutation at 5291 INFO [main] 2009-09-30 17:50:18,188 Memtable.java (line 186) Flushing Memtable(at)@1881395339 DEBUG [main] 2009-09-30 17:50:18,388 ColumnFamilyStore.java (line 609) Submitting at for compaction INFO [MINOR-COMPACTION-POOL:1] 2009-09-30 17:50:18,388 ColumnFamilyStore.java (line 1013) Compacting [/var/lib/cassandra/data/xt/at-9-Data.db,/var/lib/cassandra/data/xt/at-10-Data.db,/var/lib/cassandra/data/xt/at-11-Data.db,/var/lib/cassandra/data/xt/at-12-Data.db] DEBUG [MINOR-COMPACTION-POOL:1] 2009-09-30 17:50:18,388 SSTableReader.java (line 58) index size for bloom filter calc for file : /var/lib/cassandra/data/xt/at-9-Data.db : 2432 DEBUG [MINOR-COMPACTION-POOL:1] 2009-09-30 17:50:18,388 SSTableReader.java (line 58) index size for bloom filter calc for file : /var/lib/cassandra/data/xt/at-10-Data.db : 2688 DEBUG [MINOR-COMPACTION-POOL:1] 2009-09-30 17:50:18,388 SSTableReader.java (line 58) index size for bloom filter calc for file : /var/lib/cassandra/data/xt/at-11-Data.db : 5120 DEBUG [MINOR-COMPACTION-POOL:1] 2009-09-30 17:50:18,388 SSTableReader.java (line 58) index size for bloom filter calc for file : /var/lib/cassandra/data/xt/at-12-Data.db : 5376 DEBUG [MINOR-COMPACTION-POOL:1] 2009-09-30 17:50:18,388 ColumnFamilyStore.java (line 1048) Expected bloom filter size : 5376 INFO [main] 2009-09-30 17:50:18,447 Memtable.java (line 220) Completed flushing Memtable(at)@1881395339 DEBUG [pool-1-thread-1] 2009-09-30 17:50:30,037 StorageProxy.java (line 468) weakreadlocal reading SliceFromReadCommand(table='xt', key='992', column_parent='QueryPath(columnFamilyName='at', superColumnName='null', columnName='null')', start='', finish='', reversed=false, count=100) ERROR [pool-1-thread-1] 2009-09-30 17:50:30,037 Cassandra.java (line 679) Internal error processing get_slice java.lang.RuntimeException: java.io.EOFException at org.apache.cassandra.service.CassandraServer.readColumnFamily(CassandraServer.java:104) at org.apache.cassandra.service.CassandraServer.getSlice(CassandraServer.java:182) at org.apache.cassandra.service.CassandraServer.multigetSliceInternal(CassandraServer.java:251) at org.apache.cassandra.service.CassandraServer.get_slice(CassandraServer.java:220) at org.apache.cassandra.service.Cassandra$Processor$get_slice.process(Cassandra.java:671) at org.apache.cassandra.service.Cassandra$Processor.process(Cassandra.java:627) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619) Caused by: java.io.EOFException at java.io.RandomAccessFile.readInt(RandomAccessFile.java:725) at org.apache.cassandra.io.IndexHelper.deserializeIndex(IndexHelper.java:95) at org.apache.cassandra.db.filter.SSTableSliceIterator$ColumnGroupReader.<init>(SSTableSliceIterator.java:118) at org.apache.cassandra.db.filter.SSTableSliceIterator.<init>(SSTableSliceIterator.java:56) at org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(SliceQueryFilter.java:64) at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1390) at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1325) at org.apache.cassandra.db.Table.getRow(Table.java:590) at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:59) at org.apache.cassandra.service.StorageProxy.weakReadLocal(StorageProxy.java:471) at org.apache.cassandra.service.StorageProxy.readProtocol(StorageProxy.java:309) at org.apache.cassandra.service.CassandraServer.readColumnFamily(CassandraServer.java:100) ... 9 more Hope that helps narrow down the issue. -Anthony On Wed, Sep 30, 2009 at 11:46:47AM -0700, Anthony Molinaro wrote: > More data, I decided to force a flush on the table I know was corrupt and > got this error > > ERROR [MINOR-COMPACTION-POOL:1] 2009-09-30 18:42:01,010 > DebuggableThreadPoolExec > utor.java (line 103) Error in executor > futuretaskjava.util.concurrent.ExecutionException: java.io.EOFException > at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222) > at java.util.concurrent.FutureTask.get(FutureTask.java:83) > at > org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor.logFutur > eExceptions(DebuggableThreadPoolExecutor.java:95) > at > org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor > .afterExecute(DebuggableScheduledThreadPoolExecutor.java:50) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:888) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor > .java:908) > at java.lang.Thread.run(Thread.java:619)Caused by: > java.io.EOFException > at java.io.RandomAccessFile.readInt(RandomAccessFile.java:725) > at org.apache.cassandra.io.IndexHelper.skipIndex(IndexHelper.java:77) > at > org.apache.cassandra.io.IndexHelper.skipBloomFilterAndIndex(IndexHelper.java:46) > at org.apache.cassandra.io.IteratingRow.<init>(IteratingRow.java:47) > at org.apache.cassandra.io.FileStruct.advance(FileStruct.java:124) > at > org.apache.cassandra.db.ColumnFamilyStore.doFileCompaction(ColumnFami > lyStore.java:1108) > at > org.apache.cassandra.db.ColumnFamilyStore.doCompaction(ColumnFamilyStore.java:689) > at > org.apache.cassandra.db.MinorCompactionManager$1.call(MinorCompactionManager.java:165) > at > org.apache.cassandra.db.MinorCompactionManager$1.call(MinorCompactionManager.java:162) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:207) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > ... 2 more > > -Anthony > > On Wed, Sep 30, 2009 at 11:24:15AM -0700, Anthony Molinaro wrote: > > Hi, > > > > I'm not getting any responses on IRC, so figured I'd put this out on > > the mailing list. > > > > I had a 3 node cassandra cluster, replication factor 3 on > > 3 ec2 m1.large instances behind an haproxy. I restarted one > > of the node to test out some modified sysctl's (tcp stack tuning). > > As soon as I restarted it the other 2 nodes started spiking memory > > use and the first node seemed to have corrupted data. The corruption > > is an exception when I read some and only some keys. > > > > The exception is > > > > ERROR [pool-1-thread-1] 2009-09-30 17:50:30,037 Cassandra.java (line 679) > > Internal error processing get_slice > > java.lang.RuntimeException: java.io.EOFException > > at > > org.apache.cassandra.service.CassandraServer.readColumnFamily(CassandraServer.java:104) > > at > > org.apache.cassandra.service.CassandraServer.getSlice(CassandraServer.java:182) > > at > > org.apache.cassandra.service.CassandraServer.multigetSliceInternal(CassandraServer.java:251) > > at > > org.apache.cassandra.service.CassandraServer.get_slice(CassandraServer.java:220) > > at > > org.apache.cassandra.service.Cassandra$Processor$get_slice.process(Cassandra.java:671) > > at > > org.apache.cassandra.service.Cassandra$Processor.process(Cassandra.java:627) > > > > at > > org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253) > > at > > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > > at > > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > > at java.lang.Thread.run(Thread.java:619) > > Caused by: java.io.EOFException > > at java.io.RandomAccessFile.readInt(RandomAccessFile.java:725) > > at > > org.apache.cassandra.io.IndexHelper.deserializeIndex(IndexHelper.java:95) > > at > > org.apache.cassandra.db.filter.SSTableSliceIterator$ColumnGroupReader.<init>(SSTableSliceIterator.java:118) > > at > > org.apache.cassandra.db.filter.SSTableSliceIterator.<init>(SSTableSliceIterator.java:56) > > at > > org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(SliceQueryFilter.java:64) > > at > > org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1390) > > at > > org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1325) > > at org.apache.cassandra.db.Table.getRow(Table.java:590) > > at > > org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:59) > > at > > org.apache.cassandra.service.StorageProxy.weakReadLocal(StorageProxy.java:471) > > at > > org.apache.cassandra.service.StorageProxy.readProtocol(StorageProxy.java:309) > > at > > org.apache.cassandra.service.CassandraServer.readColumnFamily(CassandraServer.java:100) > > ... 9 more > > > > > > I ended up having to fire up some new instances, and reload the data > > (luckily this is my small instance which I can reload quickly, I've got a > > large cassandra cluster currently loading which I will not be > > able to do this with, so I'm a little scared about that cluster). > > > > Anyway, any ideas? I've left the broken cluster so I can > > investigate/patch/etc. > > > > -Anthony > > > > -- > > ------------------------------------------------------------------------ > > Anthony Molinaro <antho...@alumni.caltech.edu> > > -- > ------------------------------------------------------------------------ > Anthony Molinaro <antho...@alumni.caltech.edu> -- ------------------------------------------------------------------------ Anthony Molinaro <antho...@alumni.caltech.edu>