Some more details from logs per IRC discussions (I've edited down to just
entries for the one CF, several are corrupt).
Older references to troublesome CF
INFO [main] 2009-09-15 18:32:14,784 RecoveryManager.java (line 57) Replaying
/var/lib/cassandra/commitlog/CommitLog-1252520035020.log,
/var/lib/cassandra/commitlog/CommitLog-1252521410192.log,
/var/lib/cassandra/commitlog/CommitLog-1252522551113.log
INFO [main] 2009-09-15 18:32:21,684 Memtable.java (line 186) Flushing
Memtable(at)@218311925
INFO [main] 2009-09-15 18:32:22,585 Memtable.java (line 220) Completed
flushing Memtable(at)@218311925
INFO [main] 2009-09-15 18:32:30,467 Memtable.java (line 186) Flushing
Memtable(at)@218311925
INFO [main] 2009-09-15 18:32:31,007 Memtable.java (line 220) Completed
flushing Memtable(at)@218311925
INFO [main] 2009-09-15 18:32:42,815 Memtable.java (line 186) Flushing
Memtable(at)@218311925
INFO [main] 2009-09-15 18:32:43,613 Memtable.java (line 220) Completed
flushing Memtable(at)@218311925
INFO [main] 2009-09-15 18:36:24,866 RecoveryManager.java (line 57) Replaying
/var/lib/cassandra/commitlog/CommitLog-1253039564240.log
INFO [main] 2009-09-15 18:36:24,926 Memtable.java (line 186) Flushing
Memtable(at)@2118052890
INFO [MINOR-COMPACTION-POOL:1] 2009-09-15 18:36:25,111 ColumnFamilyStore.java
(line 1013) Compacting
[/var/lib/cassandra/data/xt/at-1-Data.db,/var/lib/cassandra/data/xt/at-2-Data.db,/var/lib/cassandra/data/xt/at-3-Data.db,/var/lib/cassandra/data/xt/at-4-Data.db]
INFO [main] 2009-09-15 18:36:25,291 Memtable.java (line 220) Completed
flushing Memtable(at)@2118052890
INFO [main] 2009-09-15 18:36:24,866 RecoveryManager.java (line 57) Replaying
/var/lib/cassandra/commitlog/CommitLog-1253039564240.log
INFO [main] 2009-09-15 18:36:24,926 Memtable.java (line 186) Flushing
Memtable(at)@2118052890
INFO [MINOR-COMPACTION-POOL:1] 2009-09-15 18:36:25,111 ColumnFamilyStore.java
(line 1013) Compacting
[/var/lib/cassandra/data/xt/at-1-Data.db,/var/lib/cassandra/data/xt/at-2-Data.db,/var/lib/cassandra/data/xt/at-3-Data.db,/var/lib/cassandra/data/xt/at-4-Data.db]
INFO [main] 2009-09-15 18:36:25,291 Memtable.java (line 220) Completed
flushing Memtable(at)@2118052890
INFO [main] 2009-09-15 18:36:25,526 SystemTable.java (line 118) Saved Token
found: 21467257700115401363439435209518481790
INFO [main] 2009-09-29 22:59:37,886 Memtable.java (line 186) Flushing
Memtable(at)@725519388
INFO [main] 2009-09-29 22:59:38,454 Memtable.java (line 220) Completed
flushing Memtable(at)@725519388
INFO [main] 2009-09-29 22:59:40,651 Memtable.java (line 186) Flushing
Memtable(at)@725519388
INFO [main] 2009-09-29 22:59:40,950 Memtable.java (line 220) Completed
flushing Memtable(at)@725519388
INFO [main] 2009-09-29 22:59:53,669 Memtable.java (line 186) Flushing
Memtable(at)@725519388
INFO [main] 2009-09-29 22:59:53,969 Memtable.java (line 220) Completed
flushing Memtable(at)@725519388
Restart that caused the problems here
INFO [main] 2009-09-29 22:59:55,641 CassandraDaemon.java (line 142) Cassandra
starting up...
INFO [MINOR-COMPACTION-POOL:1] 2009-09-29 22:59:59,678 ColumnFamilyStore.java
(line 1013) Compacting
[/var/lib/cassandra/data/xt/at-5-Data.db,/var/lib/cassandra/data/xt/at-6-Data.db,/var/lib/cassandra/data/xt/at-7-Data.db,/var/lib/cassandra/data/xt/at-8-Data.db]
INFO [MINOR-COMPACTION-POOL:1] 2009-09-29 23:00:00,840 ColumnFamilyStore.java
(line 1155) Compacted to /var/lib/cassandra/data/xt/at-9-Data.db. 0/476199
bytes for 3010/2221 keys read/written. Time: 1162ms.
ERROR [pool-1-thread-124] 2009-09-29 23:00:49,888 Cassandra.java (line 713)
Internal error processing multiget_slice java.lang.RuntimeException:
java.io.EOFException
at
org.apache.cassandra.service.CassandraServer.readColumnFamily(CassandraServer.java:103)
at
org.apache.cassandra.service.CassandraServer.getSlice(CassandraServer.java:180)
at
org.apache.cassandra.service.CassandraServer.multigetSliceInternal(CassandraServer.java:249)
at
org.apache.cassandra.service.CassandraServer.multiget_slice(CassandraServer.java:226)
at
org.apache.cassandra.service.Cassandra$Processor$multiget_slice.process(Cassandra.java:709)
at org.apache.cassandra.service.Cassandra$Processor.process(Cassandra.ja
va:609)
at
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
Caused by: java.io.EOFException
at java.io.RandomAccessFile.readInt(RandomAccessFile.java:725)
at
org.apache.cassandra.io.IndexHelper.deserializeIndex(IndexHelper.java:95)
at
org.apache.cassandra.db.filter.SSTableSliceIterator$ColumnGroupReader.<init>(SSTableSliceIterator.java:118)
at
org.apache.cassandra.db.filter.SSTableSliceIterator.<init>(SSTableSliceIterator.java:56)
at
org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(SliceQueryFilter.java:64)
at
org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1390)
at
org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1325)
at org.apache.cassandra.db.Table.getRow(Table.java:590)
at
org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:59)
at org.apache.cassandra.service.StorageProxy.weakReadLocal(StorageProxy.
java:518)
at org.apache.cassandra.service.StorageProxy.readProtocol(StorageProxy.j
ava:310)
at
org.apache.cassandra.service.CassandraServer.readColumnFamily(CassandraServer.java:99)
... 9 more
INFO [main] 2009-09-29 23:29:26,476 Memtable.java (line 186) Flushing
Memtable(at)@688788755
INFO [main] 2009-09-29 23:29:26,800 Memtable.java (line 220) Completed
flushing Memtable(at)@688788755
INFO [main] 2009-09-30 17:43:00,338 Memtable.java (line 186) Flushing
Memtable(at)@1903646637
INFO [main] 2009-09-30 17:43:03,066 Memtable.java (line 220) Completed
flushing Memtable(at)@1903646637
Turned on DEBUG logging
DEBUG [main] 2009-09-30 17:50:16,408 Table.java (line 101) adding at as 13
DEBUG [main] 2009-09-30 17:50:17,588 SSTableReader.java (line 100) INDEX LOAD
TIME for /var/lib/cassandra/data/xt/at-9-Data.db: 0 ms.
DEBUG [main] 2009-09-30 17:50:17,588 SSTableReader.java (line 100) INDEX LOAD
TIME for /var/lib/cassandra/data/xt/at-10-Data.db: 0 ms.
DEBUG [main] 2009-09-30 17:50:17,588 SSTableReader.java (line 100) INDEX LOAD
TIME for /var/lib/cassandra/data/xt/at-11-Data.db: 0 ms.
ERROR [MINOR-COMPACTION-POOL:1] 2009-09-30 17:50:17,588
DebuggableThreadPoolExecutor.java (line 103) Error in executor futuretask
java.util.concurrent.ExecutionException: java.io.EOFException
at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
at java.util.concurrent.FutureTask.get(FutureTask.java:83)
at
org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor.logFutureExceptions(DebuggableThreadPoolExecutor.java:95)
at
org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor.afterExecute(DebuggableScheduledThreadPoolExecutor.java:50)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:888)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
Caused by: java.io.EOFException
at java.io.RandomAccessFile.readInt(RandomAccessFile.java:725)
at org.apache.cassandra.io.IndexHelper.skipIndex(IndexHelper.java:77)
at
org.apache.cassandra.io.IndexHelper.skipBloomFilterAndIndex(IndexHelper.java:46)
at org.apache.cassandra.io.IteratingRow.<init>(IteratingRow.java:47)
at org.apache.cassandra.io.FileStruct.advance(FileStruct.java:124)
at
org.apache.cassandra.db.ColumnFamilyStore.initializePriorityQueue(ColumnFamilyStore.java:623)
at
org.apache.cassandra.db.ColumnFamilyStore.doFileCompaction(ColumnFamilyStore.java:1030)
at
org.apache.cassandra.db.ColumnFamilyStore.doCompaction(ColumnFamilyStore.java:689)
at
org.apache.cassandra.db.MinorCompactionManager$1.call(MinorCompactionManager.java:165)
at
org.apache.cassandra.db.MinorCompactionManager$1.call(MinorCompactionManager.java:162)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:207)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
... 2 more
DEBUG [main] 2009-09-30 17:50:17,648 CommitLog.java (line 301) Reading
mutation at 5151
DEBUG [main] 2009-09-30 17:50:17,648 CommitLog.java (line 322) replaying
mutation for xt.992: {ColumnFamily(at [score,])}
DEBUG [main] 2009-09-30 17:50:17,648 CommitLog.java (line 301) Reading
mutation at 5291
DEBUG [main] 2009-09-30 17:50:17,648 CommitLog.java (line 322) replaying
mutation for xt.992: {ColumnFamily(at [score,])}
DEBUG [main] 2009-09-30 17:50:17,648 CommitLog.java (line 301) Reading
mutation at 5291
DEBUG [main] 2009-09-30 17:50:17,648 CommitLog.java (line 322) replaying
mutation for xt.992: {ColumnFamily(at [score,])}
DEBUG [main] 2009-09-30 17:50:17,648 CommitLog.java (line 301) Reading
mutation at 5291
INFO [main] 2009-09-30 17:50:18,188 Memtable.java (line 186) Flushing
Memtable(at)@1881395339
DEBUG [main] 2009-09-30 17:50:18,388 ColumnFamilyStore.java (line 609)
Submitting at for compaction
INFO [MINOR-COMPACTION-POOL:1] 2009-09-30 17:50:18,388 ColumnFamilyStore.java
(line 1013) Compacting
[/var/lib/cassandra/data/xt/at-9-Data.db,/var/lib/cassandra/data/xt/at-10-Data.db,/var/lib/cassandra/data/xt/at-11-Data.db,/var/lib/cassandra/data/xt/at-12-Data.db]
DEBUG [MINOR-COMPACTION-POOL:1] 2009-09-30 17:50:18,388 SSTableReader.java
(line 58) index size for bloom filter calc for file :
/var/lib/cassandra/data/xt/at-9-Data.db : 2432
DEBUG [MINOR-COMPACTION-POOL:1] 2009-09-30 17:50:18,388 SSTableReader.java
(line 58) index size for bloom filter calc for file :
/var/lib/cassandra/data/xt/at-10-Data.db : 2688
DEBUG [MINOR-COMPACTION-POOL:1] 2009-09-30 17:50:18,388 SSTableReader.java
(line 58) index size for bloom filter calc for file :
/var/lib/cassandra/data/xt/at-11-Data.db : 5120
DEBUG [MINOR-COMPACTION-POOL:1] 2009-09-30 17:50:18,388 SSTableReader.java
(line 58) index size for bloom filter calc for file :
/var/lib/cassandra/data/xt/at-12-Data.db : 5376
DEBUG [MINOR-COMPACTION-POOL:1] 2009-09-30 17:50:18,388 ColumnFamilyStore.java
(line 1048) Expected bloom filter size : 5376
INFO [main] 2009-09-30 17:50:18,447 Memtable.java (line 220) Completed
flushing Memtable(at)@1881395339
DEBUG [pool-1-thread-1] 2009-09-30 17:50:30,037 StorageProxy.java (line 468)
weakreadlocal reading SliceFromReadCommand(table='xt', key='992',
column_parent='QueryPath(columnFamilyName='at', superColumnName='null',
columnName='null')', start='', finish='', reversed=false, count=100)
ERROR [pool-1-thread-1] 2009-09-30 17:50:30,037 Cassandra.java (line 679)
Internal error processing get_slice
java.lang.RuntimeException: java.io.EOFException
at
org.apache.cassandra.service.CassandraServer.readColumnFamily(CassandraServer.java:104)
at
org.apache.cassandra.service.CassandraServer.getSlice(CassandraServer.java:182)
at
org.apache.cassandra.service.CassandraServer.multigetSliceInternal(CassandraServer.java:251)
at
org.apache.cassandra.service.CassandraServer.get_slice(CassandraServer.java:220)
at
org.apache.cassandra.service.Cassandra$Processor$get_slice.process(Cassandra.java:671)
at
org.apache.cassandra.service.Cassandra$Processor.process(Cassandra.java:627)
at
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
Caused by: java.io.EOFException
at java.io.RandomAccessFile.readInt(RandomAccessFile.java:725)
at
org.apache.cassandra.io.IndexHelper.deserializeIndex(IndexHelper.java:95)
at
org.apache.cassandra.db.filter.SSTableSliceIterator$ColumnGroupReader.<init>(SSTableSliceIterator.java:118)
at
org.apache.cassandra.db.filter.SSTableSliceIterator.<init>(SSTableSliceIterator.java:56)
at
org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(SliceQueryFilter.java:64)
at
org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1390)
at
org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1325)
at org.apache.cassandra.db.Table.getRow(Table.java:590)
at
org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:59)
at
org.apache.cassandra.service.StorageProxy.weakReadLocal(StorageProxy.java:471)
at
org.apache.cassandra.service.StorageProxy.readProtocol(StorageProxy.java:309)
at
org.apache.cassandra.service.CassandraServer.readColumnFamily(CassandraServer.java:100)
... 9 more
Hope that helps narrow down the issue.
-Anthony
On Wed, Sep 30, 2009 at 11:46:47AM -0700, Anthony Molinaro wrote:
> More data, I decided to force a flush on the table I know was corrupt and
> got this error
>
> ERROR [MINOR-COMPACTION-POOL:1] 2009-09-30 18:42:01,010
> DebuggableThreadPoolExec
> utor.java (line 103) Error in executor
> futuretaskjava.util.concurrent.ExecutionException: java.io.EOFException
> at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
> at java.util.concurrent.FutureTask.get(FutureTask.java:83)
> at
> org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor.logFutur
> eExceptions(DebuggableThreadPoolExecutor.java:95)
> at
> org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor
> .afterExecute(DebuggableScheduledThreadPoolExecutor.java:50)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:888)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor
> .java:908)
> at java.lang.Thread.run(Thread.java:619)Caused by:
> java.io.EOFException
> at java.io.RandomAccessFile.readInt(RandomAccessFile.java:725)
> at org.apache.cassandra.io.IndexHelper.skipIndex(IndexHelper.java:77)
> at
> org.apache.cassandra.io.IndexHelper.skipBloomFilterAndIndex(IndexHelper.java:46)
> at org.apache.cassandra.io.IteratingRow.<init>(IteratingRow.java:47)
> at org.apache.cassandra.io.FileStruct.advance(FileStruct.java:124)
> at
> org.apache.cassandra.db.ColumnFamilyStore.doFileCompaction(ColumnFami
> lyStore.java:1108)
> at
> org.apache.cassandra.db.ColumnFamilyStore.doCompaction(ColumnFamilyStore.java:689)
> at
> org.apache.cassandra.db.MinorCompactionManager$1.call(MinorCompactionManager.java:165)
> at
> org.apache.cassandra.db.MinorCompactionManager$1.call(MinorCompactionManager.java:162)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:207)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> ... 2 more
>
> -Anthony
>
> On Wed, Sep 30, 2009 at 11:24:15AM -0700, Anthony Molinaro wrote:
> > Hi,
> >
> > I'm not getting any responses on IRC, so figured I'd put this out on
> > the mailing list.
> >
> > I had a 3 node cassandra cluster, replication factor 3 on
> > 3 ec2 m1.large instances behind an haproxy. I restarted one
> > of the node to test out some modified sysctl's (tcp stack tuning).
> > As soon as I restarted it the other 2 nodes started spiking memory
> > use and the first node seemed to have corrupted data. The corruption
> > is an exception when I read some and only some keys.
> >
> > The exception is
> >
> > ERROR [pool-1-thread-1] 2009-09-30 17:50:30,037 Cassandra.java (line 679)
> > Internal error processing get_slice
> > java.lang.RuntimeException: java.io.EOFException
> > at
> > org.apache.cassandra.service.CassandraServer.readColumnFamily(CassandraServer.java:104)
> > at
> > org.apache.cassandra.service.CassandraServer.getSlice(CassandraServer.java:182)
> > at
> > org.apache.cassandra.service.CassandraServer.multigetSliceInternal(CassandraServer.java:251)
> > at
> > org.apache.cassandra.service.CassandraServer.get_slice(CassandraServer.java:220)
> > at
> > org.apache.cassandra.service.Cassandra$Processor$get_slice.process(Cassandra.java:671)
> > at
> > org.apache.cassandra.service.Cassandra$Processor.process(Cassandra.java:627)
> >
> > at
> > org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
> > at
> > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> > at
> > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> > at java.lang.Thread.run(Thread.java:619)
> > Caused by: java.io.EOFException
> > at java.io.RandomAccessFile.readInt(RandomAccessFile.java:725)
> > at
> > org.apache.cassandra.io.IndexHelper.deserializeIndex(IndexHelper.java:95)
> > at
> > org.apache.cassandra.db.filter.SSTableSliceIterator$ColumnGroupReader.<init>(SSTableSliceIterator.java:118)
> > at
> > org.apache.cassandra.db.filter.SSTableSliceIterator.<init>(SSTableSliceIterator.java:56)
> > at
> > org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(SliceQueryFilter.java:64)
> > at
> > org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1390)
> > at
> > org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1325)
> > at org.apache.cassandra.db.Table.getRow(Table.java:590)
> > at
> > org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:59)
> > at
> > org.apache.cassandra.service.StorageProxy.weakReadLocal(StorageProxy.java:471)
> > at
> > org.apache.cassandra.service.StorageProxy.readProtocol(StorageProxy.java:309)
> > at
> > org.apache.cassandra.service.CassandraServer.readColumnFamily(CassandraServer.java:100)
> > ... 9 more
> >
> >
> > I ended up having to fire up some new instances, and reload the data
> > (luckily this is my small instance which I can reload quickly, I've got a
> > large cassandra cluster currently loading which I will not be
> > able to do this with, so I'm a little scared about that cluster).
> >
> > Anyway, any ideas? I've left the broken cluster so I can
> > investigate/patch/etc.
> >
> > -Anthony
> >
> > --
> > ------------------------------------------------------------------------
> > Anthony Molinaro <[email protected]>
>
> --
> ------------------------------------------------------------------------
> Anthony Molinaro <[email protected]>
--
------------------------------------------------------------------------
Anthony Molinaro <[email protected]>