[
https://issues.apache.org/jira/browse/CASSANDRA-2752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13048655#comment-13048655
]
Terje Marthinussen commented on CASSANDRA-2752:
-----------------------------------------------
SSTableWriter.java:

    ColumnFamily.serializer().deserializeFromSSTableNoColumns(ColumnFamily.create(cfs.metadata), dfile);
    // don't move that statement around, it expects the dfile to be before the columns
    updateCache(key, dataSize, null);
    rowSizes.add(dataSize);
    columnCounts.add(dfile.readInt());

I believe the problem is in updateCache.
If the row cache is enabled (and it is in this case) and the row needs to be
updated in the cache, updateCache will read (deserialize) the row.
However, after all the columns are read, the offset in the file is not reset
back to the location where the column count is stored, so the dfile.readInt()
that follows reads from the wrong position and things go bad.
I haven't actually changed the code to test this, but I tried disabling the
row cache, and so far repair seems to work fine with it disabled.
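
To illustrate the suspected failure mode, here is a minimal, self-contained sketch. It is not the Cassandra code and not a proposed patch: the toy row layout (an int column count followed by that many ints) and the names readRowForCache, columnCountAfterCacheUpdate and writeToyRow are all hypothetical. It only shows how a read that consumes the row can leave the file pointer past the column count, and how saving and restoring the offset avoids that.

```java
import java.io.EOFException;
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

public class OffsetResetSketch {

    // Hypothetical stand-in for updateCache(): deserializes the full row,
    // which moves the file pointer past the column count and all columns.
    static void readRowForCache(RandomAccessFile dfile) throws IOException {
        int columns = dfile.readInt();
        for (int i = 0; i < columns; i++) {
            dfile.readInt(); // each "column" is a single int in this toy layout
        }
    }

    // Reads the column count after the cache read. When restoreOffset is true,
    // the pointer saved before the cache read is restored before readInt().
    static int columnCountAfterCacheUpdate(RandomAccessFile dfile, boolean restoreOffset)
            throws IOException {
        long columnCountPosition = dfile.getFilePointer();
        readRowForCache(dfile);
        if (restoreOffset) {
            dfile.seek(columnCountPosition);
        }
        return dfile.readInt();
    }

    // Writes one toy row to a temp file: an int column count, then the columns.
    static File writeToyRow(int columns) throws IOException {
        File f = File.createTempFile("toy-row", ".db");
        f.deleteOnExit();
        try (RandomAccessFile out = new RandomAccessFile(f, "rw")) {
            out.writeInt(columns);
            for (int i = 0; i < columns; i++) {
                out.writeInt(i);
            }
        }
        return f;
    }

    public static void main(String[] args) throws IOException {
        File f = writeToyRow(3);

        // With the offset restored, the column count is read correctly.
        try (RandomAccessFile dfile = new RandomAccessFile(f, "r")) {
            System.out.println("with seek: " + columnCountAfterCacheUpdate(dfile, true));
        }

        // Without it, readInt() starts at the end of the row; in this
        // single-row file that is the end of the file, so it throws.
        try (RandomAccessFile dfile = new RandomAccessFile(f, "r")) {
            columnCountAfterCacheUpdate(dfile, false);
        } catch (EOFException e) {
            System.out.println("without seek: EOFException");
        }
    }
}
```

In a file with more than one row, the unrestored read would not necessarily hit the end of the file right away; it would first pick up bytes belonging to the next row, which may be why the error only shows up as an EOFException further along.
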
> repair fails with java.io.EOFException
> --------------------------------------
>
> Key: CASSANDRA-2752
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2752
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Affects Versions: 0.8.0
> Reporter: Terje Marthinussen
>
> Issuing repair on node 1 (1.10.42.81) in a cluster quickly fails with
> INFO [AntiEntropyStage:1] 2011-06-09 19:02:47,999 AntiEntropyService.java
> (line 234) Queueing comparison #<Differencer #<TreeRequest
> manual-repair-0c17c5f9-583f-4a31-a6d4-a9e7306fb46e, /1.10.42.82,
> (JP,XXX), (Token(bytes[6e]),Token(bytes[313039])]>>
> INFO [AntiEntropyStage:1] 2011-06-09 19:02:48,026 AntiEntropyService.java
> (line 468) Endpoints somewhere/1.10.42.81 and /1.10.42.82 have 2 range(s) out
> of sync for (JP,XXX) on (Token(bytes[6e]),Token(bytes[313039])]
> INFO [AntiEntropyStage:1] 2011-06-09 19:02:48,026 AntiEntropyService.java
> (line 485) Performing streaming repair of 2 ranges for #<TreeRequest
> manual-repair-0c17c5f9-583f-4a31-a6d4-a9e7306fb46e, /1.10.42.82,
> (JP,XXX), (Token(bytes[6e]),Token(bytes[313039])]>
> INFO [AntiEntropyStage:1] 2011-06-09 19:02:48,030 StreamOut.java (line 173)
> Stream context metadata [/data/cassandra/node0/data/JP/XXX-g-3-Data.db
> sections=1 progress=0/36592 - 0%], 1 sstables.
> INFO [AntiEntropyStage:1] 2011-06-09 19:02:48,031 StreamOutSession.java
> (line 174) Streaming to /1.10.42.82
> ERROR [CompactionExecutor:9] 2011-06-09 19:02:48,970
> AbstractCassandraDaemon.java (line 113) Fatal exception in thread
> Thread[CompactionExecutor:9,1,main]
> java.io.EOFException
> at java.io.RandomAccessFile.readInt(RandomAccessFile.java:725)
> at
> org.apache.cassandra.io.sstable.SSTableWriter$RowIndexer.doIndexing(SSTableWriter.java:457)
> at
> org.apache.cassandra.io.sstable.SSTableWriter$RowIndexer.index(SSTableWriter.java:364)
> at
> org.apache.cassandra.io.sstable.SSTableWriter$Builder.build(SSTableWriter.java:315)
> at
> org.apache.cassandra.db.CompactionManager$9.call(CompactionManager.java:1099)
> at
> org.apache.cassandra.db.CompactionManager$9.call(CompactionManager.java:1090)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> On .82 (1.10.42.82):
> ERROR [CompactionExecutor:12] 2011-06-09 19:02:48,051
> AbstractCassandraDaemon.java (line 113) Fatal exception in thread
> Thread[CompactionExecutor:12,1,main]
> java.io.EOFException
> at java.io.RandomAccessFile.readInt(RandomAccessFile.java:725)
> at
> org.apache.cassandra.io.sstable.SSTableWriter$RowIndexer.doIndexing(SSTableWriter.java:457)
> at
> org.apache.cassandra.io.sstable.SSTableWriter$RowIndexer.index(SSTableWriter.java:364)
> at
> org.apache.cassandra.io.sstable.SSTableWriter$Builder.build(SSTableWriter.java:315)
> at
> org.apache.cassandra.db.CompactionManager$9.call(CompactionManager.java:1099)
> at
> org.apache.cassandra.db.CompactionManager$9.call(CompactionManager.java:1090)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> ERROR [Thread-132] 2011-06-09 19:02:48,051 AbstractCassandraDaemon.java (line
> 113) Fatal exception in thread Thread[Thread-132,5,main]
> java.lang.RuntimeException: java.util.concurrent.ExecutionException:
> java.io.EOFException
> at
> org.apache.cassandra.streaming.StreamInSession.closeIfFinished(StreamInSession.java:152)
> at
> org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:63)
> at
> org.apache.cassandra.net.IncomingTcpConnection.stream(IncomingTcpConnection.java:155)
> at
> org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:93)
> Caused by: java.util.concurrent.ExecutionException: java.io.EOFException
> at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
> at java.util.concurrent.FutureTask.get(FutureTask.java:83)
> at
> org.apache.cassandra.streaming.StreamInSession.closeIfFinished(StreamInSession.java:136)
> ... 3 more
> Caused by: java.io.EOFException
> at java.io.RandomAccessFile.readInt(RandomAccessFile.java:725)
> at
> org.apache.cassandra.io.sstable.SSTableWriter$RowIndexer.doIndexing(SSTableWriter.java:457)
> at
> org.apache.cassandra.io.sstable.SSTableWriter$RowIndexer.index(SSTableWriter.java:364)
> at
> org.apache.cassandra.io.sstable.SSTableWriter$Builder.build(SSTableWriter.java:315)
> at
> org.apache.cassandra.db.CompactionManager$9.call(CompactionManager.java:1099)
> at
> org.apache.cassandra.db.CompactionManager$9.call(CompactionManager.java:1090)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> Looks to me like the receiving side fails first.