[ https://issues.apache.org/jira/browse/CASSANDRA-9295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527812#comment-14527812 ]

Yuki Morishita commented on CASSANDRA-9295:
-------------------------------------------

I think there are two problems here: file corruption, and the SSTable ref 
release process in error handling.
I'm not sure about the cause of the former. Does it only happen on uncompressed 
SSTables?

For the latter, when an error occurs, the streaming session still holds refs 
to SSTables and tries to use them even after they are released.
StreamSession releases its SSTable refs, but the outgoing message thread 
already has messages queued to be flushed out that hold the same refs.
I will post a patch for it.
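
To illustrate the ordering problem described above, here is a minimal, 
self-contained sketch (hypothetical class and method names, not Cassandra's 
actual streaming code): if a queued outgoing message shares the session's 
ref instead of taking its own, the session's error-path release can drop the 
count to zero, and by the time the message is sent the underlying file may 
already be gone, as in the FileNotFoundException in the log below.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.atomic.AtomicInteger;

public class RefRaceSketch {
    // Simplified ref counter: the file is "deleted" once the count hits zero.
    static class SSTableRef {
        private final AtomicInteger count = new AtomicInteger(1);

        boolean tryRef() {
            while (true) {
                int c = count.get();
                if (c <= 0) return false;                 // already released
                if (count.compareAndSet(c, c + 1)) return true;
            }
        }

        void release() { count.decrementAndGet(); }

        boolean fileDeleted() { return count.get() <= 0; }
    }

    public static void main(String[] args) {
        // Buggy ordering: the queued message shares the session's ref.
        SSTableRef sessionRef = new SSTableRef();
        Queue<SSTableRef> outgoing = new ArrayDeque<>();
        outgoing.add(sessionRef);   // message queued, no extra ref taken
        sessionRef.release();       // error path: session drops its only ref
        SSTableRef msgRef = outgoing.poll();
        // The message now points at a "deleted" file.
        System.out.println("buggy: file deleted before send = "
                + msgRef.fileDeleted());

        // Fixed ordering: each queued message takes its own ref up front.
        SSTableRef sessionRef2 = new SSTableRef();
        Queue<SSTableRef> outgoing2 = new ArrayDeque<>();
        if (sessionRef2.tryRef()) outgoing2.add(sessionRef2);
        sessionRef2.release();      // session drops its ref on error
        SSTableRef msgRef2 = outgoing2.poll();
        // The message's own ref keeps the file alive until it is sent.
        System.out.println("fixed: file deleted before send = "
                + msgRef2.fileDeleted());
        msgRef2.release();          // message releases after sending
    }
}
```

The fix direction implied by the comment matches the second half of the 
sketch: take a per-message ref at enqueue time and release it only after the 
message is sent (or the queue is drained on error), rather than relying on 
the session's ref.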

> Streaming not holding on to ref's long enough.
> ----------------------------------------------
>
>                 Key: CASSANDRA-9295
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9295
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Jeremiah Jordan
>             Fix For: 2.1.x
>
>         Attachments: 9295.debug.txt
>
>
> While doing some testing around adding/removing nodes under load with 
> cassandra-2.1 head as of a few days ago (after 2.1.5 was tagged), I am seeing 
> stream-out errors with file-not-found exceptions.  The file in question had 
> just finished being compacted into a new file a few lines earlier in the log.  
> It seems that streaming isn't holding onto Refs correctly for the stuff in 
> the stream plans.
> I also see a "corrupt sstable" exception for the file the "missing" file was 
> compacted to. Trimmed logs with just the compaction/streaming-related entries 
> follow; you can see the stream plan is initiated between the compaction 
> starting and the compaction finishing.
> {noformat}
> INFO  [MemtableFlushWriter:3] 2015-05-04 16:08:21,239  Memtable.java:380 - 
> Completed flushing 
> /mnt/cass_data_disks/data1/keyspace1/standard1-49f17b30f27711e4a438775021e2cd7f/keyspace1-standard1-ka-4-Data.db
>  (60666088 bytes) for commitlog position 
> ReplayPosition(segmentId=1430755416941, position=32294797)
> INFO  [CompactionExecutor:4] 2015-05-04 16:08:40,856  CompactionTask.java:140 
> - Compacting 
> [SSTableReader(path='/mnt/cass_data_disks/data1/keyspace1/standard1-49f17b30f27711e4a438775021e2cd7f/keyspace1-standard1-ka-4-Data.db'),
>  
> SSTableReader(path='/mnt/cass_data_disks/data1/keyspace1/standard1-49f17b30f27711e4a438775021e2cd7f/keyspace1-standard1-ka-3-Data.db')]
> INFO  [STREAM-INIT-/10.240.213.56:53190] 2015-05-04 16:09:31,047  
> StreamResultFuture.java:109 - [Stream #f261c040-f277-11e4-a070-d126f0416bc9 
> ID#0] Creating new streaming plan for Rebuild
> INFO  [STREAM-INIT-/10.240.213.56:53190] 2015-05-04 16:09:31,238  
> StreamResultFuture.java:116 - [Stream #f261c040-f277-11e4-a070-d126f0416bc9, 
> ID#0] Received streaming plan for Rebuild
> INFO  [STREAM-INIT-/10.240.213.56:53192] 2015-05-04 16:09:31,249  
> StreamResultFuture.java:116 - [Stream #f261c040-f277-11e4-a070-d126f0416bc9, 
> ID#0] Received streaming plan for Rebuild
> INFO  [STREAM-IN-/10.240.213.56] 2015-05-04 16:09:31,353  
> ColumnFamilyStore.java:882 - Enqueuing flush of standard1: 91768068 (19%) 
> on-heap, 0 (0%) off-heap
> INFO  [STREAM-IN-/10.240.213.56] 2015-05-04 16:09:37,425  
> ColumnFamilyStore.java:882 - Enqueuing flush of solr: 10012689 (2%) on-heap, 
> 0 (0%) off-heap
> INFO  [STREAM-IN-/10.240.213.56] 2015-05-04 16:09:38,073  
> StreamResultFuture.java:166 - [Stream #f261c040-f277-11e4-a070-d126f0416bc9 
> ID#0] Prepare completed. Receiving 0 files(0 bytes), sending 6 
> files(284288285 bytes)
> INFO  [CompactionExecutor:4] 2015-05-04 16:10:11,047  CompactionTask.java:270 
> - Compacted 2 sstables to 
> [/mnt/cass_data_disks/data1/keyspace1/standard1-49f17b30f27711e4a438775021e2cd7f/keyspace1-standard1-ka-5,/mnt/cass_data_disks/data1/keyspace1/standard1-49f17b30f27711e4a438775021e2cd7f/keyspace1-standard1-ka-8,].
>   182,162,816 bytes to 182,162,816 (~100% of original) in 90,188ms = 
> 1.926243MB/s.  339,856 total partitions merged to 339,856.  Partition merge 
> counts were {1:339856, }
> ERROR [STREAM-OUT-/10.240.213.56] 2015-05-04 16:10:25,169  
> StreamSession.java:477 - [Stream #f261c040-f277-11e4-a070-d126f0416bc9] 
> Streaming error occurred
> java.io.IOException: Corrupted SSTable : 
> /mnt/cass_data_disks/data1/keyspace1/standard1-49f17b30f27711e4a438775021e2cd7f/keyspace1-standard1-ka-5-Data.db
>       at 
> org.apache.cassandra.io.util.DataIntegrityMetadata$ChecksumValidator.validate(DataIntegrityMetadata.java:79)
>  ~[cassandra-all-2.1.5.426.jar:2.1.5.426]
>       at 
> org.apache.cassandra.streaming.StreamWriter.write(StreamWriter.java:149) 
> ~[cassandra-all-2.1.5.426.jar:2.1.5.426]
>       at 
> org.apache.cassandra.streaming.StreamWriter.write(StreamWriter.java:102) 
> ~[cassandra-all-2.1.5.426.jar:2.1.5.426]
>       at 
> org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:58)
>  ~[cassandra-all-2.1.5.426.jar:2.1.5.426]
>       at 
> org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:42)
>  ~[cassandra-all-2.1.5.426.jar:2.1.5.426]
>       at 
> org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:45)
>  ~[cassandra-all-2.1.5.426.jar:2.1.5.426]
>       at 
> org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:346)
>  [cassandra-all-2.1.5.426.jar:2.1.5.426]
>       at 
> org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:318)
>  [cassandra-all-2.1.5.426.jar:2.1.5.426]
>       at java.lang.Thread.run(Thread.java:745) [na:1.8.0_40]
> INFO  [STREAM-OUT-/10.240.213.56] 2015-05-04 16:10:25,232  
> StreamResultFuture.java:180 - [Stream #f261c040-f277-11e4-a070-d126f0416bc9] 
> Session with /10.240.213.56 is complete
> WARN  [STREAM-OUT-/10.240.213.56] 2015-05-04 16:10:25,269  
> StreamResultFuture.java:207 - [Stream #f261c040-f277-11e4-a070-d126f0416bc9] 
> Stream failed
> ERROR [STREAM-OUT-/10.240.213.56] 2015-05-04 16:10:25,307  
> StreamSession.java:477 - [Stream #f261c040-f277-11e4-a070-d126f0416bc9] 
> Streaming error occurred
> java.lang.RuntimeException: java.io.FileNotFoundException: 
> /mnt/cass_data_disks/data1/keyspace1/standard1-49f17b30f27711e4a438775021e2cd7f/keyspace1-standard1-ka-4-Data.db
>  (No such file or directory)
>       at 
> org.apache.cassandra.io.util.RandomAccessReader.open(RandomAccessReader.java:124)
>  ~[cassandra-all-2.1.5.426.jar:2.1.5.426]
>       at 
> org.apache.cassandra.io.util.RandomAccessReader.open(RandomAccessReader.java:107)
>  ~[cassandra-all-2.1.5.426.jar:2.1.5.426]
>       at 
> org.apache.cassandra.io.util.SegmentedFile.createReader(SegmentedFile.java:99)
>  ~[cassandra-all-2.1.5.426.jar:2.1.5.426]
>       at 
> org.apache.cassandra.io.sstable.SSTableReader.openDataReader(SSTableReader.java:1955)
>  ~[cassandra-all-2.1.5.426.jar:2.1.5.426]
>       at 
> org.apache.cassandra.streaming.StreamWriter.write(StreamWriter.java:74) 
> ~[cassandra-all-2.1.5.426.jar:2.1.5.426]
>       at 
> org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:58)
>  ~[cassandra-all-2.1.5.426.jar:2.1.5.426]
>       at 
> org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:42)
>  ~[cassandra-all-2.1.5.426.jar:2.1.5.426]
>       at 
> org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:45)
>  ~[cassandra-all-2.1.5.426.jar:2.1.5.426]
>       at 
> org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:346)
>  ~[cassandra-all-2.1.5.426.jar:2.1.5.426]
>       at 
> org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:326)
>  ~[cassandra-all-2.1.5.426.jar:2.1.5.426]
>       at java.lang.Thread.run(Thread.java:745) [na:1.8.0_40]
> Caused by: java.io.FileNotFoundException: 
> /mnt/cass_data_disks/data1/keyspace1/standard1-49f17b30f27711e4a438775021e2cd7f/keyspace1-standard1-ka-4-Data.db
>  (No such file or directory)
>       at java.io.RandomAccessFile.open0(Native Method) ~[na:1.8.0_40]
>       at java.io.RandomAccessFile.open(RandomAccessFile.java:316) 
> ~[na:1.8.0_40]
>       at java.io.RandomAccessFile.<init>(RandomAccessFile.java:243) 
> ~[na:1.8.0_40]
>       at 
> org.apache.cassandra.io.util.RandomAccessReader.<init>(RandomAccessReader.java:64)
>  ~[cassandra-all-2.1.5.426.jar:2.1.5.426]
>       at 
> org.apache.cassandra.io.util.RandomAccessReader.open(RandomAccessReader.java:120)
>  ~[cassandra-all-2.1.5.426.jar:2.1.5.426]
>       ... 10 common frames omitted
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
