Oded Peer created CASSANDRA-6829:
------------------------------------

             Summary: nodes sporadically shutting down
                 Key: CASSANDRA-6829
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6829
             Project: Cassandra
          Issue Type: Bug
          Components: Core
         Environment: Windows Azure VMs.
The VMs' OS is SUSE Enterprise. I striped 2 logical volumes for each VM, one 
for data and one for commitlog, and formatted them as XFS.
Oracle Java 1.7_45
DataStax Enterprise 4.0 (Cassandra version 2.0.5.22)
            Reporter: Oded Peer


I deployed a DataStax Enterprise 4.0 Cassandra cluster in Windows Azure and 
started load tests. After a while, some of the nodes announce a shutdown and 
stop responding to client requests.
The error preceding the shutdown is:
"FSWriteError in /mnt/dsedata/lib/cassandra/poc/event_log/poc-event_log-tmp-jb-581-Data.db"
"Caused by: java.io.IOException: Input/output error"

The storage I'm using in my VMs is Azure Blob storage. The VMs' OS is SUSE 
Enterprise. I striped 2 logical volumes for each VM, one for data and one for 
commitlog, and formatted them as XFS.

I am using Oracle Java 1.7_45

Restarting the Cassandra process resolves the problem only for a few minutes, 
after which it occurs again.

I noticed that it happens only when writing tmp files of one specific table. 
Here are the errors from 3 random nodes:

(1) ERROR [CompactionExecutor:48] 2014-03-09 11:38:45,188 CassandraDaemon.java (line 192) Exception in thread Thread[CompactionExecutor:48,1,main]
FSWriteError in /mnt/dsedata/lib/cassandra/poc/event_log/poc-event_log-tmp-jb-409-Data.db

(2) ERROR [CompactionExecutor:37] 2014-03-10 10:04:30,828 CassandraDaemon.java (line 196) Exception in thread Thread[CompactionExecutor:37,1,main]
FSWriteError in /mnt/dsedata/lib/cassandra/poc/event_log/poc-event_log-tmp-jb-946-Data.db

(3) ERROR [CompactionExecutor:48] 2014-03-10 10:23:39,248 CassandraDaemon.java (line 196) Exception in thread Thread[CompactionExecutor:48,1,main]
FSWriteError in /mnt/dsedata/lib/cassandra/poc/event_log/poc-event_log-tmp-jb-874-Data.db
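
One way to check that the failures really are confined to tmp files of a single table is to tally the paths in the FSWriteError lines of each node's log. Below is a rough sketch in Python. It assumes the file naming visible in the paths above, <keyspace>-<table>-[tmp-]jb-<generation>-Data.db, and the function name tally_fswrite_errors is only illustrative, not part of Cassandra or any existing tool.

```python
import re
from collections import Counter

# Matches SSTable data-file paths shaped like the ones in the errors above:
#   .../<keyspace>/<table>/<keyspace>-<table>-[tmp-]jb-<generation>-Data.db
SSTABLE_RE = re.compile(
    r"/(?P<ks>[^/]+)/(?P<table>[^/]+)/"
    r"(?P=ks)-(?P=table)-(?P<tmp>tmp-)?jb-(?P<gen>\d+)-Data\.db"
)


def tally_fswrite_errors(log_text):
    """Count FSWriteError paths per (keyspace, table, is_tmp_file)."""
    # Long log lines may be wrapped, so collapse all whitespace first.
    flat = " ".join(log_text.split())
    counts = Counter()
    for match in re.finditer(r"FSWriteError in (\S+)", flat):
        hit = SSTABLE_RE.search(match.group(1))
        if hit:
            key = (hit.group("ks"), hit.group("table"), hit.group("tmp") is not None)
            counts[key] += 1
    return counts
```

Feeding it the contents of a node's system.log returns a Counter keyed by keyspace, table, and whether the file was a tmp file, so a single dominant key would support the observation above.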

The table is a wide-row table created as:
CREATE TABLE event_log (
  time_slice bigint,
  distribution_key int,
  event_id text,
  ... 300 columns ...
  PRIMARY KEY ((time_slice, distribution_key), event_id)
) WITH compaction={'class': 'SizeTieredCompactionStrategy'} AND
  compression={'sstable_compression': 'LZ4Compressor'};

CREATE INDEX EVENT_LOG_2IX ON event_log (event_id);

'time_slice' represents a 5-minute time period encoded as yyyyMMddHHmm, where 
'mm' is a multiple of 5 between 00 and 55.
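
To make the bucketing concrete, here is a small Python sketch of how such a value could be derived from an event timestamp. It assumes the minutes are rounded down to the previous multiple of 5; the report does not say whether the application rounds down or to the nearest slot, and the function name is only illustrative.

```python
from datetime import datetime


def time_slice(ts):
    """Return the yyyyMMddHHmm bucket for ts, minutes floored to a multiple of 5.

    Flooring is an assumption; the report only says 'mm' is 00, 05, ..., 55.
    """
    floored = ts.replace(minute=ts.minute - ts.minute % 5, second=0, microsecond=0)
    return int(floored.strftime("%Y%m%d%H%M"))
```

With this scheme, every event in the same 5-minute window and with the same distribution_key lands in the same partition, which is what makes the rows wide.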

The Data files under the 'data' directory grew very large shortly after the 
test started.
For example:
1.5G Mar 10 10:50 /mnt/dsedata/lib/cassandra/poc/event_log/poc-event_log-jb-968-Data.db
3.0G Mar 10 11:41 /mnt/dsedata/lib/cassandra/poc/event_log/poc-event_log-tmp-jb-970-Data.db

Full stack trace:

ERROR [CompactionExecutor:37] 2014-03-10 10:04:30,828 CassandraDaemon.java (line 196) Exception in thread Thread[CompactionExecutor:37,1,main]
FSWriteError in /mnt/dsedata/lib/cassandra/poc/event_log/poc-event_log-tmp-jb-946-Data.db
        at org.apache.cassandra.io.compress.CompressedSequentialWriter.close(CompressedSequentialWriter.java:270)
        at org.apache.cassandra.io.sstable.SSTableWriter.close(SSTableWriter.java:356)
        at org.apache.cassandra.io.sstable.SSTableWriter.closeAndOpenReader(SSTableWriter.java:324)
        at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:204)
        at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
        at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:60)
        at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59)
        at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:197)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)
Caused by: java.io.IOException: Input/output error
        at sun.nio.ch.FileDispatcherImpl.force0(Native Method)
        at sun.nio.ch.FileDispatcherImpl.force(FileDispatcherImpl.java:76)
        at sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:376)
        at org.apache.cassandra.io.compress.CompressionMetadata$Writer.close(CompressionMetadata.java:366)
        at org.apache.cassandra.io.compress.CompressedSequentialWriter.close(CompressedSequentialWriter.java:266)
        ... 13 more
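
The root cause in the trace is raised from FileChannelImpl.force, which is the point where the JVM asks the OS to flush the file to disk (fsync or fdatasync on Linux). A possible way to test whether the volume itself fails such a flush, independent of Cassandra, is a small probe like the Python sketch below. The function name and the default size are my own choices, and a passing probe would not rule out an intermittent fault.

```python
import os
import tempfile


def probe_fsync(directory, size_bytes=1 << 20):
    """Write size_bytes to a scratch file in directory and fsync it.

    Returns None on success, or the OSError raised by the write or the fsync.
    The scratch file is removed afterwards in either case.
    """
    fd, path = tempfile.mkstemp(dir=directory, prefix="fsync-probe-")
    try:
        try:
            view = memoryview(os.urandom(size_bytes))
            while len(view) > 0:
                written = os.write(fd, view)  # may write fewer bytes than asked
                view = view[written:]
            os.fsync(fd)  # flush-to-disk request, comparable to FileChannel.force
            return None
        except OSError as exc:
            return exc
        finally:
            os.close(fd)
    finally:
        os.remove(path)
```

Running probe_fsync against the data volume (the paths above suggest it is mounted under /mnt/dsedata) in a loop while the load test runs would show whether "Input/output error" also appears outside the Cassandra process. If it does, that would point at the storage layer rather than at compaction.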







--
This message was sent by Atlassian JIRA
(v6.2#6252)
