Oded Peer created CASSANDRA-6829:
------------------------------------
Summary: nodes sporadically shutting down
Key: CASSANDRA-6829
URL: https://issues.apache.org/jira/browse/CASSANDRA-6829
Project: Cassandra
Issue Type: Bug
Components: Core
Environment: Windows Azure VMs.
The VMs' OS is SUSE Enterprise. I striped 2 logical volumes for each VM, one
for data and one for commitlog, and formatted them as XFS.
Oracle Java 1.7_45
Datastax Enterprise 4.0 (Cassandra version 2.0.5.22)
Reporter: Oded Peer
I deployed a DataStax 4.0 Cassandra cluster on Windows Azure and started load
tests. After a while, some of the nodes announce a shutdown and stop responding
to client requests.
The error preceding the shutdown is "FSWriteError in
/mnt/dsedata/lib/cassandra/poc/event_log/poc-event_log-tmp-jb-581-Data.db",
with the cause "Caused by: java.io.IOException: Input/output error".
The storage I'm using in my VMs is Azure Blob storage. The VMs' OS is SUSE
Enterprise. I striped 2 logical volumes for each VM, one for data and one for
commitlog, and formatted them as XFS.
I am using Oracle Java 1.7_45.
Restarting the Cassandra process resolves the problem only for a short while
(minutes); after that, the problem occurs again.
I noticed that it happens only with tmp files of one specific table. See the
errors from 3 random nodes:
(1) ERROR [CompactionExecutor:48] 2014-03-09 11:38:45,188 CassandraDaemon.java (line 192) Exception in thread Thread[CompactionExecutor:48,1,main]
FSWriteError in /mnt/dsedata/lib/cassandra/poc/event_log/poc-event_log-tmp-jb-409-Data.db
(2) ERROR [CompactionExecutor:37] 2014-03-10 10:04:30,828 CassandraDaemon.java (line 196) Exception in thread Thread[CompactionExecutor:37,1,main]
FSWriteError in /mnt/dsedata/lib/cassandra/poc/event_log/poc-event_log-tmp-jb-946-Data.db
(3) ERROR [CompactionExecutor:48] 2014-03-10 10:23:39,248 CassandraDaemon.java (line 196) Exception in thread Thread[CompactionExecutor:48,1,main]
FSWriteError in /mnt/dsedata/lib/cassandra/poc/event_log/poc-event_log-tmp-jb-874-Data.db
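To check across all nodes that the failures really are limited to tmp files of
one table, the log excerpts could be tallied with a small script. This is only
a sketch: the function name summarize_fswrite_errors is made up for
illustration, and the path pattern assumes the
<keyspace>/<table>/<keyspace>-<table>-[tmp-]jb-<generation>-Data.db layout seen
in the errors above.

```python
import re
from collections import Counter

# Path that follows "FSWriteError in" (same line or the next one).
ERROR_RE = re.compile(r"FSWriteError in\s+(\S+)")

# Assumed layout: .../<keyspace>/<table>/<keyspace>-<table>-[tmp-]jb-<gen>-Data.db
PATH_RE = re.compile(
    r"/(?P<keyspace>[^/]+)/(?P<table>[^/]+)/"
    r"(?P=keyspace)-(?P=table)-(?P<tmp>tmp-)?jb-(?P<gen>\d+)-Data\.db$"
)


def summarize_fswrite_errors(log_text):
    """Count FSWriteError occurrences per (keyspace, table, is_tmp_file)."""
    counts = Counter()
    for path in ERROR_RE.findall(log_text):
        match = PATH_RE.search(path)
        if match is None:
            # Path does not follow the assumed naming scheme; skip it.
            continue
        key = (match.group("keyspace"), match.group("table"),
               match.group("tmp") is not None)
        counts[key] += 1
    return counts
```

Feeding it the concatenated system.log files of the affected nodes gives one
counter entry per (keyspace, table, tmp flag) combination; a single entry with
the tmp flag set would match the observation above.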
The table is a wide-row table created as:
CREATE TABLE event_log (
time_slice bigint,
distribution_key int,
event_id text,
... 300 columns ...
PRIMARY KEY ((time_slice, distribution_key), event_id)
) WITH compaction={'class': 'SizeTieredCompactionStrategy'} AND
compression={'sstable_compression': 'LZ4Compressor'};
CREATE INDEX EVENT_LOG_2IX ON event_log (event_id);
'time_slice' represents a 5-minute period encoded as yyyyMMddHHmm, where 'mm'
ranges from 00 to 55 in increments of 5.
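As an illustration of that encoding, here is a minimal sketch of how a
time_slice value could be derived from an event timestamp. The helper name and
the choice to floor the timestamp to the start of its 5-minute window are
assumptions; the load-test client may compute it differently.

```python
from datetime import datetime


def time_slice(ts):
    """Floor ts to its 5-minute window and encode it as a yyyyMMddHHmm integer."""
    # Assumption: an event belongs to the window that starts at or before it.
    floored = ts.replace(minute=ts.minute - ts.minute % 5, second=0, microsecond=0)
    return int(floored.strftime("%Y%m%d%H%M"))


print(time_slice(datetime(2014, 3, 10, 10, 4, 30)))  # 201403101000
```

With this scheme every event written within the same 5 minutes and with the
same distribution_key lands in the same partition.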
The Data files under the 'data' directory grew very large shortly after the
test started.
For example:
1.5G Mar 10 10:50 /mnt/dsedata/lib/cassandra/poc/event_log/poc-event_log-jb-968-Data.db
3.0G Mar 10 11:41 /mnt/dsedata/lib/cassandra/poc/event_log/poc-event_log-tmp-jb-970-Data.db
Full stack trace:
ERROR [CompactionExecutor:37] 2014-03-10 10:04:30,828 CassandraDaemon.java (line 196) Exception in thread Thread[CompactionExecutor:37,1,main]
FSWriteError in /mnt/dsedata/lib/cassandra/poc/event_log/poc-event_log-tmp-jb-946-Data.db
    at org.apache.cassandra.io.compress.CompressedSequentialWriter.close(CompressedSequentialWriter.java:270)
    at org.apache.cassandra.io.sstable.SSTableWriter.close(SSTableWriter.java:356)
    at org.apache.cassandra.io.sstable.SSTableWriter.closeAndOpenReader(SSTableWriter.java:324)
    at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:204)
    at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
    at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:60)
    at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59)
    at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:197)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
Caused by: java.io.IOException: Input/output error
    at sun.nio.ch.FileDispatcherImpl.force0(Native Method)
    at sun.nio.ch.FileDispatcherImpl.force(FileDispatcherImpl.java:76)
    at sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:376)
    at org.apache.cassandra.io.compress.CompressionMetadata$Writer.close(CompressionMetadata.java:366)
    at org.apache.cassandra.io.compress.CompressedSequentialWriter.close(CompressedSequentialWriter.java:266)
    ... 13 more
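The root cause in the trace is raised from FileChannel.force(), i.e. while
flushing file contents to the storage device. To see whether the volume also
fails a flush outside of Cassandra, something like the following probe could be
run against the data directory. It is a rough sketch, not a diagnosis tool: the
function name fsync_probe and the 1 MiB default size are arbitrary choices, and
a single successful run does not rule out an intermittent fault.

```python
import errno
import os
import tempfile


def fsync_probe(directory, size=1 << 20):
    """Write `size` zero bytes to a temp file in `directory` and fsync it.

    Returns None on success, or the errno name (for example 'EIO') if the
    write or the fsync fails.
    """
    fd, path = tempfile.mkstemp(dir=directory, prefix="fsync-probe-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"\0" * size)
            f.flush()             # push Python's buffer to the OS
            os.fsync(f.fileno())  # ask the OS to push it to the device
        return None
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    finally:
        try:
            os.remove(path)
        except OSError:
            pass  # best-effort cleanup

# Example call, using the data directory from the report:
#   fsync_probe("/mnt/dsedata/lib/cassandra")
```

A return value of 'EIO' would correspond to the "Input/output error" above and
would point at the striped volume or the underlying storage rather than at
Cassandra itself.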