David O'Dell created CASSANDRA-7567:
---------------------------------------
Summary: when the commit_log disk for a single node is overwhelmed
the entire cluster slows down
Key: CASSANDRA-7567
URL: https://issues.apache.org/jira/browse/CASSANDRA-7567
Project: Cassandra
Issue Type: Bug
Components: Core
Environment: debian 7.5, bare metal, 14 nodes, 64CPUs, 64GB RAM,
commit_log disk sata, data disk SSD, vnodes, leveled compaction strategy
Reporter: David O'Dell
Attachments: write_request_latency.png
We've run into a situation where a single node out of 14 experiences high
disk I/O. This can happen when a node is being decommissioned, or after it
joins the ring and runs into bug CASSANDRA-6621.
When this occurs, the write latency for the entire cluster spikes from 0.3ms
to 170ms.
To simulate this, simply run dd on the commit_log disk (dd if=/dev/zero
of=/tmp/foo bs=1024) and you will see that all nodes in the cluster slow
down instantly.
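For anyone who prefers a bounded load generator over an open-ended dd run, the same kind of write pressure can be sketched in Python. This is only an illustration, not part of the original reproduction: the function name saturate_disk, its parameters, and the byte and time limits are assumptions made for this example, and the target path has to sit on the same device as the commit_log directory for it to be comparable.

```python
import os
import time


def saturate_disk(path, block_size=1024, max_bytes=64 * 1024 * 1024, max_seconds=10.0):
    """Write zero-filled blocks to `path` until a byte or time limit is reached.

    Hypothetical helper that mimics `dd if=/dev/zero of=<path> bs=1024`,
    except that it stops on its own. Returns the number of bytes written.
    """
    block = bytes(block_size)  # zero-filled buffer, like a read from /dev/zero
    written = 0
    deadline = time.monotonic() + max_seconds
    with open(path, "wb") as f:
        while written + block_size <= max_bytes and time.monotonic() < deadline:
            f.write(block)
            written += block_size
        f.flush()
        os.fsync(f.fileno())  # push the buffered data out to the device
    return written
```

Calling saturate_disk("/tmp/foo") writes to the same file as the dd command above. Raise max_bytes and max_seconds if the default limits are too small to put the disk under sustained pressure.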
BTW overwhelming the data disk does not have this same effect.
I've also tried this with the client not connected directly to the
overwhelmed node, and it has the same effect.