[
https://issues.apache.org/jira/browse/CASSANDRA-6364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13828057#comment-13828057
]
J. Ryan Earl commented on CASSANDRA-6364:
-----------------------------------------
[~iamaleksey] So again, that's bad behavior, since the node will essentially
crater and be unable to handle read requests in a timely manner or at all, not
because the data isn't there to be read, but due to GC death as uncommitted
writes pile up on the heap and the JVM spends all of its time doing garbage
collection. Furthermore, it affects reads and writes not just to said node,
but on any connection that uses said node as its coordinator. At a minimum,
there should be different failure policies for commit and data volumes. The
scope or description of the ticket can be changed to that effect. Maybe there
is a corner case where people only read from Cassandra such that "best_effort"
makes sense on the commit volume, but it's really hard to see any plausible
use-case where that would be desired.
Cassandra needs to be able to do "best_effort" on the data volumes, where it
makes no sense for a node to die when one of a JBOD of data disks fails, while
gracefully and immediately exiting on commit disk failures, since a failed
commit disk guarantees the node will become unresponsive in a short amount of
time under write load.
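
The split being asked for can be sketched roughly as follows. This is only an
illustration of the intended behavior, not Cassandra's actual code: the
FailurePolicySketch class and its VolumeKind and Action enums are hypothetical
names made up for this example. The only thing taken from the ticket is the
desired outcome: keep serving when one data disk of a JBOD fails, exit
immediately when the commit disk fails.

```java
import java.io.IOException;

public class FailurePolicySketch {
    /** Hypothetical: the kind of volume the I/O error came from. */
    enum VolumeKind { DATA, COMMIT }

    /** Hypothetical: what the node should do in response to the error. */
    enum Action { KEEP_SERVING_WITHOUT_DISK, EXIT_NODE }

    /** Decide per volume kind instead of using one node-wide policy. */
    static Action onIoError(VolumeKind kind, IOException cause) {
        switch (kind) {
            case DATA:
                // "best_effort": losing one data disk should not kill the node.
                return Action.KEEP_SERVING_WITHOUT_DISK;
            case COMMIT:
                // Writes can no longer be made durable, so stop right away
                // instead of letting uncommitted writes pile up on the heap.
                return Action.EXIT_NODE;
            default:
                throw new IllegalArgumentException("unknown volume kind: " + kind);
        }
    }

    public static void main(String[] args) {
        IOException err = new IOException("simulated I/O error");
        System.out.println(onIoError(VolumeKind.DATA, err));
        System.out.println(onIoError(VolumeKind.COMMIT, err));
    }
}
```

A real implementation would also need to log the cause and decide how to shut
down cleanly; the sketch only shows that the decision depends on the volume.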
> Cassandra should exit or otherwise handle when the commit volume dies
> ---------------------------------------------------------------------
>
> Key: CASSANDRA-6364
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6364
> Project: Cassandra
> Issue Type: Improvement
> Environment: JBOD, single dedicated commit disk
> Reporter: J. Ryan Earl
>
> We're doing fault testing on a pre-production Cassandra cluster. One of the
> tests was to simulate failure of the commit volume/disk, which in our case
> is on a dedicated disk. We expected failure of the commit volume to be
> handled somehow, but what we found was that no action was taken by Cassandra
> when the commit volume failed. We simulated this simply by pulling the
> physical disk that backed the commit volume, which resulted in filesystem I/O
> errors on the mount point.
> What then happened was that the Cassandra Heap filled up to the point that it
> was spending 90% of its time doing garbage collection. No errors were logged
> regarding the failed commit volume. Gossip on other nodes in the cluster
> eventually flagged the node as down. Gossip on the local node showed itself
> as up, and all other nodes as down.
> The most serious problem was that connections to the coordinator on this node
> became very slow due to the on-going GC, as I assume uncommitted writes piled
> up on the JVM heap. What we believe should have happened is that Cassandra
> should have caught the I/O error and exited with a useful log message, or
> otherwise done some sort of useful cleanup. Otherwise the node goes into a
> sort of Zombie state, spending most of its time in GC, and thus slowing down
> any transactions that happen to use the coordinator on said node.
> A limit on in-memory, unflushed writes before refusing requests may also
> work. The point is that something should be done to handle the commit volume
> dying, since doing nothing ends up affecting the entire cluster. I should
> note, we are using: disk_failure_policy: best_effort
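
The limit on in-memory, unflushed writes suggested in the description could
look roughly like the sketch below. It is an assumption-laden illustration and
not part of Cassandra: UnflushedWriteLimiter, tryAdmit, and markFlushed are
hypothetical names, and the byte budget is an arbitrary parameter. The idea is
that once the commit volume stops accepting writes, the pending total stops
shrinking and new writes are refused instead of accumulating on the heap.

```java
import java.util.concurrent.atomic.AtomicLong;

public class UnflushedWriteLimiter {
    private final long maxUnflushedBytes;                 // hypothetical budget
    private final AtomicLong unflushedBytes = new AtomicLong();

    public UnflushedWriteLimiter(long maxUnflushedBytes) {
        this.maxUnflushedBytes = maxUnflushedBytes;
    }

    /** Returns true if the write is admitted, false if it should be refused. */
    public boolean tryAdmit(long sizeBytes) {
        while (true) {
            long current = unflushedBytes.get();
            long next = current + sizeBytes;
            if (next > maxUnflushedBytes) {
                return false;                             // over budget: refuse
            }
            if (unflushedBytes.compareAndSet(current, next)) {
                return true;                              // reserved the space
            }
            // Another thread changed the counter; retry with the new value.
        }
    }

    /** Call once a previously admitted write has been durably committed. */
    public void markFlushed(long sizeBytes) {
        unflushedBytes.addAndGet(-sizeBytes);
    }

    public long pendingBytes() {
        return unflushedBytes.get();
    }

    public static void main(String[] args) {
        UnflushedWriteLimiter limiter = new UnflushedWriteLimiter(100);
        System.out.println(limiter.tryAdmit(60));  // fits within the budget
        System.out.println(limiter.tryAdmit(60));  // would exceed the budget
        limiter.markFlushed(60);                   // first write made durable
        System.out.println(limiter.tryAdmit(60));  // fits again
    }
}
```

Refusing requests this way keeps the coordinator responsive enough to return
errors quickly, which is the behavior the description argues for.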