Here's a weird one... what's the best way to get a Cassandra node into a "half-crashed" state?
We have a 3-node cluster running 0.7.5. A few days ago this happened organically to node1: the partition the commitlog was on was 100% full and there was a "No space left on device" error. After a while, although the cluster and node1 were still up, node1 appeared down to the other nodes, and messages like:

DEBUG 14:36:55,546 ... timed out

started to show up in its debug logs.

We have a tool to indicate to the load balancer that a Cassandra node is down, but it didn't detect it that time. Now I'm having trouble purposefully getting the node back into that state so that I can try other monitoring methods. I've tried filling up the commitlog partition with other files, and although I get the "No space left on device" error, the node still doesn't go down or show the other symptoms it showed before.

Also, if anyone could recommend a good way for a node itself to detect that it's in such a state, I'd be interested in that too. Currently we make a "describe_cluster_name()" Thrift call, but that still worked when the node was "down". I'm thinking of something like reading/writing a fixed value in a keyspace as a check... Unfortunately, Java-based solutions are out of the question.

Thanks,
Suan
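
For illustration, the read/write check idea above could be sketched roughly as follows in Python. This is only a sketch under assumptions, not a tested monitoring tool: `canary_check`, `write_fn`, and `read_fn` are hypothetical names, and the two callables stand in for whatever non-Java client is in use (for example, a Thrift-based client writing to and reading from one fixed row in a dedicated keyspace). The point is that the probe exercises the real write and read paths and enforces its own deadline, so a node that hangs or errors on writes is reported as unhealthy even when a metadata call still succeeds.

```python
import threading
import uuid


def canary_check(write_fn, read_fn, timeout_s=2.0):
    """Return True only if a write followed by a read-back completes in time.

    write_fn(value) and read_fn() are hypothetical callables supplied by the
    caller; they are assumed to target one fixed row/column in a keyspace
    reserved for health checks.
    """
    token = uuid.uuid4().hex  # unique per probe, so a stale value cannot pass
    result = {"ok": False}

    def probe():
        try:
            write_fn(token)
            result["ok"] = (read_fn() == token)
        except Exception:
            # Any client error (timeout, unavailable, I/O) counts as unhealthy.
            result["ok"] = False

    # Daemon thread: a hung client call must not block the health checker.
    worker = threading.Thread(target=probe, daemon=True)
    worker.start()
    worker.join(timeout_s)
    if worker.is_alive():
        return False  # still blocked after the deadline: treat as down
    return result["ok"]


if __name__ == "__main__":
    # In-memory stand-in for the real client calls, just to show the wiring.
    store = {}
    healthy = canary_check(lambda v: store.__setitem__("canary", v),
                           lambda: store.get("canary"))
    print("healthy" if healthy else "unhealthy")
```

The consistency level for the probe is left to the real `write_fn`/`read_fn`. A level above ONE might also exercise the node's ability to reach its peers, which is the symptom described above, but that depends on the replication factor and is an assumption to verify against the cluster.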