Hi -

I am running a 29-node cluster spread over 4 DCs in EC2, using C* 3.11.1
on Ubuntu. Occasionally I need to restart nodes in the cluster, but every
time I do, I see errors and timeouts in my application (nodejs).

I restart a node like this:

nodetool disablethrift && nodetool disablegossip && nodetool drain
sudo service cassandra restart

When I do that, I very often get timeouts and errors like this in my nodejs
app:

Error: Cannot achieve consistency level LOCAL_ONE

My queries are all pretty much the same, things like: "select * from
history where ts > {current_time}"

The errors and timeouts seem to go away on their own after a while, but it
is frustrating because I can't track down what I am doing wrong!

I've tried waiting between the shutdown steps, and I've tried stopping,
waiting, then starting the node. One thing I've noticed is that even after
running `nodetool drain` on the node, there are still open connections to
other nodes in the cluster (i.e., in the output of netstat) until I stop
cassandra. I don't see any errors or warnings in the logs.
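
For concreteness, the "waiting between steps" variant looks roughly like
the sketch below. The commands are the ones listed above; the pause length
(WAIT_SECS) and the DRY_RUN switch are placeholders added for illustration,
not the exact values I used. By default it only prints what it would run.

```shell
#!/usr/bin/env bash
# Sketch of the restart sequence with a pause after each step.
# WAIT_SECS and DRY_RUN are hypothetical knobs, not Cassandra settings.
set -euo pipefail

WAIT_SECS="${WAIT_SECS:-10}"   # illustrative pause, not a recommended value
DRY_RUN="${DRY_RUN:-1}"        # 1 = only print the commands, 0 = run them

run() {
  # Print the command; execute it only when DRY_RUN is 0.
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

restart_node() {
  run nodetool disablethrift
  run sleep "$WAIT_SECS"
  run nodetool disablegossip
  run sleep "$WAIT_SECS"
  run nodetool drain
  run sleep "$WAIT_SECS"
  run sudo service cassandra restart
}

restart_node
```

Running it with DRY_RUN=0 on the node executes the same steps for real.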

What can I do to prevent this? Is there something else I should be doing to
gracefully restart the cluster? It could be something to do with the nodejs
driver, but I can't find anything there to try.

I appreciate any suggestions or advice.

- Mike
