Hi - I am running a 29-node cluster spread over 4 DCs in EC2, using C* 3.11.1 on Ubuntu. Occasionally I need to restart nodes in the cluster, but every time I do, I see errors and timeouts in my application (nodejs).
I restart a node like this:

    nodetool disablethrift && nodetool disablegossip && nodetool drain
    sudo service cassandra restart

When I do that, I very often get timeouts and errors like this in my nodejs app:

    Error: Cannot achieve consistency level LOCAL_ONE

My queries are all pretty much the same, things like:

    "select * from history where ts > {current_time}"

The errors and timeouts seem to go away on their own after a while, but it is frustrating because I can't track down what I am doing wrong!

I've tried waiting between steps of shutting down cassandra, and I've tried stopping, waiting, then starting the node. One thing I've noticed is that even after `nodetool drain`ing the node, there are open connections to other nodes in the cluster (ie looking at the output of netstat) until I stop cassandra. I don't see any errors or warnings in the logs.

What can I do to prevent this? Is there something else I should be doing to gracefully restart the cluster? It could be something to do with the nodejs driver, but I can't find anything there to try.

I appreciate any suggestions or advice.

- Mike