RE: node restart causes application latency

2018-02-15 Thread Jonathan Baynes
From: Mike Torra [mailto:mto...@salesforce.com] Sent: 13 February 2018 15:10 To: user@cassandra.apache.org Subject: Re: node restart causes application latency Then could it be that calling `nodetool drain` after calling `nodetool disablegossip` is what causes the problem? On Mon, Feb 12, 2018

Re: node restart causes application latency

2018-02-13 Thread Mike Torra
Then could it be that calling `nodetool drain` after calling `nodetool disablegossip` is what causes the problem? On Mon, Feb 12, 2018 at 6:12 PM, kurt greaves wrote: > > Actually, it's not really clear to me why disablebinary and thrift are > necessary prior to drain,

Re: node restart causes application latency

2018-02-12 Thread kurt greaves
Actually, it's not really clear to me why disablebinary and thrift are necessary prior to drain, because they happen in the same order during drain anyway. It also really doesn't make sense that disabling gossip after drain would make a difference here, because it should be already stopped. This

Re: node restart causes application latency

2018-02-12 Thread kurt greaves
Drain will take care of stopping gossip, and does a few tasks before stopping gossip (stops batchlog, hints, auth, cache saver and a few other things). I'm not sure why this causes a side effect when you restart the node, but there should be no need to issue a disablegossip anyway, just leave that
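
A minimal sketch of the restart kurt describes, assuming `nodetool drain` on 3.11.1 behaves as stated in this thread (it stops the client transports, batchlog, hints and so on, then gossip), so no separate `disablebinary`, `disablethrift` or `disablegossip` is issued. The `run` and `restart_node` helpers and the `DRY_RUN` switch are hypothetical; the `sudo service cassandra restart` command is the one used in this thread. This is not offered as a fix for the latency Mike saw.

```shell
#!/usr/bin/env bash
# Sketch: restart one Cassandra node with drain only.
# Hypothetical helpers run() and restart_node(); DRY_RUN=1 only prints
# the commands instead of executing them.

run() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    echo "$*"        # preview mode: show the command, execute nothing
  else
    "$@"             # execute the command with its arguments intact
  fi
}

restart_node() {
  # drain flushes memtables and, per this thread, stops the native and
  # thrift transports, batchlog, hints, etc. before stopping gossip.
  # The service restart only runs if drain succeeds.
  run nodetool drain &&
    run sudo service cassandra restart
}
```

After sourcing the file, `DRY_RUN=1 restart_node` prints the two commands without running them; plain `restart_node` executes them.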

Re: node restart causes application latency

2018-02-12 Thread Mike Torra
Interestingly, it seems that changing the order of steps I take during the node restart resolves the problem. Instead of: `nodetool disablebinary && nodetool disablethrift && *nodetool disablegossip* && nodetool drain && sudo service cassandra restart`, if I do: `nodetool disablebinary &&

Re: node restart causes application latency

2018-02-12 Thread Mike Torra
Any other ideas? If I simply stop the node, there is no latency problem, but once I start the node the problem appears. This happens consistently for all nodes in the cluster On Wed, Feb 7, 2018 at 11:36 AM, Mike Torra wrote: > No, I am not > > On Wed, Feb 7, 2018 at

Re: node restart causes application latency

2018-02-07 Thread Mike Torra
No, I am not On Wed, Feb 7, 2018 at 11:35 AM, Jeff Jirsa wrote: > Are you using internode ssl? > > > -- > Jeff Jirsa > > > On Feb 7, 2018, at 8:24 AM, Mike Torra wrote: > > Thanks for the feedback guys. That example data model was indeed > abbreviated -

Re: node restart causes application latency

2018-02-07 Thread Jeff Jirsa
Are you using internode ssl? -- Jeff Jirsa > On Feb 7, 2018, at 8:24 AM, Mike Torra wrote: > > Thanks for the feedback guys. That example data model was indeed abbreviated > - the real queries have the partition key in them. I am using RF 3 on the > keyspace, so I

Re: node restart causes application latency

2018-02-07 Thread Mike Torra
Thanks for the feedback guys. That example data model was indeed abbreviated - the real queries have the partition key in them. I am using RF 3 on the keyspace, so I don't think a node being down would mean the key I'm looking for would be unavailable. The load balancing policy of the driver seems

Re: node restart causes application latency

2018-02-06 Thread Jeff Jirsa
Unless you abbreviated, your data model is questionable (SELECT without any equality in the WHERE clause on the partition key will always cause a range scan, which is super inefficient). Since you're doing LOCAL_ONE and a range scan, timeouts sorta make sense - the owner of at least one range

Re: node restart causes application latency

2018-02-06 Thread Michael Shuler
On 02/06/2018 12:58 PM, Mike Torra wrote: > > I restart a node like this: > > nodetool disablethrift && nodetool disablegossip && nodetool drain > sudo service cassandra restart Just a guess here - are you really only using thrift? (i.e. `nodetool disablebinary`) > When I do that, I very often

node restart causes application latency

2018-02-06 Thread Mike Torra
Hi - I am running a 29 node cluster spread over 4 DCs in EC2, using C* 3.11.1 on Ubuntu. Occasionally I have the need to restart nodes in the cluster, but every time I do, I see errors and application (nodejs) timeouts. I restart a node like this: nodetool disablethrift && nodetool