RE: RE: Re: A node down every day in a 6 nodes cluster

2018-03-27 Thread Rahul Singh
It may be that the wide partition is bombarded more than other partitions. 
What’s your RF on that keyspace? If it’s greater than 1, I’d expect other 
nodes to get the same type of load.
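
To illustrate the RF point, here is a minimal Python sketch of replica placement. It assumes SimpleStrategy and one token per node; the thread does not state the replication strategy or whether vnodes are in use. The ring, token values, and node names are hypothetical. With RF=3, a partition's token maps to three consecutive nodes on the ring, so a hot partition should load all three replicas rather than a single node.

```python
from bisect import bisect_left

# Hypothetical 6-node ring: (token, node) pairs sorted by token.
RING = [(0, "node1"), (100, "node2"), (200, "node3"),
        (300, "node4"), (400, "node5"), (500, "node6")]


def replicas_for(token, ring, rf):
    """Return the nodes holding a partition token under a SimpleStrategy-like rule.

    The first replica is the node with the smallest token >= the partition
    token (wrapping around the ring); the remaining replicas are the next
    distinct nodes walking clockwise.
    """
    tokens = [t for t, _ in ring]
    distinct_nodes = len({n for _, n in ring})
    wanted = min(rf, distinct_nodes)
    i = bisect_left(tokens, token) % len(ring)
    owners = []
    while len(owners) < wanted:
        node = ring[i % len(ring)][1]
        if node not in owners:
            owners.append(node)
        i += 1
    return owners


print(replicas_for(250, RING, 3))
```

If only one of those replicas keeps failing while the others stay healthy, that points away from the partition itself and toward something local to the node.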

--
Rahul Singh
rahul.si...@anant.us

Anant Corporation

On Mar 27, 2018, 5:56 AM -0700, Kenneth Brotman <kenbrot...@yahoo.com.invalid>, 
wrote:
> First, anything Jeff Jirsa says is likely very accurate, like it being a 
> really good idea to also get off the version you’re on and onto a version 
> that fixes some of the known problems of the version you’re on.
>
> Replacing a running node:
> https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsReplaceLiveNode.html
>
> Kenneth Brotman
>
>
> From: Xiangfei Ni [mailto:xiangfei...@cm-dt.com]
> Sent: Tuesday, March 27, 2018 5:44 AM
> To: user@cassandra.apache.org
> Subject: Re: RE: Re: A node down every day in a 6 nodes cluster
>
> Thanks, Kenneth. This is a production database, and the node is one of three 
> seed nodes. Do you have a doc for replacing a seed node?
>
>
>
> Sent from my Xiaomi phone
> On March 27, 2018, at 7:45 PM, Kenneth Brotman <kenbrot...@yahoo.com.INVALID> wrote:
> David,
>
> Can you replace the misbehaving node to see if that resolves the problem?
>
> Kenneth Brotman
>
> From: Xiangfei Ni [mailto:xiangfei...@cm-dt.com]
> Sent: Tuesday, March 27, 2018 3:27 AM
> To: Jeff Jirsa
> Cc: user@cassandra.apache.org
> Subject: Re: Re: A node down every day in a 6 nodes cluster
>
> Thanks Jeff,
>    So your suggestion is to first resolve the data model issue that 
> causes the wide partitions, right?
>
> Best Regards,
>
> 倪项菲/ David Ni
> 中移德电网络科技有限公司
> Virtue Intelligent Network Ltd, co.
> Add: 2003,20F No.35 Luojia creative city,Luoyu Road,Wuhan,HuBei
> Mob: +86 13797007811|Tel: + 86 27 5024 2516
>
> From: Jeff Jirsa <jji...@gmail.com>
> Sent: March 27, 2018 11:50
> To: Xiangfei Ni <xiangfei...@cm-dt.com>
> Cc: user@cassandra.apache.org
> Subject: Re: Re: A node down every day in a 6 nodes cluster
>
> Only one node having the problem is suspicious. It may be that your 
> application is improperly pooling connections, or you have a hardware problem.
>
> I don't see anything in nodetool that explains it, though you certainly have a 
> data model likely to cause problems over time (the cardinality of
> rt_ac_stat.idx_rt_ac_stat_prot_ver is such that you have very wide 
> partitions and it'll be difficult to read).
>
> On Mon, Mar 26, 2018 at 8:26 PM, Xiangfei Ni <xiangfei...@cm-dt.com> wrote:
> > Hi Jeff,
> > I need to restart the node manually every time. Only one node has this 
> > problem.
> > I have attached the nodetool output. Thanks.
> >
> > Best Regards,
> >
> > 倪项菲/ David Ni
> > 中移德电网络科技有限公司
> > Virtue Intelligent Network Ltd, co.
> > Add: 2003,20F No.35 Luojia creative city,Luoyu Road,Wuhan,HuBei
> > Mob: +86 13797007811|Tel: + 86 27 5024 2516
> >
> > From: Jeff Jirsa <jji...@gmail.com>
> > Sent: March 27, 2018 11:03
> > To: user@cassandra.apache.org
> > Subject: Re: A node down every day in a 6 nodes cluster
> >
> > That warning isn’t sufficient to understand why the node is going down
> >
> >
> > Cassandra 3.9 has some pretty serious known issues - upgrading to 3.11.3 is 
> > likely a good idea
> >
> > Are the nodes coming up on their own? Or are you restarting them?
> >
> > Paste the output of nodetool tpstats and nodetool cfstats
> >
> >
> >
> > --
> > Jeff Jirsa
> >
> >
> > On Mar 26, 2018, at 7:56 PM, Xiangfei Ni <xiangfei...@cm-dt.com> wrote:
> > > Hi Cassandra experts,
> > >   I am facing an issue: a node goes down every day in a 6-node cluster. The 
> > > cluster is in just one DC.
> > >   Every node has 4 cores and 16 GB of memory, and the heap configuration is 
> > > MAX_HEAP_SIZE=8192m HEAP_NEWSIZE=512m. Every node holds about 200 GB of 
> > > data, and the RF for the business CF is 3. A node goes down once every day, 
> > > and system.log shows the following:
> > > WARN  [Native-Transport-Requests-19] 2018-03-26 18:53:17,128 
> > > CassandraAuthorizer.java:101 - CassandraAuthorizer failed to authorize 
> > > # for 
> > > ERROR [Native-Transport-Requests-19] 2018-03-26 18:53:17,129 
> > > QueryMessage.java:128 - Unexpected error during query
> > > com.google.common.util.concurrent.UncheckedExecutionException: 
> > > java.lang.RuntimeException: 
> > > org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out 
> > > - received only 0 responses.
> > > 

RE: RE: Re: A node down every day in a 6 nodes cluster

2018-03-27 Thread Kenneth Brotman
First, anything Jeff Jirsa says is likely very accurate, like it being a really 
good idea to also get off the version you’re on and onto a version that fixes 
some of the known problems of the version you’re on.

 

Replacing a running node:

https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsReplaceLiveNode.html

 

Kenneth Brotman

 

 

From: Xiangfei Ni [mailto:xiangfei...@cm-dt.com] 
Sent: Tuesday, March 27, 2018 5:44 AM
To: user@cassandra.apache.org
Subject: Re: RE: Re: A node down every day in a 6 nodes cluster

 

Thanks, Kenneth. This is a production database, and the node is one of three 
seed nodes. Do you have a doc for replacing a seed node?

 

 

 

Sent from my Xiaomi phone

On March 27, 2018, at 7:45 PM, Kenneth Brotman <kenbrot...@yahoo.com.INVALID> wrote:

David,

 

Can you replace the misbehaving node to see if that resolves the problem?

 

Kenneth Brotman

 

From: Xiangfei Ni [mailto:xiangfei...@cm-dt.com] 
Sent: Tuesday, March 27, 2018 3:27 AM
To: Jeff Jirsa
Cc: user@cassandra.apache.org
Subject: Re: Re: A node down every day in a 6 nodes cluster

 

Thanks Jeff,

   So your suggestion is to first resolve the data model issue that 
causes the wide partitions, right?

 

Best Regards, 

 

倪项菲/ David Ni

中移德电网络科技有限公司

Virtue Intelligent Network Ltd, co.

Add: 2003,20F No.35 Luojia creative city,Luoyu Road,Wuhan,HuBei

Mob: +86 13797007811|Tel: + 86 27 5024 2516

 

From: Jeff Jirsa <jji...@gmail.com>
Sent: March 27, 2018 11:50
To: Xiangfei Ni <xiangfei...@cm-dt.com>
Cc: user@cassandra.apache.org
Subject: Re: Re: A node down every day in a 6 nodes cluster

 

Only one node having the problem is suspicious. It may be that your application 
is improperly pooling connections, or you have a hardware problem.

I don't see anything in nodetool that explains it, though you certainly have a 
data model likely to cause problems over time (the cardinality of 
rt_ac_stat.idx_rt_ac_stat_prot_ver is such that you have very wide partitions 
and it'll be difficult to read).
 
 

 

On Mon, Mar 26, 2018 at 8:26 PM, Xiangfei Ni <xiangfei...@cm-dt.com> wrote:

Hi Jeff,

I need to restart the node manually every time. Only one node has this 
problem.

I have attached the nodetool output. Thanks.

 

Best Regards, 

 

倪项菲/ David Ni

中移德电网络科技有限公司

Virtue Intelligent Network Ltd, co.

Add: 2003,20F No.35 Luojia creative city,Luoyu Road,Wuhan,HuBei

Mob: +86 13797007811|Tel: + 86 27 5024 2516

 

From: Jeff Jirsa <jji...@gmail.com>
Sent: March 27, 2018 11:03
To: user@cassandra.apache.org
Subject: Re: A node down every day in a 6 nodes cluster

 

That warning isn’t sufficient to understand why the node is going down

 

 

Cassandra 3.9 has some pretty serious known issues - upgrading to 3.11.3 is 
likely a good idea

 

Are the nodes coming up on their own? Or are you restarting them?

 

Paste the output of nodetool tpstats and nodetool cfstats

 

 

 

-- 

Jeff Jirsa

 


On Mar 26, 2018, at 7:56 PM, Xiangfei Ni <xiangfei...@cm-dt.com> wrote:

Hi Cassandra experts,

  I am facing an issue: a node goes down every day in a 6-node cluster. The 
cluster is in just one DC.

  Every node has 4 cores and 16 GB of memory, and the heap configuration is 
MAX_HEAP_SIZE=8192m HEAP_NEWSIZE=512m. Every node holds about 200 GB of data, 
and the RF for the business CF is 3. A node goes down once every day, and 
system.log shows the following:

WARN  [Native-Transport-Requests-19] 2018-03-26 18:53:17,128 
CassandraAuthorizer.java:101 - CassandraAuthorizer failed to authorize # for 

ERROR [Native-Transport-Requests-19] 2018-03-26 18:53:17,129 
QueryMessage.java:128 - Unexpected error during query

com.google.common.util.concurrent.UncheckedExecutionException: 
java.lang.RuntimeException: 
org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - 
received only 0 responses.

at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2203) 
~[guava-18.0.jar:na]

at com.google.common.cache.LocalCache.get(LocalCache.java:3937) 
~[guava-18.0.jar:na]

at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3941) 
~[guava-18.0.jar:na]

at 
com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4824) 
~[guava-18.0.jar:na]

at org.apache.cassandra.auth.AuthCache.get(AuthCache.java:108) 
~[apache-cassandra-3.9.jar:3.9]

at 
org.apache.cassandra.auth.PermissionsCache.getPermissions(PermissionsCache.java:45)
 ~[apache-cassandra-3.9.jar:3.9]

at 
org.apache.cassandra.auth.AuthenticatedUser.getPermissions(AuthenticatedUser.java:104)
 ~[apache-cassandra-3.9.jar:3.9]

at 
org.apache.cassandra.service.ClientState.authorize(ClientState.java:419) 
~[apache-cassandra-3.9.jar:3.9]

at 
org.apache.cassandra.service.ClientState.checkPermissionOnResourceChain(ClientState.java:352)
 ~[apache-cassandra-3.9.jar:3.9]

at 
org.apache.cassandra.service.

RE: Re: A node down every day in a 6 nodes cluster

2018-03-27 Thread Kenneth Brotman
David,

 

Can you replace the misbehaving node to see if that resolves the problem?

 

Kenneth Brotman

 

From: Xiangfei Ni [mailto:xiangfei...@cm-dt.com] 
Sent: Tuesday, March 27, 2018 3:27 AM
To: Jeff Jirsa
Cc: user@cassandra.apache.org
Subject: Re: Re: A node down every day in a 6 nodes cluster

 

Thanks Jeff,

   So your suggestion is to first resolve the data model issue that 
causes the wide partitions, right?

 

Best Regards, 

 

倪项菲/ David Ni

中移德电网络科技有限公司

Virtue Intelligent Network Ltd, co.

Add: 2003,20F No.35 Luojia creative city,Luoyu Road,Wuhan,HuBei

Mob: +86 13797007811|Tel: + 86 27 5024 2516

 

From: Jeff Jirsa <jji...@gmail.com>
Sent: March 27, 2018 11:50
To: Xiangfei Ni <xiangfei...@cm-dt.com>
Cc: user@cassandra.apache.org
Subject: Re: Re: A node down every day in a 6 nodes cluster

 

Only one node having the problem is suspicious. It may be that your application 
is improperly pooling connections, or you have a hardware problem.

I don't see anything in nodetool that explains it, though you certainly have a 
data model likely to cause problems over time (the cardinality of 
rt_ac_stat.idx_rt_ac_stat_prot_ver is such that you have very wide partitions 
and it'll be difficult to read).
 
 

 

On Mon, Mar 26, 2018 at 8:26 PM, Xiangfei Ni <xiangfei...@cm-dt.com> wrote:

Hi Jeff,

I need to restart the node manually every time. Only one node has this 
problem.

I have attached the nodetool output. Thanks.

 

Best Regards, 

 

倪项菲/ David Ni

中移德电网络科技有限公司

Virtue Intelligent Network Ltd, co.

Add: 2003,20F No.35 Luojia creative city,Luoyu Road,Wuhan,HuBei

Mob: +86 13797007811|Tel: + 86 27 5024 2516

 

From: Jeff Jirsa <jji...@gmail.com>
Sent: March 27, 2018 11:03
To: user@cassandra.apache.org
Subject: Re: A node down every day in a 6 nodes cluster

 

That warning isn’t sufficient to understand why the node is going down

 

 

Cassandra 3.9 has some pretty serious known issues - upgrading to 3.11.3 is 
likely a good idea

 

Are the nodes coming up on their own? Or are you restarting them?

 

Paste the output of nodetool tpstats and nodetool cfstats

 

 

 

-- 

Jeff Jirsa

 


On Mar 26, 2018, at 7:56 PM, Xiangfei Ni <xiangfei...@cm-dt.com> wrote:

Hi Cassandra experts,

  I am facing an issue: a node goes down every day in a 6-node cluster. The 
cluster is in just one DC.

  Every node has 4 cores and 16 GB of memory, and the heap configuration is 
MAX_HEAP_SIZE=8192m HEAP_NEWSIZE=512m. Every node holds about 200 GB of data, 
and the RF for the business CF is 3. A node goes down once every day, and 
system.log shows the following:

WARN  [Native-Transport-Requests-19] 2018-03-26 18:53:17,128 
CassandraAuthorizer.java:101 - CassandraAuthorizer failed to authorize # for 

ERROR [Native-Transport-Requests-19] 2018-03-26 18:53:17,129 
QueryMessage.java:128 - Unexpected error during query

com.google.common.util.concurrent.UncheckedExecutionException: 
java.lang.RuntimeException: 
org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - 
received only 0 responses.

at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2203) 
~[guava-18.0.jar:na]

at com.google.common.cache.LocalCache.get(LocalCache.java:3937) 
~[guava-18.0.jar:na]

at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3941) 
~[guava-18.0.jar:na]

at 
com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4824) 
~[guava-18.0.jar:na]

at org.apache.cassandra.auth.AuthCache.get(AuthCache.java:108) 
~[apache-cassandra-3.9.jar:3.9]

at 
org.apache.cassandra.auth.PermissionsCache.getPermissions(PermissionsCache.java:45)
 ~[apache-cassandra-3.9.jar:3.9]

at 
org.apache.cassandra.auth.AuthenticatedUser.getPermissions(AuthenticatedUser.java:104)
 ~[apache-cassandra-3.9.jar:3.9]

at 
org.apache.cassandra.service.ClientState.authorize(ClientState.java:419) 
~[apache-cassandra-3.9.jar:3.9]

at 
org.apache.cassandra.service.ClientState.checkPermissionOnResourceChain(ClientState.java:352)
 ~[apache-cassandra-3.9.jar:3.9]

at 
org.apache.cassandra.service.ClientState.ensureHasPermission(ClientState.java:329)
 ~[apache-cassandra-3.9.jar:3.9]

at 
org.apache.cassandra.service.ClientState.hasAccess(ClientState.java:316) 
~[apache-cassandra-3.9.jar:3.9]

at 
org.apache.cassandra.service.ClientState.hasColumnFamilyAccess(ClientState.java:300)
 ~[apache-cassandra-3.9.jar:3.9]

at 
org.apache.cassandra.cql3.statements.ModificationStatement.checkAccess(ModificationStatement.java:211)
 ~[apache-cassandra-3.9.jar:3.9]

at 
org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:185)
 ~[apache-cassandra-3.9.jar:3.9]

at 
org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:219) 
~[apache-cassandra-3.9.jar:3.9]

at 
org.ap

Re: Re: A node down every day in a 6 nodes cluster

2018-03-26 Thread Jeff Jirsa
Only one node having the problem is suspicious. It may be that your
application is improperly pooling connections, or you have a hardware
problem.

I don't see anything in nodetool that explains it, though you certainly have
a data model likely to cause problems over time (the cardinality of
rt_ac_stat.idx_rt_ac_stat_prot_ver is such that you have very wide
partitions and it'll be difficult to read).
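
One way to spot wide partitions in the nodetool output is to scan the cfstats text for the "Compacted partition maximum bytes" line of each table. The sketch below is only illustrative: the sample text is invented, not the output attached to this thread, the keyspace and table names are hypothetical, and the 100 MB threshold is an arbitrary choice. The exact layout of cfstats output can differ between Cassandra versions, so the patterns may need adjusting.

```python
import re

KEYSPACE_RE = re.compile(r"^\s*Keyspace\s*:\s*(\S+)")
TABLE_RE = re.compile(r"^\s*Table(?: \(index\))?\s*:\s*(\S+)")
MAX_RE = re.compile(r"^\s*Compacted partition maximum bytes\s*:\s*(\d+)")

# Invented sample in the general shape of cfstats output (not real data).
SAMPLE = """\
Keyspace : example_ks
    Table: events
    Compacted partition maximum bytes: 2048
    Table (index): events.events_by_type_idx
    Compacted partition maximum bytes: 536870912
"""


def wide_partitions(text, threshold_bytes):
    """Return (keyspace, table, max_bytes) for tables above the threshold."""
    keyspace = table = None
    found = []
    for line in text.splitlines():
        m = KEYSPACE_RE.match(line)
        if m:
            keyspace, table = m.group(1), None
            continue
        m = TABLE_RE.match(line)
        if m:
            table = m.group(1)
            continue
        m = MAX_RE.match(line)
        if m and table is not None and int(m.group(1)) > threshold_bytes:
            found.append((keyspace, table, int(m.group(1))))
    return found


print(wide_partitions(SAMPLE, 100 * 1024 * 1024))
```

On a real node the text would come from saving the nodetool cfstats output to a file and passing its contents to `wide_partitions`.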




On Mon, Mar 26, 2018 at 8:26 PM, Xiangfei Ni  wrote:

> Hi Jeff,
>
> I need to restart the node manually every time. Only one node has this
> problem.
>
> I have attached the nodetool output. Thanks.
>
>
>
> Best Regards,
>
>
>
> 倪项菲/ David Ni
>
> 中移德电网络科技有限公司
>
> Virtue Intelligent Network Ltd, co.
>
> Add: 2003,20F No.35 Luojia creative city,Luoyu Road,Wuhan,HuBei
>
> Mob: +86 13797007811|Tel: + 86 27 5024 2516
>
>
>
> From: Jeff Jirsa
> Sent: March 27, 2018 11:03
> To: user@cassandra.apache.org
> Subject: Re: A node down every day in a 6 nodes cluster
>
>
>
> That warning isn’t sufficient to understand why the node is going down
>
>
>
>
>
> Cassandra 3.9 has some pretty serious known issues - upgrading to 3.11.3
> is likely a good idea
>
>
>
> Are the nodes coming up on their own? Or are you restarting them?
>
>
>
> Paste the output of nodetool tpstats and nodetool cfstats
>
>
>
>
>
>
>
> --
>
> Jeff Jirsa
>
>
>
>
> On Mar 26, 2018, at 7:56 PM, Xiangfei Ni  wrote:
>
> Hi Cassandra experts,
>
>   I am facing an issue: a node goes down every day in a 6-node cluster. The
> cluster is in just one DC.
>
>   Every node has 4 cores and 16 GB of memory, and the heap configuration is
> MAX_HEAP_SIZE=8192m HEAP_NEWSIZE=512m. Every node holds about 200 GB of data,
> and the RF for the business CF is 3. A node goes down once every day, and
> system.log shows the following:
>
> WARN  [Native-Transport-Requests-19] 2018-03-26 18:53:17,128
> CassandraAuthorizer.java:101 - CassandraAuthorizer failed to authorize
> # for 
>
> ERROR [Native-Transport-Requests-19] 2018-03-26 18:53:17,129
> QueryMessage.java:128 - Unexpected error during query
>
> com.google.common.util.concurrent.UncheckedExecutionException:
> java.lang.RuntimeException: 
> org.apache.cassandra.exceptions.ReadTimeoutException:
> Operation timed out - received only 0 responses.
>
> at 
> com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2203)
> ~[guava-18.0.jar:na]
>
> at com.google.common.cache.LocalCache.get(LocalCache.java:3937)
> ~[guava-18.0.jar:na]
>
> at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3941)
> ~[guava-18.0.jar:na]
>
> at 
> com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4824)
> ~[guava-18.0.jar:na]
>
> at org.apache.cassandra.auth.AuthCache.get(AuthCache.java:108)
> ~[apache-cassandra-3.9.jar:3.9]
>
> at 
> org.apache.cassandra.auth.PermissionsCache.getPermissions(PermissionsCache.java:45)
> ~[apache-cassandra-3.9.jar:3.9]
>
> at 
> org.apache.cassandra.auth.AuthenticatedUser.getPermissions(AuthenticatedUser.java:104)
> ~[apache-cassandra-3.9.jar:3.9]
>
> at 
> org.apache.cassandra.service.ClientState.authorize(ClientState.java:419)
> ~[apache-cassandra-3.9.jar:3.9]
>
> at 
> org.apache.cassandra.service.ClientState.checkPermissionOnResourceChain(ClientState.java:352)
> ~[apache-cassandra-3.9.jar:3.9]
>
> at 
> org.apache.cassandra.service.ClientState.ensureHasPermission(ClientState.java:329)
> ~[apache-cassandra-3.9.jar:3.9]
>
> at 
> org.apache.cassandra.service.ClientState.hasAccess(ClientState.java:316)
> ~[apache-cassandra-3.9.jar:3.9]
>
> at 
> org.apache.cassandra.service.ClientState.hasColumnFamilyAccess(ClientState.java:300)
> ~[apache-cassandra-3.9.jar:3.9]
>
> at 
> org.apache.cassandra.cql3.statements.ModificationStatement.checkAccess(ModificationStatement.java:211)
> ~[apache-cassandra-3.9.jar:3.9]
>
> at 
> org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:185)
> ~[apache-cassandra-3.9.jar:3.9]
>
> at 
> org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:219)
> ~[apache-cassandra-3.9.jar:3.9]
>
> at 
> org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:204)
> ~[apache-cassandra-3.9.jar:3.9]
>
> at 
> org.apache.cassandra.transport.messages.QueryMessage.execute(QueryMessage.java:115)
> ~[apache-cassandra-3.9.jar:3.9]
>
> at 
> org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:513)
> [apache-cassandra-3.9.jar:3.9]
>
> at 
> org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:407)
> [apache-cassandra-3.9.jar:3.9]
>
> at 
> io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
> [netty-all-4.0.39.Final.jar:4.0.39.Final]
>
> at 

Re: Re: A node down every day in a 6 nodes cluster

2018-03-26 Thread daemeon reiydelle
Look for errors on your network interface. I think you have periodic errors
in your network connectivity.
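
A minimal Python sketch of one way to check this, assuming the nodes run Linux and expose per-interface counters in /proc/net/dev. The sample text and its numbers are invented for illustration. The function reports interfaces whose receive or transmit error or drop counters are nonzero.

```python
# Invented sample in the /proc/net/dev layout (two header lines, then
# one line per interface with 8 receive and 8 transmit counters).
SAMPLE = """\
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo: 1000 10 0 0 0 0 0 0 1000 10 0 0 0 0 0 0
  eth0: 987654 4321 17 3 0 0 0 0 123456 2100 5 0 0 0 0 0
"""


def interface_errors(proc_net_dev_text):
    """Return {interface: counters} for interfaces with nonzero errs/drop."""
    result = {}
    for line in proc_net_dev_text.splitlines():
        if ":" not in line:
            continue  # the two header lines contain no colon
        name, data = line.split(":", 1)
        fields = data.split()
        if len(fields) < 16:
            continue  # not a complete counter line
        counters = {
            "rx_errs": int(fields[2]),
            "rx_drop": int(fields[3]),
            "tx_errs": int(fields[10]),
            "tx_drop": int(fields[11]),
        }
        if any(counters.values()):
            result[name.strip()] = counters
    return result


print(interface_errors(SAMPLE))
```

On a real node, pass the contents of /proc/net/dev instead of `SAMPLE`. The counters are cumulative since boot, so comparing two readings taken around the time the node goes down is more telling than a single snapshot.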


<==>
"Who do you think made the first stone spear? The Asperger guy.
If you get rid of the autism genetics, there would be no Silicon Valley"
Temple Grandin


Daemeon C.M. Reiydelle
San Francisco 1.415.501.0198
London 44 020 8144 9872


On Mon, Mar 26, 2018 at 8:26 PM, Xiangfei Ni  wrote:

> Hi Jeff,
>
> I need to restart the node manually every time. Only one node has this
> problem.
>
> I have attached the nodetool output. Thanks.
>
>
>
> Best Regards,
>
>
>
> 倪项菲/ David Ni
>
> 中移德电网络科技有限公司
>
> Virtue Intelligent Network Ltd, co.
>
> Add: 2003,20F No.35 Luojia creative city,Luoyu Road,Wuhan,HuBei
>
> Mob: +86 13797007811|Tel: + 86 27 5024 2516
>
>
>
> From: Jeff Jirsa
> Sent: March 27, 2018 11:03
> To: user@cassandra.apache.org
> Subject: Re: A node down every day in a 6 nodes cluster
>
>
>
> That warning isn’t sufficient to understand why the node is going down
>
>
>
>
>
> Cassandra 3.9 has some pretty serious known issues - upgrading to 3.11.3
> is likely a good idea
>
>
>
> Are the nodes coming up on their own? Or are you restarting them?
>
>
>
> Paste the output of nodetool tpstats and nodetool cfstats
>
>
>
>
>
>
>
> --
>
> Jeff Jirsa
>
>
>
>
> On Mar 26, 2018, at 7:56 PM, Xiangfei Ni  wrote:
>
> Hi Cassandra experts,
>
>   I am facing an issue: a node goes down every day in a 6-node cluster. The
> cluster is in just one DC.
>
>   Every node has 4 cores and 16 GB of memory, and the heap configuration is
> MAX_HEAP_SIZE=8192m HEAP_NEWSIZE=512m. Every node holds about 200 GB of data,
> and the RF for the business CF is 3. A node goes down once every day, and
> system.log shows the following:
>
> WARN  [Native-Transport-Requests-19] 2018-03-26 18:53:17,128
> CassandraAuthorizer.java:101 - CassandraAuthorizer failed to authorize
> # for 
>
> ERROR [Native-Transport-Requests-19] 2018-03-26 18:53:17,129
> QueryMessage.java:128 - Unexpected error during query
>
> com.google.common.util.concurrent.UncheckedExecutionException:
> java.lang.RuntimeException: 
> org.apache.cassandra.exceptions.ReadTimeoutException:
> Operation timed out - received only 0 responses.
>
> at 
> com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2203)
> ~[guava-18.0.jar:na]
>
> at com.google.common.cache.LocalCache.get(LocalCache.java:3937)
> ~[guava-18.0.jar:na]
>
> at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3941)
> ~[guava-18.0.jar:na]
>
> at 
> com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4824)
> ~[guava-18.0.jar:na]
>
> at org.apache.cassandra.auth.AuthCache.get(AuthCache.java:108)
> ~[apache-cassandra-3.9.jar:3.9]
>
> at 
> org.apache.cassandra.auth.PermissionsCache.getPermissions(PermissionsCache.java:45)
> ~[apache-cassandra-3.9.jar:3.9]
>
> at 
> org.apache.cassandra.auth.AuthenticatedUser.getPermissions(AuthenticatedUser.java:104)
> ~[apache-cassandra-3.9.jar:3.9]
>
> at 
> org.apache.cassandra.service.ClientState.authorize(ClientState.java:419)
> ~[apache-cassandra-3.9.jar:3.9]
>
> at 
> org.apache.cassandra.service.ClientState.checkPermissionOnResourceChain(ClientState.java:352)
> ~[apache-cassandra-3.9.jar:3.9]
>
> at 
> org.apache.cassandra.service.ClientState.ensureHasPermission(ClientState.java:329)
> ~[apache-cassandra-3.9.jar:3.9]
>
> at 
> org.apache.cassandra.service.ClientState.hasAccess(ClientState.java:316)
> ~[apache-cassandra-3.9.jar:3.9]
>
> at 
> org.apache.cassandra.service.ClientState.hasColumnFamilyAccess(ClientState.java:300)
> ~[apache-cassandra-3.9.jar:3.9]
>
> at 
> org.apache.cassandra.cql3.statements.ModificationStatement.checkAccess(ModificationStatement.java:211)
> ~[apache-cassandra-3.9.jar:3.9]
>
> at 
> org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:185)
> ~[apache-cassandra-3.9.jar:3.9]
>
> at 
> org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:219)
> ~[apache-cassandra-3.9.jar:3.9]
>
> at 
> org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:204)
> ~[apache-cassandra-3.9.jar:3.9]
>
> at 
> org.apache.cassandra.transport.messages.QueryMessage.execute(QueryMessage.java:115)
> ~[apache-cassandra-3.9.jar:3.9]
>
> at 
> org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:513)
> [apache-cassandra-3.9.jar:3.9]
>
> at 
> org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:407)
> [apache-cassandra-3.9.jar:3.9]
>
> at 
> io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
> [netty-all-4.0.39.Final.jar:4.0.39.Final]
>
> at io.netty.channel.AbstractChannelHandlerContext.
> invokeChannelRead(AbstractChannelHandlerContext.java:366)