Re: Cassandra takes more time to join the cluster
Hi,

This issue is related to https://issues.apache.org/jira/browse/CASSANDRA-9630.

Thanks and Regards
A.SathishKumar
Re: Discrepancy in nodetool status
Hi,

We guess the issue we are facing is related to https://issues.apache.org/jira/browse/CASSANDRA-9630. Will it be fixed in the 3.11 release?

Thanks and Regards
A.SathishKumar
Re: Cassandra takes more time to join the cluster
Hi,

We have a 3 node cluster. Before we rebooted one of the VMware VMs/nodes (RHEL 7.2), all 3 nodes formed the cluster without any issues. However, after the reboot, we noticed that the rebooted node (7 out of 10 times) takes more time to join, approximately 10-15 min.

Cassandra version: 3.9

While investigating the issue further we noticed:

- Node 1 (the rebooted node) is able to send "SYN" and "ACK2" messages to both of the other nodes (Node 2, Node 3), even though nodetool status displays Node 2 and Node 3 as "DN" only on Node 1.
- After 10-15 min we noticed a "Connection Timeout" exception in Node 2 and 3, thrown from OutboundTcpConnection.java (line 311), which triggers a state change event for Node 1 and changes its state to "UN".

    if (logger.isTraceEnabled())
        logger.trace("error writing to {}", poolReference.endPoint(), e);

Please let us know what triggers the "Connection Timeout" exception in Node 2 and 3, and ways to resolve this.

Thanks and Regards
A.SathishKumar
Re: Discrepancy in nodetool status
Hi,

We tried rebooting Node 1 again, and this time we observed nodetool status displaying "UN" for all 3 nodes on Node 1.

Executing nodetool status on Node 3 displays "UN" for all the nodes.

Executing nodetool status on Node 2 displays "DN" for Node 1 (the rebooted node) and "UN" for the other 2 nodes. We also observed Node 2 sending "Syn" messages to Node 1, but no "Ack" was received from Node 1 for the first 15 min, after which it started receiving them.

Please let us know why Node 2 does not receive any "Ack" messages from Node 1 for 10-15 minutes, and how it is suddenly able to receive them afterwards.

The inter-node communication port is 7000.

This is a 3 node cluster, and we have all 3 nodes as seed IPs.

Thanks and Regards
A.SathishKumar
Re: Discrepancy in nodetool status
Hi,

We checked, and we were able to telnet to port 7000.

Thanks and Regards
A.SathishKumar

On Fri, Dec 22, 2017 at 3:43 PM, Nitan Kainth <nitankai...@gmail.com> wrote:
> Try telnet on your listen port. It must be a network issue due to a port or
> firewall problem.
>
> Sent from my iPhone

--
A.SathishKumar
044-24735023
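The telnet check can also be scripted. A minimal sketch in Python (the node IPs are taken from the TRACE logs elsewhere in this thread, and 7000 is Cassandra's default storage_port; adjust both for your cluster):

```python
import socket

def port_open(host: str, port: int = 7000, timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds
    (equivalent to a successful `telnet host 7000`)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Check storage-port reachability to every node (IPs from the logs above).
nodes = ["10.63.114.150", "10.63.114.154", "10.63.114.158"]
for node in nodes:
    print(node, "reachable" if port_open(node) else "unreachable")
```

Run this from each node in turn; gossip needs the storage port open in both directions between every pair of nodes, so a one-way failure is enough to cause status discrepancies.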
Discrepancy in nodetool status
Hi,

We have 3 nodes in a cluster. We rebooted one of the Cassandra VMs and noticed nodetool status returning "UN" for itself and "DN" for the other nodes, although we observe gossip SYN and ACK messages being exchanged between these nodes.

*Issue in Detail*

*Nodes in cluster*
Node 1
Node 2
Node 3

All of the above nodes formed a cluster, and nodetool status on all 3 machines showed "UN".

We rebooted Node 1 and restarted Cassandra on Node 1, then ran nodetool status and observed:

Node 1 - UN
Node 2 - DN
Node 3 - DN

However, when we run nodetool status on the other 2 nodes (Node 2 and Node 3), they claim all 3 nodes are "UN".

We enabled "Trace" level and checked the gossip messages, and noticed "SYN", "ACK" and "ACK2" messages initiated and received on Node 1 for the other 2 nodes, but nodetool status still marks the other 2 nodes as down.

Please let us know how nodetool detects other nodes as "DOWN". Any help is highly appreciated.

Thanks
A.SathishKumar
Re: Cassandra takes more time to join the cluster
Hi Jeff,

Thanks for your prompt response. Please find the logs below. Does Gossip have a dependency on NTP? We find that the node we reboot takes time to sync with NTP.

When will the FailureDetector kick in? We have this issue only when we see the Gossiper and FailureDetector log messages.

TRACE [GossipStage:1] 2017-12-22 12:46:26,759 FailureDetector.java:272 - Average for /10.63.114.158 is 9.977908204545455E8
TRACE [GossipStage:1] 2017-12-22 12:46:26,759 Gossiper.java:1140 - /10.63.114.158 local generation 1513967155, remote generation 1513967155
TRACE [GossipStage:1] 2017-12-22 12:46:26,759 Gossiper.java:1193 - Updating heartbeat state version to 14666 from 14663 for /10.63.114.158
...
TRACE [GossipStage:1] 2017-12-22 12:46:26,759 Gossiper.java:986 - Sending a EchoMessage to /10.63.114.158
TRACE [GossipStage:1] 2017-12-22 12:46:26,759 MessagingService.java:760 - /10.63.114.150 sending ECHO to 6794@/10.63.114.158
TRACE [GossipTasks:1] 2017-12-22 12:46:26,856 Gossiper.java:144 - My heartbeat is now 2258
TRACE [GossipTasks:1] 2017-12-22 12:46:26,856 Gossiper.java:500 - Gossip Digests are : /10.63.114.150:1513971247:2258 /10.63.114.154:1513967154:14664 /10.63.114.158:1513967155:14666
TRACE [GossipTasks:1] 2017-12-22 12:46:26,857 Gossiper.java:646 - Sending a GossipDigestSyn to /10.63.114.158 ...
TRACE [GossipTasks:1] 2017-12-22 12:46:26,857 MessagingService.java:760 - /10.63.114.150 sending GOSSIP_DIGEST_SYN to 6795@/10.63.114.158
TRACE [GossipTasks:1] 2017-12-22 12:46:26,857 Gossiper.java:646 - Sending a GossipDigestSyn to /10.63.114.154 ...
TRACE [GossipTasks:1] 2017-12-22 12:46:26,857 MessagingService.java:760 - /10.63.114.150 sending GOSSIP_DIGEST_SYN to 6796@/10.63.114.154
TRACE [GossipTasks:1] 2017-12-22 12:46:26,857 Gossiper.java:757 - Performing status check ...
TRACE [GossipTasks:1] 2017-12-22 12:46:26,857 FailureDetector.java:298 - PHI for /10.63.114.158 : 0.09877714244264273
TRACE [GossipTasks:1] 2017-12-22 12:46:26,857 FailureDetector.java:315 - PHI for /10.63.114.158 : 0.09877714244264273
TRACE [GossipTasks:1] 2017-12-22 12:46:26,857 FailureDetector.java:316 - mean for /10.63.114.158 : 9.977908204545455E8
TRACE [GossipTasks:1] 2017-12-22 12:46:26,857 FailureDetector.java:298 - PHI for /10.63.114.154 : 0.27857029580273446
TRACE [GossipTasks:1] 2017-12-22 12:46:26,858 FailureDetector.java:315 - PHI for /10.63.114.154 : 0.27857029580273446
TRACE [GossipTasks:1] 2017-12-22 12:46:26,858 FailureDetector.java:316 - mean for /10.63.114.154 : 7.65247444583815E8
TRACE [GossipStage:1] 2017-12-22 12:46:26,858 GossipDigestAckVerbHandler.java:41 - Received a GossipDigestAckMessage from /10.63.114.158
TRACE [GossipStage:1] 2017-12-22 12:46:26,858 GossipDigestAckVerbHandler.java:52 - Received ack with 1 digests and 0 states
TRACE [GossipStage:1] 2017-12-22 12:46:26,858 Gossiper.java:889 - local heartbeat version 2258 greater than 2257 for /10.63.114.150
TRACE [GossipStage:1] 2017-12-22 12:46:26,858 GossipDigestAckVerbHandler.java:84 - Sending a GossipDigestAck2Message to /10.63.114.158
TRACE [GossipStage:1] 2017-12-22 12:46:26,858 MessagingService.java:760 - /10.63.114.150 sending GOSSIP_DIGEST_ACK2 to 6797@/10.63.114.158
TRACE [GossipStage:1] 2017-12-22 12:46:26,858 GossipDigestAckVerbHandler.java:41 - Received a GossipDigestAckMessage from /10.63.114.154
TRACE [GossipStage:1] 2017-12-22 12:46:26,858 GossipDigestAckVerbHandler.java:52 - Received ack with 2 digests and 1 states
TRACE [GossipStage:1] 2017-12-22 12:46:26,858 Gossiper.java:1140 - /10.63.114.154 local generation 1513967154, remote generation 1513967154
TRACE [GossipStage:1] 2017-12-22 12:46:26,858 Gossiper.java:1193 - Updating heartbeat state version to 14664 from 14664 for /10.63.114.154
...
TRACE [GossipStage:1] 2017-12-22 12:46:26,858 Gossiper.java:986 - Sending a EchoMessage to /10.63.114.154
TRACE [GossipStage:1] 2017-12-22 12:46:26,858 MessagingService.java:760 - /10.63.114.150 sending ECHO to 6798@/10.63.114.154
TRACE [GossipStage:1] 2017-12-22 12:46:26,858 Gossiper.java:889 - local heartbeat version 2258 greater than 2257 for /10.63.114.150
TRACE [GossipStage:1] 2017-12-22 12:46:26,858 Gossiper.java:889 - local heartbeat version 14666 greater than 14663 for /10.63.114.158
TRACE [GossipStage:1] 2017-12-22 12:46:26,859 Gossiper.java:904 - Adding state SEVERITY: 0.0

Thanks and Regards
A.SathishKumar

On Thu, Dec 21, 2017 at 9:40 PM, Jeff Jirsa <jji...@gmail.com> wrote:
> Not received is ambiguous.
>
> Can you paste the logs somewhere like pastebin? Sanitize/anonymize as
> needed.
>
> --
> Jeff Jirsa
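For context on the PHI values in these logs: Cassandra uses an accrual failure detector, and a peer is convicted (shown as DN) once its phi crosses phi_convict_threshold (8 by default). A rough, simplified sketch of the calculation, assuming Cassandra's linear approximation (the real FailureDetector.java keeps a sliding window of heartbeat inter-arrival times; the scaling constant is 1/ln 10, i.e. log10 e):

```python
import math

PHI_FACTOR = 1.0 / math.log(10.0)  # ~0.434294, converts suspicion to a log10 scale

def phi(time_since_last_ns: float, mean_interval_ns: float) -> float:
    """Simplified accrual failure detector: suspicion grows linearly with
    the time since the last heartbeat, scaled by the mean inter-arrival time."""
    return PHI_FACTOR * time_since_last_ns / mean_interval_ns

# Mean from the log above: ~9.98e8 ns, i.e. roughly one heartbeat per second.
mean = 9.977908204545455e8
print(phi(2.27e8, mean))   # ~0.0988, matching the low PHI while heartbeats flow
print(phi(18.5e9, mean))   # > 8: after ~18s of silence the node would be convicted
```

This is why a node that stays silent for a while after a reboot gets marked DN on its peers, and why it flips back to UN as soon as heartbeats resume and phi drops.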
Re: Cassandra takes more time to join the cluster
Hi,

We observe that gossip is not received by the other nodes, and we see a "Convicted" message in debug.log even though the Cassandra process is running on all 3 nodes.

Thanks and Regards
A.SathisKumar

On Thu, Dec 21, 2017 at 9:12 PM, Jeff Jirsa <jji...@gmail.com> wrote:
> What's it logging?
>
> --
> Jeff Jirsa

--
A.SathishKumar
044-24735023
Cassandra takes more time to join the cluster
We have configured a 3 node Cassandra cluster on RHEL 7.2 and we are doing cluster testing. When we start Cassandra on all 3 nodes they form a cluster and work fine.

But when we bring one node down using the "init 6" or "reboot" command, the rebooted node takes more time to join the cluster; however, if we manually kill and start the Cassandra process, the node joins the cluster immediately without any issues.

We have provided all 3 IPs as seed nodes, the cluster name is the same on all 3 nodes, and each node's own IP is its listen address.

SELinux is also disabled.

Please help us in resolving this issue.

Thanks
Re: Priority for cassandra nodes in cluster
Hi,

Thanks all for your valuable suggestions.

Thanks and Regards
A.SathishKumar

On Sat, Nov 12, 2016 at 2:59 PM, Ben Bromhead <b...@instaclustr.com> wrote:
> +1 w/ Benjamin.
>
> However, if you wish to make use of spare hardware capacity, look to
> something like Mesos DC/OS or Kubernetes. You can run multiple services
> across a fleet of hardware, but provision equal resources to Cassandra and
> have somewhat reliable hardware-sharing mechanisms.
>
> On Sat, 12 Nov 2016 at 14:12 Jon Haddad <jonathan.had...@gmail.com> wrote:
>
>> Agreed w/ Benjamin. Trying to diagnose issues in prod will be a
>> nightmare. Keep your DB servers homogeneous.
>>
>> On Nov 12, 2016, at 1:52 PM, Benjamin Roth <benjamin.r...@jaumo.com> wrote:
>>
>> 1. From 15 years of experience running distributed services: don't mix
>> services on machines if you don't have to. Dedicate each server to a single
>> task if you can afford it. It is easier to manage and reduces risk in case
>> of overload or failure.
>> 2. You can assign a different number of tokens to each node by setting
>> this in cassandra.yaml before you bootstrap that node.
>
> --
> Ben Bromhead
> CTO | Instaclustr <https://www.instaclustr.com/>
> +1 650 284 9692
> Managed Cassandra / Spark on AWS, Azure and Softlayer

--
A.SathishKumar
044-24735023
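Benjamin's second point maps to the num_tokens setting in cassandra.yaml: a node with fewer tokens owns a proportionally smaller share of the ring. The values below are illustrative only, and the setting must be in place before each node first bootstraps:

    # cassandra.yaml on node 1 and node 2 (larger share of the ring)
    num_tokens: 256

    # cassandra.yaml on node 3 (roughly half the data of the other two)
    num_tokens: 128

Note this only skews data ownership; it does not prioritize or order writes, which is why the homogeneous-cluster advice above still stands.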
Priority for cassandra nodes in cluster
Hi,

We are planning to install a 3 node cluster in a production environment. Is it possible to assign a weight or priority to the nodes in the cluster?

E.g., we want more records to be written to the first 2 nodes and fewer to the 3rd node. We are thinking of this approach because we want to install another IO-intensive messaging server on the 3rd node, and we are asking about this approach in order to reduce its load.

Thanks and Regards
A.SathishKumar
Re: ITrigger - Help
Hi Siddharth Verma,

We explored this option; it seems it outputs the changes only to a log file, and we cannot get notified via a listener class.

Could you please tell us what kind of information is pushed into the commit log and when we should read it? Do we need to instantiate CommitLogReader.java and read it every few seconds?

Could you please point us to a detailed example/tutorial of how to use this.

Thanks and Regards
A.SathishKumar

On Fri, Nov 11, 2016 at 10:13 AM, siddharth verma <sidd.verma29.l...@gmail.com> wrote:
> Hi Sathish,
> You could look into Change Data Capture (CDC) (
> https://issues.apache.org/jira/browse/CASSANDRA-8844 ).
> It might help you with some of your requirements.
>
> Regards
> Siddharth Verma
>
> On Fri, Nov 11, 2016 at 11:34 PM, Jonathan Haddad <j...@jonhaddad.com> wrote:
>
>> cqlsh uses the Python driver, I don't see how there would be any way to
>> differentiate where the request came from unless you stuck an extra field
>> in the table that you always write when you're not in cqlsh, or you
>> modified cqlsh to include that field whenever it did an insert.
>>
>> Checking the ITrigger source, all you get is a reference to the ColumnFamily
>> and some metadata. At a glance at trunk, it doesn't look like you get the
>> user that initiated the query.
>>
>> To be honest, I wouldn't do any of this, it feels like it's going to
>> become an error prone mess. Your best bet is to layer something on top of
>> the driver yourself. The cleanest way I can think of, long term, is to
>> submit a JIRA / patch to enable some class loading & listener hooks in
>> cqlsh itself. Without a patch and a really good use case I don't know who
>> would want to maintain that though, as it would lock the team into using
>> Python for cqlsh.
>>
>> Jon

--
A.SathishKumar
044-24735023
Re: ITrigger - Help
Hi Jon,

Thanks for your prompt answer.

Thanks
A.SathishKumar

--
A.SathishKumar
044-24735023
ITrigger - Help
Hi,

We are planning to use ITrigger to be notified of changes when scripts are executed or commands are run at the cqlsh prompt. If the operation is performed through our application CRUD API, we plan to handle the notification in the CRUD API itself; however, if a user performs some operation (like a write at the cqlsh prompt), we want to catch those changes and update the modules that are listening for them.

Could you please let us know whether it is possible to differentiate updates done through the cqlsh prompt from those done through the application.

We also thought about creating multiple users in Cassandra and using different users for cqlsh and for the application. If we go with this approach, do we get the user who modified the table in the ITrigger implementation (i.e., the augment method)?

Basically, we are trying to limit/restrict the use of ITrigger to just the cqlsh prompt, as it is a little complex and risky (we came to know it can impact the Cassandra instance running on that node).

Thanks and Regards
A.SathishKumar
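The extra-field approach suggested earlier in this thread (the application always stamps an origin column, so rows written manually from cqlsh will simply lack it) can be sketched without triggers at all. A hypothetical sketch below; the table and column names are made up for illustration, and the returned statement would be executed through the application's driver session:

```python
# Hypothetical helper for the application CRUD layer: every INSERT it
# builds always includes an "updated_by" column. Manual cqlsh writes
# won't set that column, so the two sources can be told apart later.
def build_insert(table: str, row: dict, origin: str = "app") -> tuple:
    cols = list(row) + ["updated_by"]
    placeholders = ", ".join("%s" for _ in cols)
    cql = f"INSERT INTO {table} ({', '.join(cols)}) VALUES ({placeholders})"
    return cql, list(row.values()) + [origin]

cql, params = build_insert("device", {"device_name": "d1", "config": "{}"})
print(cql)     # INSERT INTO device (device_name, config, updated_by) VALUES (%s, %s, %s)
print(params)  # ['d1', '{}', 'app']
```

This keeps the detection logic entirely in the application layer, avoiding the in-node risk of triggers that the thread warns about.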
Re: Cassandra Triggers
Hi,

We are doing a POC on Cassandra for our business needs. We also need some kind of notification when a column/attribute of a table is modified (insert/update/delete). Thanks for sharing the information about CDC. Could you please point us to an example of how to implement this in Cassandra 3.9?

Thanks and Regards
A.SathishKumar

On Wed, Nov 9, 2016 at 6:18 AM, DuyHai Doan wrote:
> They are production ready in the sense that they are fully functional. But
> using them requires a *deep* knowledge of the Cassandra internal write
> path and is dangerous because the write path is critical.
>
> Alternatively, if you need a notification system for new mutations, there
> is the CDC feature, available since 3.9 only (maybe not production ready
> yet)
>
> On Wed, Nov 9, 2016 at 3:11 PM, Nethi, Manoj wrote:
>
>> Hi,
>>
>> Are Triggers in Cassandra production ready?
>>
>> Version: Cassandra 3.3.0
>>
>> Thanks
>> Manoj
>>

--
A.SathishKumar
044-24735023
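As a starting point for the CDC question: CDC is switched on per node in cassandra.yaml and then per table via CQL. A minimal sketch (the keyspace/table names are placeholders; paths shown are the defaults, verify them against your installation):

```sql
-- 1) In cassandra.yaml on every node, set:
--        cdc_enabled: true
--    Flushed commit-log segments for CDC-enabled tables are then copied
--    under cdc_raw_directory (default: <data dir>/cdc_raw). A consumer
--    process must read and delete these files, or the cdc space cap is
--    reached and writes to CDC tables start failing.

-- 2) Mark each table of interest:
ALTER TABLE mykeyspace.device WITH cdc = true;
```

Unlike a trigger, this gives you the raw mutations after the fact (you parse the segment files with the commit-log reader API), so it keeps the notification work out of the write path.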
Re: Designing a table in cassandra
Hi Carlos Alonso,

Thanks for your quick answer.

Thanks and Regards
A.SathishKumar

On Mon, Nov 7, 2016 at 2:26 AM, Carlos Alonso <i...@mrcalonso.com> wrote:
> Hi,
>
> I think your best bet is, as usual, the simplest one that can work,
> which, to me, in this case is the third one. Creating a single device
> table that contains the different 'versions' of the configuration over
> time, along with a flag to know whether it was updated by the user or by
> the network, gives you all the flexibility you need. The primary key you
> suggest sounds good to me.
>
> To finally validate the model it would be good to know which queries
> you're thinking of running against it, because as you probably know,
> Cassandra models should be query-driven.
>
> The suggested primary key will work for queries like "Give me the
> version(s) of this particular device_name in this particular time range".
>
> Hope it helps.
>
> Regards
>
> Carlos Alonso | Software Engineer | @calonso <https://twitter.com/calonso>
>
> On 7 November 2016 at 01:23, sat <sathish.al...@gmail.com> wrote:
>
>> Hi,
>>
>> We are new to Cassandra. For our POC, we tried creating tables and
>> inserting data as JSON, and all of that went fine. Now we are trying to
>> implement one of our application scenarios, and I am having difficulty
>> coming up with the best approach.
>>
>> Scenario:
>> We have a Device POJO with some attributes/fields that are read/write
>> for both users and the network, and some attributes/fields that only the
>> network can modify. When users need to configure a device they create an
>> instance of the Device POJO and set the applicable fields; however, the
>> network can later update those attributes. We want to see the discrepancy
>> between the values configured by users and the values updated by the
>> network. Hence we have thought of 3 different approaches:
>>
>> 1) Create multiple tables for the same Device, like Device_Users and
>> Device_Network, so that we can see the difference.
>>
>> 2) Create a different keyspace, as multiple objects like Device can have
>> the same requirement.
>>
>> 3) Create one "Device" table and insert one row for the user
>> configuration and another row for the network update. We would create
>> this table with a compound primary key (device_name, updated_by).
>>
>> Please let us know which is the best option (with their pros and cons if
>> possible) among these 3, and also let us know if there are other options.
>>
>> Thanks and Regards
>> A.SathishKumar
>>

--
A.SathishKumar
044-24735023
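The third option discussed in this thread can be sketched in CQL roughly as follows. The (device_name, updated_by) key comes from the thread; the updated_at clustering column and the other column names are illustrative additions to keep the per-writer history Carlos mentions:

```sql
-- One partition per device; rows cluster by writer ('user' / 'network')
-- and then by time, newest first, so the latest value from each side is
-- cheap to read and compare.
CREATE TABLE mykeyspace.device (
    device_name text,
    updated_by  text,       -- 'user' or 'network'
    updated_at  timestamp,
    config      text,       -- serialized device configuration, e.g. JSON
    PRIMARY KEY ((device_name), updated_by, updated_at)
) WITH CLUSTERING ORDER BY (updated_by ASC, updated_at DESC);

-- "Give me the version(s) of this device_name in this time range":
SELECT updated_by, updated_at, config
  FROM mykeyspace.device
 WHERE device_name = 'dev-1'
   AND updated_by  = 'network'
   AND updated_at >= '2016-11-01' AND updated_at < '2016-11-08';
```

Spotting a user-vs-network discrepancy is then an application-side comparison of the newest 'user' row against the newest 'network' row for the same partition.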
Designing a table in cassandra
Hi,

We are new to Cassandra. For our POC, we tried creating tables and inserting data as JSON, and all of that went fine. Now we are trying to implement one of our application scenarios, and I am having difficulty coming up with the best approach.

Scenario:
We have a Device POJO with some attributes/fields that are read/write for both users and the network, and some attributes/fields that only the network can modify. When users need to configure a device they create an instance of the Device POJO and set the applicable fields; however, the network can later update those attributes. We want to see the discrepancy between the values configured by users and the values updated by the network. Hence we have thought of 3 different approaches:

1) Create multiple tables for the same Device, like Device_Users and Device_Network, so that we can see the difference.

2) Create a different keyspace, as multiple objects like Device can have the same requirement.

3) Create one "Device" table and insert one row for the user configuration and another row for the network update. We would create this table with a compound primary key (device_name, updated_by).

Please let us know which is the best option (with their pros and cons if possible) among these 3, and also let us know if there are other options.

Thanks and Regards
A.SathishKumar