Re: Lost counter updates during Cassandra upgrade 2.2.11 to 3.11.2

2018-11-09 Thread Laxmikant Upadhyay
Hi,

I have faced a similar issue while upgrading from 2.1.16 to 3.11.2 in a
3-node cluster.
I have raised JIRA ticket CASSANDRA-14881 for this issue, but have not
received any response on it yet.
@Konrad, did you get any resolution on this?

Regards,
Laxmikant




On Thu, Jul 26, 2018 at 5:34 PM Konrad  wrote:

> Hi,
>
> During a rolling upgrade of our cluster we noticed that some updates on a
> table with counters were not being applied. It looked as if it depended on
> whether the coordinator handling the request had already been upgraded or
> not. I observed similar behavior while using cqlsh and executing queries
> manually. Sometimes it took several retries to see the counter updated.
> There were no errors/warnings in either the application or the Cassandra
> logs. The updates started working reliably again once all nodes in the dc
> had been upgraded. However, the lost updates did not reappear.
>
> Our setup:
> 2 dc cluster, 5 + 5 nodes. However, only one dc is used for queries, as the
> client application is co-located in one region. I believe 1 dc is enough to
> reproduce it.
> Replication factor 3+2
> Consistency level LOCAL_QUORUM
> Upgrading 2.2.11 to 3.11.2
>
> I haven't found any report of a similar issue on the internet. Has anyone
> heard of such behavior?
>
> Thanks,
> Konrad
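
For anyone trying to reproduce this, a minimal sketch (the host, keyspace and
table names below are hypothetical, and an existing counter table is assumed):

  # drive a counter update through a chosen coordinator and read it back
  cqlsh 10.0.0.1 -e "
    CONSISTENCY LOCAL_QUORUM;
    UPDATE repro.hits SET count = count + 1 WHERE id = 'k1';
    SELECT count FROM repro.hits WHERE id = 'k1';"
  # repeat against an upgraded and a not-yet-upgraded coordinator and compare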

-- 

regards,
Laxmikant Upadhyay


Re: Cluster configuration issue

2018-11-09 Thread Romain Hardouin
128GB RAM -> that's good news, you have plenty of room to increase the
Cassandra heap size. You can start with, let's say, 12GB in jvm.options, or
24GB if you use G1 GC. Let us know if the node starts and if DEBUG/TRACE is
useful.
You can also try "strace -f -p ..." to see if the process is doing something
when it's stuck, but Cassandra has a lot of threads...
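
For example, a sketch of the heap settings in conf/jvm.options (the stock file
ships with these lines commented out; the sizes here are illustrative):

  # fixed 12GB heap with the default CMS collector
  -Xms12G
  -Xmx12G
  # or, if you uncomment -XX:+UseG1GC further down in the same file,
  # a larger heap such as -Xms24G / -Xmx24G is reasonable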
Re: Cluster configuration issue

2018-11-09 Thread Francesco Messere

  
  
Hi Romain,

Yes, I modified the .yaml after the issue.
The problem is this: if I restart a node in DC-FIRENZE it does not start up.
I tried first one node and then the second one, with the same results.

  
These are the server resources:

memory: 128GB

free
              total        used        free      shared  buff/cache   available
Mem:      131741388    13858952    72649704      124584    45232732   116825040
Swap:      16777212           0    16777212

CPU:
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                24
On-line CPU(s) list:   0-23
Thread(s) per core:    1
Core(s) per socket:    12
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 79
Model name:            Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz
Stepping:              1
CPU MHz:               1213.523
CPU max MHz:           2900.0000
CPU min MHz:           1200.0000
BogoMIPS:              4399.97
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              30720K
NUMA node0 CPU(s):     0,2,4,6,8,10,12,14,16,18,20,22
NUMA node1 CPU(s):     1,3,5,7,9,11,13,15,17,19,21,23

  
There is nothing in the server logs.

On Monday I will enable debug logging and try again to start the Cassandra
node.

Thanks,
Francesco Messere

  

  

Re: Cluster configuration issue

2018-11-09 Thread Romain Hardouin
Ok, so all nodes in Firenze are down. I thought only one was KO.
After a first look at cassandra.yaml, the only issue I saw is seeds: the line
you commented out was correct (one seed per DC). But I guess you modified it
after the issue.
You should fix the swap issue.
Also, can you add more heap to Cassandra? By the way, what are the specs of
the servers (RAM, CPU, etc.)?
Did you check the Linux system log? And Cassandra's debug.log? You can even
enable TRACE logs in logback.xml (
https://github.com/apache/cassandra/blob/cassandra-3.11.3/conf/logback.xml#L100
) then try to restart a node in Firenze to see where it blocks, but if it's
due to low resources, a hardware issue, or swap it won't be useful. Let's give
it a try anyway.
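
A sketch of that change (assuming the stock 3.11 logback.xml, which already
declares Cassandra's logger, and the default logs directory):

  # in conf/logback.xml, raise the existing logger from DEBUG to TRACE:
  #   <logger name="org.apache.cassandra" level="TRACE"/>
  # restart the node, then follow the log while it starts:
  tail -f logs/debug.log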


Re: Cluster configuration issue

2018-11-09 Thread Francesco Messere

  
  
Hi Romain,

You are right, it's not possible to work in these towns; fortunately I live
in Pisa :-).
I saw the errors and corrected them, except the swap one.
The process is stuck; I let it run for 1 day without results.

This is the output of nodetool status from the nodes that are up and running
(DC-MILANO):

/conf/CASSANDRA_SHARE_PROD_conf/bin/cassandra-3.11.3/bin/nodetool -h 192.168.71.210 -p 17052 status
Datacenter: DC-FIRENZE
======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address          Load       Tokens   Owns (effective)  Host ID                               Rack
DN  192.168.204.175  ?          256      100.0%            a3c8626e-afab-413e-a153-cccfd0b26d06  RACK1
DN  192.168.204.176  ?          256      100.0%            67738ca8-f1f5-46a9-9d23-490bbebcffaa  RACK1
Datacenter: DC-MILANO
=====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address          Load       Tokens   Owns (effective)  Host ID                               Rack
UN  192.168.71.210   5.95 GiB   256      100.0%            210f0cdd-abee-4fc0-abd3-ecdab618606e  RACK1
UN  192.168.71.211   5.83 GiB   256      100.0%            96c30edd-4e6c-4952-82d4-dfdf67f6a06f  RACK1

And this is the describecluster command output:

/conf/CASSANDRA_SHARE_PROD_conf/bin/cassandra-3.11.3/bin/nodetool -h 192.168.71.210 -p 17052 describecluster
Cluster Information:
    Name: CASSANDRA_3
    Snitch: org.apache.cassandra.locator.GossipingPropertyFileSnitch
    DynamicEndPointSnitch: enabled
    Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
    Schema versions:
        6bdd4617-658e-375e-8503-7158df833495: [192.168.71.210, 192.168.71.211]

        UNREACHABLE: [192.168.204.175, 192.168.204.176]
  
The cassandra.yaml file is attached.

Regards,
Francesco Messere
  



Re: Cluster configuration issue

2018-11-09 Thread Romain Hardouin
Hi Francesco, it can't work! Milano and Firenze, oh boy, Calcio vs Calcio
Storico X-D
Ok, more seriously, "Updating topology ..." is not a problem. But you have low
resources and system misconfiguration:
  - Small heap size: 3.867GiB. From the logs: "Unable to lock JVM memory
(ENOMEM). This can result in part of the JVM being swapped out, especially
with mmapped I/O enabled. Increase RLIMIT_MEMLOCK or run Cassandra as root."
  - System settings: swap should be disabled, bad system limits, etc. From the
logs: "Cassandra server running in degraded mode. Is swap disabled? : false,
Address space adequate? : true, nofile limit adequate? : true, nproc limit
adequate? : false"
For system tuning see
https://tobert.github.io/pages/als-cassandra-21-tuning-guide.html
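
A sketch of the usual fixes for those two warnings (the service user name and
limit values here are assumptions; the tuning guide above covers the details):

  # disable swap now, and remove/comment any swap entries in /etc/fstab
  sudo swapoff -a
  # raise limits for the user running Cassandra, e.g. in /etc/security/limits.conf:
  #   cassandra - memlock unlimited
  #   cassandra - nproc   32768
  #   cassandra - nofile  100000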

You said "Cassandra node did not startup". What is the problem exactly? The 
process is stuck or does it dies?What do you see with "nodetool status" on 
nodes that are up and running? 
Btw cassandra-topology.properties is not required with 
GossipingPropertyFileSnitch (unless your are migratig from PropertyFileSnitch).

Best,
Romain


Cluster configuration issue

2018-11-09 Thread Francesco Messere

  
  
Hi to all,

I have a problem with a distributed cluster configuration.
This is a test environment.
Cassandra version is 3.11.3
2 sites: Milan and Florence
2 servers on each site

1 common "cluster-name" and 2 DCs

The first installation and startup went fine; all the nodes were present in
the cluster.

The issue started after a server reboot in the FLORENCE DC.

The Cassandra node did not start up, and the last line written in system.log
is


INFO  [ScheduledTasks:1] 2018-11-09 10:36:54,306 TokenMetadata.java:498 - Updating topology for all endpoints that have changed
  
  

The only way I found to correct this is to clean up the node, remove it from
the cluster, and re-join it.

How can I solve it?
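
For reference, that workaround is roughly the following (the host ID comes
from "nodetool status" on a live node; the placeholder directories are
whatever cassandra.yaml points at):

  # from a live node: drop the dead node from the ring
  nodetool removenode <host-id>
  # on the affected node: wipe local state so the node can re-bootstrap
  rm -rf <data_file_directories>/* <commitlog_directory>/* <saved_caches_directory>/*
  # then start Cassandra again and let it re-join the cluster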

  
Here are the configuration files:
  
less cassandra-topology.properties

# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Cassandra Node IP=Data Center:Rack
192.168.204.175=DC-FIRENZE:RACK1
192.168.204.176=DC-FIRENZE:RACK1
192.168.71.210=DC-MILANO:RACK1
192.168.71.211=DC-MILANO:RACK1

# default for unknown nodes
default=DC-FIRENZE:r1

# Native IPv6 is supported, however you must escape the colon in the IPv6 Address
# Also be sure to comment out JVM_OPTS="$JVM_OPTS -Djava.net.preferIPv4Stack=true"
# in cassandra-env.sh
#fe80\:0\:0\:0\:202\:b3ff\:fe1e\:8329=DC1:RAC3

  
cassandra-rackdc.properties

# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# These properties are used with GossipingPropertyFileSnitch and will
# indicate the rack and dc for this node
dc=DC-FIRENZE
rack=RACK1

# Add a suffix to a datacenter name. Used by the Ec2Snitch and Ec2MultiRegionSnitch
# to append a string to the EC2 region name.
#dc_suffix=

# Uncomment the following line to make this snitch prefer the internal ip when
# possible, as the Ec2MultiRegionSnitch does.
# prefer_local=true
  

  
The system.log file is attached.

Regards,
Francesco Messere

  
  

INFO  [main] 2018-11-09 10:37:24,378 YamlConfigurationLoader.java:89 - 
Configuration location: 
file:/data/CASSANDRA_SHARE_PROD_data/CASSANDRA_PRIV_3/conf/cassandra.yaml
INFO  [main] 2018-11-09 10:37:24,621 Config.java:495 - Node 
configuration:[allocate_tokens_for_keyspace=null; 
authenticator=AllowAllAuthenticator; authorizer=AllowAllAuthorizer; 
auto_bootstrap=true; auto_snapshot=true; back_pressure_enabled=false; 
back_pressure_strategy=org.apache.cassandra.net.RateBasedBackPressure{high_ratio=0.9,
 factor=5, flow=FAST}; batch_size_fail_threshold_in_kb=50; 
batch_size_warn_threshold_in_kb=5; batchlog_replay_throttle_in_kb=1024; 
broadcast_address=192.168.204.176; broadcast_rpc_address=null; 
buffer_pool_use_heap_if_exhausted=true; cas_contention_timeout_in_ms=1000; 
cdc_enabled=false; cdc_free_space_check_interval_ms=250; 
cdc_raw_directory=/work/cassandra/cassandra_priv_3/cdc_raw; 
cdc_total_space_in_mb=4096; client_encryption_options=; 
cluster_name=CASSANDRA_3; column_index_cache_size_in_kb=2; 
column_index_size_in_kb=64; commit_failure_policy=stop; 
commitlog_compression=null; 
commitlog_directory=/commitlog/cassandra/cassandra_priv_3/commitlog; 
commitlog_max_compression_buffers_in_pool=3; commitlog_periodic_queue_size=-1; 
commitlog_segment_size_in_mb=32; commitlog_sync=periodic; 
commitlog_sync_batch_window_in_ms=NaN; commitlog_sync_period_in_ms=1; 
commitlog_total_space_in_mb=8192; 
compaction_large_partition_warning_threshold_mb=100; 
compaction_throughput_mb_per_sec=16; concurrent_compactors=null; 
concurrent_counter_writes=32; concurrent_materialized_view_writes=32; 
concurrent_reads=32; concurrent_replicates=null; concurrent_writes=32; 
counter_cache_keys_to_save=2147483647; counter_cache_save_period=7200; 
counter_cache_size_in_mb=null; counter_write_request_timeout_in_ms=5000; 
credentials_cache_max_entries=1000; credentials_update_interval_in_ms=-1; 
credentials_validity_in_ms=2000; cross_node_timeout=false; 

Re: Jepsen testing

2018-11-09 Thread Oleksandr Shulgin
On Thu, Nov 8, 2018 at 10:42 PM Yuji Ito  wrote:

>
> We are working on Jepsen testing for Cassandra.
> https://github.com/scalar-labs/jepsen/tree/cassandra/cassandra
>
> As you may know, Jepsen is a framework for distributed systems verification.
> It can inject network failures and the like, and check data consistency.
> https://github.com/jepsen-io/jepsen
>
> Our tests are based on riptano's great work.
> https://github.com/riptano/jepsen/tree/cassandra/cassandra
>
> I refined it for the latest Jepsen and removed some tests.
> Next, I'll fix the clock-drift tests.
>
> I would like to get your feedback.
>
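
For anyone who wants to try it, a typical invocation looks something like this
(a sketch assuming the suite follows the standard jepsen.cli conventions;
check the repo's README for the real test names and flags):

  # from the cassandra directory of the scalar-labs/jepsen checkout
  lein run test --nodes-file ~/nodes --time-limit 60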

Cool stuff!  Do you have Jepsen tests as part of regular testing in
scalardb?  How long does it take to run all of them on average?

I wonder if Apache Cassandra would be willing to include this as part of its
regular testing drill as well.

Cheers,
--
Alex