Stefan Miklosovic created CASSANDRA-15061:
---------------------------------------------

             Summary: Dtests: tests are failing on overly powerful machines;
set more memory per node in dtests
                 Key: CASSANDRA-15061
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15061
             Project: Cassandra
          Issue Type: Improvement
          Components: Test/dtest
            Reporter: Stefan Miklosovic


While running dtests on a c5.9xlarge (32 cores, 64 GB of memory), some tests
fail because the nodes cannot handle the load cassandra-stress generates for
them.

As an example, take (1), where we test that a cluster can cope with a
bootstrapping node. node1 is bombarded with cassandra-stress until it is
eventually killed, so the test fails before it even reaches the behaviour it
is meant to exercise.
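
For illustration, the failing pattern looks roughly like this (a sketch only;
the figures are placeholders and the real options live in the test itself):

{code:python}
# Illustrative shape of the failing setup, not the exact test code --
# see bootstrap_test.py, link (1) below.
def bomb_node_with_stress(cluster):
    node1 = cluster.nodelist()[0]
    # ccm's Node.stress() shells out to the cassandra-stress tool shipped
    # with the node; on a 32-core machine this generates far more load
    # than a 512 MB heap can absorb, so node1 dies before the bootstrap
    # part of the test even begins.
    node1.stress(['write', 'n=500K', 'no-warmup', '-rate', 'threads=300'])
{code}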

I was told that dtests in CircleCI run in containers with 8 cores and 16 GB of
RAM, and I simulated this on my machine
(-Dcassandra.available_processors=8). The core problem is that the nodes do
not have enough memory: Xms and Xmx are set to only 512 MB, which is a very
low figure, and the nodes are eventually killed.
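
The simulation boils down to passing extra JVM arguments at node start; a
minimal sketch, assuming ccm's Node.start(jvm_args=...) passthrough:

{code:python}
# Sketch: reproduce the CircleCI container shape on a big machine.
# cassandra.available_processors caps how many cores Cassandra thinks it
# has; the 512 MB heap mirrors what dtest nodes currently get.
def start_like_circleci(cluster):
    for node in cluster.nodelist():
        node.start(jvm_args=['-Dcassandra.available_processors=8',
                             '-Xms512M', '-Xmx512M'])
{code}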

Proposed solutions:

1) Run dtests on less powerful machines, so the generated stress never gets
high enough to kill the underlying nodes (a rather strange idea).

2) Increase the memory per node. This should be configurable; I saw that 1 GB
helps but still leaves some timeouts, 2 GB is better, and 4 GB would be best
(see the sketch after this list).

3) Fix each test so that it does not fail with 512 MB.
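
A minimal sketch of option 2, assuming cassandra-env.sh keeps honouring
MAX_HEAP_SIZE / HEAP_NEWSIZE from the environment and that ccm propagates the
environment to the forked node processes:

{code:python}
import os

def start_with_bigger_heap(cluster, heap='2G', new_size='400M'):
    # cassandra-env.sh requires both variables to be set together,
    # otherwise the node refuses to start.
    os.environ['MAX_HEAP_SIZE'] = heap    # 1G removes the kills, 2G the timeouts
    os.environ['HEAP_NEWSIZE'] = new_size
    cluster.start(wait_for_binary_proto=True)
{code}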

 

1) is not viable to me. 3) would take a lot of time, does not actually solve
anything, and going through all the tests to tune them like that would be
cumbersome and clunky. 2) seems to be the best approach, but I am not aware of
any way to add more memory to every node at once, because node and cluster
start / creation is scattered all over the project.
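
To make the scattering concrete: without a central hook, every call site would
need a wrapper like the following (a hypothetical helper, not existing code),
and touching each of them is exactly the cumbersome part:

{code:python}
# Hypothetical helper every dtest call site would have to adopt today;
# a CCM-level setting would make this unnecessary.
def start_cluster_with_heap(cluster, heap='2G', new_size='400M'):
    jvm_args = ['-Xms' + heap, '-Xmx' + heap, '-Xmn' + new_size]
    for node in cluster.nodelist():
        node.start(jvm_args=jvm_args, wait_for_binary_proto=True)
{code}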

I have also raised the issue in CCM (2).

If we manage to fix this in CCM somehow, do you think we could introduce a
switch / flag to dtests specifying how much memory each node in a cluster
should run with?
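
On the dtest side, such a flag could be a small addition to the pytest
harness; a sketch, with the option name --node-memory being hypothetical:

{code:python}
# conftest.py sketch -- a hypothetical --node-memory option that cluster
# creation would read and turn into per-node Xms/Xmx settings.
def pytest_addoption(parser):
    parser.addoption('--node-memory', action='store', default='512M',
                     help='heap size (Xms/Xmx) to start every Cassandra node with')
{code}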

(1) 
[https://github.com/apache/cassandra-dtest/blob/master/bootstrap_test.py#L419-L470]

(2) [https://github.com/riptano/ccm/issues/696]


