I have had this problem too, on similarly spec'd OpenStack instances (though I'm reasonably certain I didn't include the resource-intensive tests). My solution was to run the dtests in small batches (say 5-10 tests each), each with a timeout (say 1.2x the maximum time 5 tests took in a good run). If a batch exceeds the timeout, kill it; that way you lose only 5-10 test results instead of the whole run.
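A minimal sketch of that batching approach in Python. It assumes a POSIX host (it relies on `start_new_session` and `os.killpg`) and that you already have a list of test names to pass to `nosetests`. The helper names (`make_batches`, `timeout_from_good_run`, `run_batch`, `run_in_batches`) are hypothetical and not part of cassandra-dtest or ccm. Each batch runs in its own process group, so a timeout kills `nosetests` together with any children still in that group:

```python
import os
import signal
import subprocess
from typing import Dict, List, Sequence


def make_batches(tests: Sequence[str], size: int) -> List[List[str]]:
    """Split test names into consecutive batches of at most `size` tests."""
    if size < 1:
        raise ValueError("batch size must be >= 1")
    return [list(tests[i:i + size]) for i in range(0, len(tests), size)]


def timeout_from_good_run(batch_durations_s: Sequence[float],
                          factor: float = 1.2) -> float:
    """Timeout = factor x the slowest batch observed in a known-good run."""
    return factor * max(batch_durations_s)


def run_batch(cmd: List[str], timeout_s: float) -> str:
    """Run one batch and return 'ok', 'failed' or 'timeout'.

    The child becomes a session (and process group) leader, so on timeout
    the whole group is killed, not only the top-level process.
    """
    proc = subprocess.Popen(cmd, start_new_session=True)
    try:
        rc = proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        try:
            # pgid == pid because the child is a session leader
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the group already exited on its own
        proc.wait()  # reap the killed child
        return "timeout"
    return "ok" if rc == 0 else "failed"


def run_in_batches(tests: Sequence[str], base_cmd: List[str],
                   size: int, timeout_s: float) -> Dict[str, str]:
    """Run `base_cmd + batch` for every batch; record one status per test."""
    results: Dict[str, str] = {}
    for batch in make_batches(tests, size):
        status = run_batch(base_cmd + batch, timeout_s)
        for test in batch:
            results[test] = status
    return results
```

For example, `run_in_batches(test_ids, ["nosetests", "--with-flaky", "--force-flaky", "--max-runs=3", "--verbose"], size=5, timeout_s=timeout_from_good_run(durations))`, where `test_ids` are nose-style names such as `module.py:Class.test_method` and `durations` are per-batch wall-clock times from a known-good run (both are inputs you would collect yourself). Cassandra nodes started by ccm may detach from the process group and survive the kill, so it is worth checking for leftover `java` processes between batches.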
Another thing I did was to run the tests in Docker, killing the container itself on timeout, which ensures no orphan processes are left behind. This also allows you to run two or more test sets in parallel (which is not otherwise possible due to the use of CCM).

On Thu, Oct 19, 2017, 22:58 Michael Shuler <mich...@pbandjelly.org> wrote:

> 7.6G RAM may be a little bit too small, we've seen similar random hangs
> in the past on non-resource-intensive tests on m3.large. It doesn't
> appear you are skipping resource-intensive tests. Our standard dtest
> instance type has been an m3.xlarge, and the resource-intensive tests
> are run m3.2xlarge. (or something comparable on RAM & SSD)
>
> dtest run command (excludes resource-intensive tests):
>
> https://github.com/apache/cassandra-builds/blob/master/build-scripts/cassandra-dtest.sh#L53
>
> dtest-large run command (only resource-intensive tests):
>
> https://github.com/apache/cassandra-builds/blob/master/build-scripts/cassandra-dtest.sh#L61
>
> Running dtest without excluding resource-intensive will run everything.
>
> If you wish to troubleshoot your tests when they hang, there should be
> /tmp/dtest-XXXXX directories with the ccm cluster left on disk from the
> hung test, since they never get to cleanup stage.
>
> --
> Michael
>
> On 10/19/2017 05:29 AM, Sergey La wrote:
> > Hi!
> >
> > I have created the patch for the Cassandra version 3.0.14 and trying to
> > test it using the cassandra dtests.
> >
> > Problem is - dtests deadlocks at some random tests, time and again - on
> > unpatched 3.0.14 version of Cassandra.
> >
> > What I have done.
> >
> > I cloned the cassandra repository (origin
> > http://git-wip-us.apache.org/repos/asf/cassandra.git), and checked out to
> > tags/cassandra-3.0.14 - head is on f3e38cb638113c2a23855a104d6082da5bc10ddb.
> >
> > Then I have cloned the cassandra-dtest repo (origin
> > git://github.com/riptano/cassandra-dtest.git). Head is on
> > 6843d76d0a85ad82edf889e8280b87786dc48486.
> >
> > I setup dtests according to this instructions:
> > https://github.com/riptano/cassandra-dtest/blob/master/INSTALL.md
> >
> > In addition, I have setup JAVA8_HOME and JAVA_HOME variables to public jre
> > of my 1.8.0_144 jdk.
> >
> > I start testing using this command:
> > JAVA8_HOME=$JAVA8_HOME nosetests --with-flaky --with-xunit
> > --xunit-file=out.xunit.xml --force-flaky --max-runs=3 --verbose
> > --debug-log=err.debug.nose.txt 1> out.txt 2> err.txt
> > I run the tests on x86_64 CentOS 7 with 7.6G of RAM.
> >
> > Problem symptoms:
> > During the "normal" run (I have got only 1 "normal" run in 5 attempts),
> > err.txt is updated constantly with name of the test recently completed, and
> > in the end out.xunit.xml file appears, with test summary results. nosetests
> > process exits.
> >
> > During the "problem" run tests stop progressing (err.txt was not modified
> > for 10 hours), out.xunit.xml is not appearing, nosetests process runs. I
> > killed java processes, but nothing changed for 2 hours - nosetests process
> > still runs, but files are unchanged.
> >
> > Any help would be appreciated,
> > Sergey
>
--
Murukesh Mohanan,
Yahoo! Japan