[ https://issues.apache.org/jira/browse/CASSANDRA-10659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15004559#comment-15004559 ]
Jim Witschey commented on CASSANDRA-10659:
------------------------------------------
I've made good progress on a plugin and nose-wrapping script to fail tests if
they produce no output for some length of time:
https://github.com/mambocab/nose_call_on_hang
https://gist.github.com/mambocab/760928e01a5e1ee5489f
I believe these are just about ready to use for the main CassCI jobs, though
some changes to the dtests may still be necessary to handle exceptions correctly.
I've fixed some exception-handling problems here:
https://github.com/riptano/cassandra-dtest/pull/660
https://github.com/riptano/cassandra-dtest/pull/657
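The idle-output idea can be sketched with the Python standard library alone. This is not the code in nose_call_on_hang or the gist above, just a hedged sketch: it wraps a whole command (for example a nosetests invocation) instead of hooking into nose per test, so a hang kills the entire run rather than failing only the hung test. `run_with_idle_timeout` and its parameters are hypothetical names. A reader thread feeds a queue because `select()` only works on sockets on Windows, not on pipes.

```python
import queue
import subprocess
import sys
import threading
from typing import List, Tuple


def run_with_idle_timeout(cmd: List[str], idle_timeout_s: float) -> Tuple[int, bool]:
    """Run cmd, echoing its output, and kill it if it is silent for idle_timeout_s.

    Returns (returncode, hung). Hypothetical helper, not nose_call_on_hang's API.
    """
    # stderr is merged in so progress written there also counts as output.
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    chunks = queue.Queue()  # bytes from the child; b"" marks end of stream

    def pump() -> None:
        # read1 returns whatever bytes are available, so progress dots
        # printed without a trailing newline still reset the idle clock.
        for chunk in iter(lambda: proc.stdout.read1(4096), b""):
            chunks.put(chunk)
        chunks.put(b"")

    threading.Thread(target=pump, daemon=True).start()
    while True:
        try:
            # Each successful get() restarts the idle window.
            chunk = chunks.get(timeout=idle_timeout_s)
        except queue.Empty:
            # Silent for too long: treat the run as hung and kill the child.
            proc.kill()
            proc.wait()
            return proc.returncode, True
        if chunk == b"":
            return proc.wait(), False
        sys.stdout.write(chunk.decode(errors="replace"))
        sys.stdout.flush()
```

Something like `run_with_idle_timeout(["nosetests", "-v"], idle_timeout_s=600)` would be the intended use; the point is to set `idle_timeout_s` below the point where Jenkins gives up on a silent job, so the wrapper fails the run first.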
> Windows CassCI: Fail on timed-out tests
> ---------------------------------------
>
> Key: CASSANDRA-10659
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10659
> Project: Cassandra
> Issue Type: Bug
> Reporter: Jim Witschey
> Assignee: Jim Witschey
>
> On our Windows CassCI environments, it looks like some dtests are prone to
> hanging, e.g.:
> https://cassci.datastax.com/view/Dev/view/josh-mckenzie/job/josh-mckenzie-10641_windows-dtest_win32/1/
> http://cassci.datastax.com/view/cassandra-2.2/job/cassandra-2.2_dtest_win32/131/
> http://cassci.datastax.com/view/cassandra-2.2/job/cassandra-2.2_dtest_win32/129/
> http://cassci.datastax.com/view/cassandra-2.2/job/cassandra-2.2_dtest_win32/128/
> http://cassci.datastax.com/view/cassandra-2.2/job/cassandra-2.2_dtest_win32/126/
> http://cassci.datastax.com/view/cassandra-2.2/job/cassandra-2.2_dtest_win32/125/
> Ideally these tests wouldn't hang, but regardless, we should figure out a way
> to make them fail, rather than timing out Jenkins and botching the rest of
> the test run.
> The built-in [{{nosetests}} {{multiprocess}}
> plugin|http://nose.readthedocs.org/en/latest/plugins/multiprocess.html] would
> solve this problem for us -- we could run the tests with {{nosetests
> --processes=1 --process-timeout=X}} and it would stop the test and fail if
> the test took too long. However, it's broken on Windows. I've filed [a quick
> issue on the {{nose}} GitHub|https://github.com/nose-devs/nose/issues/966],
> but in the meantime, we need a workaround.
> Possible solutions:
> # [~philipthompson] had a script that would shell out to {{nosetests}} for
> each test and kill that process if it took too long. If I understand
> correctly, that script is broken, or assumes things that are no longer true.
> We can revamp it if we want.
> # We could make a patch for {{nose}} to fix the {{multiprocess}} plugin.
> # We could hack some of {{multiprocessing}}'s functionality into the
> {{dtest}} suite itself.
> Option 3 may be the best workaround for this problem -- our timeouts don't
> happen just because a test runs long, but because Jenkins gets no output on
> stdout from a hanging test. We may be able to monitor stdout from a second
> process and fail the test before Jenkins would time out.
> Pinging [~JoshuaMcKenzie] as this is a Windows issue.
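Option 1 in the issue description (shelling out to nosetests once per test and killing any run that takes too long) can be sketched with the `timeout` argument of `subprocess.run`. This is not Philip Thompson's script, which isn't shown here; `run_each_with_timeout` is a hypothetical name, and the sketch assumes the runner takes one test id as its last argument, the way `nosetests` accepts ids like `module:Class.test_method`. Note that this is a wall-clock limit per test, not an idle-output limit.

```python
import subprocess
import sys
from typing import Dict, List


def run_each_with_timeout(test_ids: List[str], timeout_s: float,
                          runner: List[str]) -> Dict[str, str]:
    """Run each test id in its own child process and classify the result.

    `runner` is a command prefix such as ["nosetests", "-v"] (an assumption:
    any command that takes one test id as its last argument works).
    """
    results: Dict[str, str] = {}
    for test_id in test_ids:
        try:
            proc = subprocess.run(runner + [test_id], timeout=timeout_s,
                                  stdout=subprocess.DEVNULL,
                                  stderr=subprocess.DEVNULL)
            results[test_id] = "pass" if proc.returncode == 0 else "fail"
        except subprocess.TimeoutExpired:
            # subprocess.run kills the child before re-raising, so the
            # loop moves on to the next test instead of hanging.
            results[test_id] = "timeout"
    return results


if __name__ == "__main__":
    # Demo with stand-in "tests": python -c snippets that pass, fail, and hang.
    demo = run_each_with_timeout(
        ["pass", "raise SystemExit(1)", "import time; time.sleep(60)"],
        timeout_s=5, runner=[sys.executable, "-c"])
    print(demo)
```

Killing the child only stops the nosetests process itself; on either platform, anything it started (such as the Cassandra nodes a dtest brings up) may keep running and would need separate cleanup.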
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)