[
https://issues.apache.org/jira/browse/CASSANDRA-10659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14996922#comment-14996922
]
Jim Witschey commented on CASSANDRA-10659:
------------------------------------------
Having slept on this, I think #2 is only worth it for us as a fallback -- some
tests run longer than 30 minutes, and this is correct behavior. The
{{multiprocess}} nose plugin can't be nuanced about this and, even if
fixed to work correctly on Windows, will make those tests fail. We need to
detect periods of inactivity, not long-running tests.
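
To illustrate the difference between an inactivity timeout and a total-runtime timeout, here is a minimal Python 3 sketch, not the actual dtest or CassCI implementation. The function name {{run_with_inactivity_timeout}} and the {{idle_limit}} parameter are hypothetical. It assumes the test runner is launched as a child process and that a silent stdout is a good enough signal that the test is hanging.

```python
import subprocess
import sys
import threading
import time


def run_with_inactivity_timeout(cmd, idle_limit):
    """Run cmd and kill it if it prints nothing for idle_limit seconds.

    Hypothetical helper: total runtime is NOT limited, only silence is.
    Returns (returncode, timed_out).
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    last_output = [time.monotonic()]  # one-element list so the thread can update it

    def pump():
        # Relay each line to our own stdout (so the CI server still sees
        # output) and record when the child last produced anything.
        for line in iter(proc.stdout.readline, b""):
            last_output[0] = time.monotonic()
            sys.stdout.write(line.decode("utf-8", "replace"))
            sys.stdout.flush()

    reader = threading.Thread(target=pump, daemon=True)
    reader.start()

    timed_out = False
    while proc.poll() is None:
        if time.monotonic() - last_output[0] > idle_limit:
            timed_out = True
            proc.kill()
            break
        time.sleep(0.1)

    proc.wait()
    reader.join(timeout=1)
    return proc.returncode, timed_out
```

A test that runs for an hour but keeps printing would pass through untouched, while a test that goes quiet for longer than {{idle_limit}} would be killed and could be reported as a failure. Note that {{proc.kill()}} only stops the direct child; any Cassandra nodes started by the test would need separate cleanup.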
> Windows CassCI: Fail on timed-out tests
> ---------------------------------------
>
> Key: CASSANDRA-10659
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10659
> Project: Cassandra
> Issue Type: Bug
> Reporter: Jim Witschey
> Assignee: Jim Witschey
>
> On our Windows CassCI environments, it looks like some dtests are prone to
> hanging, e.g.:
> https://cassci.datastax.com/view/Dev/view/josh-mckenzie/job/josh-mckenzie-10641_windows-dtest_win32/1/
> http://cassci.datastax.com/view/cassandra-2.2/job/cassandra-2.2_dtest_win32/131/
> http://cassci.datastax.com/view/cassandra-2.2/job/cassandra-2.2_dtest_win32/129/
> http://cassci.datastax.com/view/cassandra-2.2/job/cassandra-2.2_dtest_win32/128/
> http://cassci.datastax.com/view/cassandra-2.2/job/cassandra-2.2_dtest_win32/126/
> http://cassci.datastax.com/view/cassandra-2.2/job/cassandra-2.2_dtest_win32/125/
> Ideally these tests wouldn't hang, but regardless, we should figure out a way
> to make them fail, rather than timing out Jenkins and botching the rest of
> the test run.
> The built-in [{{nosetests}} {{multiprocess}}
> plugin|http://nose.readthedocs.org/en/latest/plugins/multiprocess.html] would
> solve this problem for us -- we could run the tests with {{nosetests
> --processes=1 --process-timeout=X}} and it would stop the test and fail if
> the test took too long. However, it's broken on Windows. I've filed [a quick
> issue on the {{nose}} GitHub|https://github.com/nose-devs/nose/issues/966],
> but in the meantime, we should figure out how to avoid this.
> Possible solutions:
> # [~philipthompson] had a script that would shell out to {{nosetests}} for
> each test and kill that process if it took too long. If I understand
> correctly, that script is broken, or assumes things that are no longer true.
> We can revamp it if we want.
> # We could make a patch for {{nose}} to fix the {{multiprocess}} plugin.
> # We could hack some of {{multiprocessing}}'s functionality into the
> {{dtest}} suite itself.
> Option 3 may be the best workaround for this problem -- our timeouts are
> caused not just when a test runs long, but when Jenkins doesn't get any
> output on stdout from a hanging test. We may be able to monitor stdout from
> a second process and fail the test before Jenkins would time out.
> Pinging [~JoshuaMcKenzie] as this is a Windows issue.
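
Option 1 in the description above (shell out to {{nosetests}} once per test and kill the process if it takes too long) could look roughly like the following Python 3 sketch. This is not the script mentioned in the ticket; the function name {{run_test_with_timeout}} and its return values are hypothetical. It applies a hard limit on total runtime, so legitimately long-running tests would need a generous limit.

```python
import subprocess
import sys


def run_test_with_timeout(cmd, timeout_seconds):
    """Run one test command in its own process with a hard time limit.

    Hypothetical helper: returns "pass", "fail", or "timeout".
    """
    try:
        # subprocess.run kills the child and raises TimeoutExpired
        # if it has not finished within timeout_seconds.
        completed = subprocess.run(cmd, timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        return "timeout"
    return "pass" if completed.returncode == 0 else "fail"


if __name__ == "__main__":
    # Stand-in command for demonstration; a real driver would pass something
    # like ["nosetests", "some_test.py:SomeClass.some_test"], where the test
    # name is a made-up placeholder.
    print(run_test_with_timeout([sys.executable, "-c", "print('ok')"], 10))
```

Because each test gets its own process, a hang in one test is reported as a timeout and the driver can move on to the next test instead of stalling the whole run.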