[
https://issues.apache.org/jira/browse/HADOOP-9112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583115#comment-13583115
]
nkeywal commented on HADOOP-9112:
---------------------------------
An issue I have with timeouts is that we have to change them during debugging
(maybe there is an option I don't know about).
Anyway, a test process can fail in afterSuite (basically, when you're
shutting down the cluster). Surefire may not kill it, you won't know,
and you will find it at the next build.
In HBase, we do this before running the tests:
### kill any process remaining from another test, maybe even another project
jps | grep surefirebooter | cut -d ' ' -f 1 | xargs kill -9 2>/dev/null
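A slightly more defensive variant of that cleanup could look like the sketch
below. It only calls kill when jps actually reports a surefire fork, so kill is
never invoked with an empty argument list. The function names surefire_pids and
kill_leftover_surefire are made up for this example and are not part of the
HBase scripts.

```shell
# Print the PID of every surefire fork found in jps-style input on stdin.
# jps prints one "<pid> <main class or jar name>" pair per line.
surefire_pids() {
  awk '$2 ~ /surefirebooter/ { print $1 }'
}

# Kill leftover surefire forks, if any (hypothetical helper name).
kill_leftover_surefire() {
  pids=$(jps | surefire_pids)
  if [ -n "$pids" ]; then
    # Word splitting of $pids is intended here: one PID per word.
    kill -9 $pids 2>/dev/null || true
  fi
  return 0
}

# Usage, before the test run:
#   kill_leftover_surefire
```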
And this after:
ZOMBIE_TESTS_COUNT=`jps | grep surefirebooter | wc -l`
if [[ $ZOMBIE_TESTS_COUNT != 0 ]] ; then
  # It seems sometimes the tests are not dying immediately. Let's give them 30s
  echo "Suspicious java process found - waiting 30s to see if they are just slow to stop"
  sleep 30
  ZOMBIE_TESTS_COUNT=`jps | grep surefirebooter | wc -l`
  if [[ $ZOMBIE_TESTS_COUNT != 0 ]] ; then
    echo "There are $ZOMBIE_TESTS_COUNT zombie tests, they should have been killed by surefire but survived"
    echo "************ BEGIN zombies jstack extract"
    # Keep only the stack frames that point into test sources for the JIRA comment
    ZB_STACK=`jps | grep surefirebooter | cut -d ' ' -f 1 | xargs -n 1 jstack | grep ".test" | grep "\.java"`
    # Dump the full stacks into the build log
    jps | grep surefirebooter | cut -d ' ' -f 1 | xargs -n 1 jstack
    echo "************ END zombies jstack extract"
    JIRA_COMMENT="$JIRA_COMMENT
{color:red}-1 core zombie tests{color}. There are ${ZOMBIE_TESTS_COUNT} zombie test(s): ${ZB_STACK}"
    BAD=1
    jps | grep surefirebooter | cut -d ' ' -f 1 | xargs kill -9
  else
    echo "We're ok: there is no zombie test, but some tests took some time to stop"
  fi
else
  echo "We're ok: there is no zombie test"
fi
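The fixed 30s sleep could also be replaced by polling, so the build continues
as soon as the forks are gone. This is only a sketch: count_surefire and
wait_for_surefire_exit are hypothetical names, and the second argument (the
command that lists JVMs) exists just so the loop can be tried without real
surefire processes.

```shell
# Count surefire forks in jps-style input read from stdin.
# grep -c exits non-zero when it counts 0 matches, hence the "|| true".
count_surefire() {
  grep -c surefirebooter || true
}

# Poll once per second until no surefire fork is left or the grace period
# runs out.
#   $1: grace period in seconds (default 30)
#   $2: command that lists running JVMs (default jps)
# Returns 0 if all forks exited in time, 1 if some survived.
wait_for_surefire_exit() {
  grace=${1:-30}
  lister=${2:-jps}
  waited=0
  while [ "$($lister | count_surefire)" -ne 0 ]; do
    if [ "$waited" -ge "$grace" ]; then
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}

# Usage, after the test run:
#   if ! wait_for_surefire_exit 30; then
#     echo "zombie tests survived"   # dump jstack, set BAD=1, kill -9 as above
#   fi
```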
See http://www.mail-archive.com/[email protected]/msg73169.html for the
outcome. It's actually an HDFS zombie; this was before we started killing the
zombies at the beginning of our tests. The whole stack is in the build logs.
This check has improved the precommit success ratio.
It was my two cents :-)
> test-patch should -1 for @Tests without a timeout
> -------------------------------------------------
>
> Key: HADOOP-9112
> URL: https://issues.apache.org/jira/browse/HADOOP-9112
> Project: Hadoop Common
> Issue Type: Improvement
> Reporter: Todd Lipcon
> Assignee: Surenkumar Nihalani
> Fix For: 3.0.0
>
> Attachments: HADOOP-9112-1.patch, HADOOP-9112-2.patch,
> HADOOP-9112-3.patch, HADOOP-9112-4.patch, HADOOP-9112-5.patch,
> HADOOP-9112-6.patch, HADOOP-9112-7.patch
>
>
> With our current test running infrastructure, if a test with no timeout set
> runs too long, it triggers a surefire-wide timeout, which for some reason
> doesn't show up as a failed test in the test-patch output. Given that, we
> should require that all tests have a timeout set, and have test-patch enforce
> this with a simple check