Hi Jimmy,

On Thu, Aug 30, 2012 at 8:50 PM, Jimmy Xiang <[email protected]> wrote:
> Is this a test case issue, or an issue with surefire?

There are issues with surefire, but in this case it's 90% us.

> Some of those red jenkins builds are because some tests are killed
> as findHangingTest.sh shows them hanging.

Yeah, surefire should have killed them, but didn't. There is a jira in surefire for this; it's not a totally trivial fix.

> I was wondering before showing the build red, can we find those tests, then
> run them not in parallel one more time? This could be done in a jenkin
> build script, right?

Yes it could (it used to exist actually, it's in dev-support). Or even in surefire, I think I've seen a jira for this. But I don't think it's a good direction for us:
- Issues with parallelization come from fixed ports and so on. But on a dev machine there are many reasons for a port to be taken: you're running a local cluster, some other software happens to be using it, and so on. Tests should run on any reasonable environment.
- Parallelization exposes issues because it stresses the machine, but most of the time a test that fails under parallelization will also fail without it if you try a few times.
- Test flakiness can actually be HBase flakiness, see for example HBASE-5569. Or a misunderstanding of important stuff, as in HBASE-6175.

So I would personally recommend the hard way, i.e. fixing the flaky tests. The situation is also much better now than a year ago. A year ago it was impossible for me to get a full run of tests without errors. Now it happens (sometimes).

Cheers,

N.
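
P.S. To illustrate the fixed-port point, here is a minimal sketch of letting the OS pick the port instead of hard-coding one. This is not HBase code: the class name FreePortSketch and the method bindToFreePort are made up for the example, and it only assumes the standard java.net.ServerSocket behaviour that binding to port 0 selects an unused ephemeral port.

```java
import java.io.IOException;
import java.net.ServerSocket;

public class FreePortSketch {

    /**
     * Binds a server socket to port 0, which asks the OS for any unused port.
     * The caller keeps the socket open, so nobody else can grab the port
     * between "find a free port" and "start listening on it".
     */
    public static ServerSocket bindToFreePort() throws IOException {
        return new ServerSocket(0);
    }

    public static void main(String[] args) throws IOException {
        // Two "test servers" started side by side, as in a parallel test run.
        try (ServerSocket first = bindToFreePort();
             ServerSocket second = bindToFreePort()) {
            System.out.println("first test server listens on " + first.getLocalPort());
            System.out.println("second test server listens on " + second.getLocalPort());
        }
    }
}
```

A test written this way should not care whether a local cluster or another test fork already holds the "usual" port, which is the kind of environment independence argued for above.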
