And if you disable forking completely, do the tests always pass for you, or do they also fail intermittently?
2014-06-26 15:59 GMT-07:00 Andrew Purtell <[email protected]>:

> Additionally, we run unit tests in parallel to reduce the total time required for test suite execution. Surefire forks multiple JVMs, dynamically generates test jars that each contain a subset of the tests, and runs them. That can make isolating hanging tests difficult, but this behavior can be influenced by defines on the Maven command line. For example, to fork a process for every single unit test:
>
> mvn test -Dsurefire.firstPartForkMode=always -Dsurefire.secondPartForkMode=always
>
> Then, if you find a hanging Surefire runner, you can dump the thread stacks of that JVM and know that only the unit test whose methods appear in those stacks contributed to the wedged state.
>
>
> On Thu, Jun 26, 2014 at 3:48 PM, Andrew Purtell <[email protected]> wrote:
>
> > Java 7u60 64-bit on an EC2 m3.4xlarge. I'm just running the unit test suite in a loop. I don't set any special Maven options in MAVEN_OPTS or anything like that.
> >
> > Historically, failures that occur when the whole suite runs, but not when the tests are run individually, happen because one test does not shut down in a timely manner (or at all), and a subsequent test uses the same hardcoded path or port. When that happens we get a sporadic and sometimes load-sensitive failure. To complicate matters, each time one clones the repository on a different host or filesystem, JUnit may pick up a different test order, influenced by whatever readdir returns for each package.
> >
> >
> > On Thu, Jun 26, 2014 at 3:25 PM, Mikhail Antonov <[email protected]> wrote:
> >
> >> Andrew,
> >>
> >> Could you share some details: what environment are you running the tests in, and at which point do they fail? I'm curious because lately I'm seeing weird failures on current master too, which do not happen on hadoop-qa. Individual tests always pass, but when running the suite, tests either get stuck and time out (at roughly the same point) or fail with an NPE or a PermGen exception. I blamed my environment at first, but maybe it's something related.
> >>
> >> -Mikhail
> >>
> >>
> >> 2014-06-26 13:39 GMT-07:00 Andrew Purtell <[email protected]>:
> >>
> >> > I'm finding that repeated runs of the unit test suite at the head of branch 0.98 intermittently fail. Individual tests do not, so this is likely a lagging shutdown, port/resource conflict, and/or zombie test issue. I am currently bisecting commits on the 0.98 branch since the last release in the hope of pinning this down to a single change. Depending on how quickly that happens, the RC may or may not happen on Monday. As things stand at the head of the branch, I would not +1 the RC given the release criteria I've been using up to now.
>
> --
> Best regards,
>
> - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)

--
Thanks,
Michael Antonov
