Re: Planning to roll the 0.98.4 RC on 6/30

Andrew Purtell Thu, 26 Jun 2014 17:05:05 -0700

I haven't tried it here yet, but on other occasions I have found the slow or
hanging tests I was after that way, yeah.


I haven't spent any time testing the master branch with the whole suite
recently.


On Thu, Jun 26, 2014 at 4:59 PM, Mikhail Antonov <[email protected]>
wrote:

> And if you disable forking completely, do the tests always pass for you, or
> do they also fail intermittently?
>
>
> 2014-06-26 15:59 GMT-07:00 Andrew Purtell <[email protected]>:
>
> > Additionally, we run unit tests in parallel to reduce the total time
> > required for test suite execution. Surefire will fork multiple JVMs,
> > dynamically generate test jars containing a subset of tests, and run
> > them. That can make isolating hanging tests difficult, but this behavior
> > can be influenced by defines on the Maven command line. For example, to
> > fork a process for every single unit test:
> >
> >     mvn test -Dsurefire.firstPartForkMode=always \
> >         -Dsurefire.secondPartForkMode=always
> >
> > Then, if you find a hanging Surefire runner, you can dump the thread
> > stacks of that JVM and know that only the unit test whose methods appear
> > in the stacks contributed to the current wedged state.
> >
> >
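Here is a minimal sketch of the thread-dump step. It assumes a JDK with `jps` and `jstack` on the PATH. The function names are hypothetical. The match on `ForkedBooter` or `surefirebooter` is also an assumption: depending on how Surefire launches its forks, `jps -l` may report either the booter main class or the path of the booter jar.

```shell
#!/bin/sh
# Hypothetical helpers for dumping the stacks of forked Surefire JVMs.
# Assumes the JDK tools jps and jstack are on the PATH.

# Read `jps -l` output ("<pid> <main class or jar>") on stdin and print
# the PIDs that look like Surefire forks (the pattern is an assumption).
surefire_pids() {
  awk '$2 ~ /ForkedBooter|surefirebooter/ { print $1 }'
}

# Write one thread dump per Surefire fork to stacks-<pid>.txt.
dump_surefire_stacks() {
  for pid in $(jps -l | surefire_pids); do
    jstack "$pid" > "stacks-$pid.txt" && echo "wrote stacks-$pid.txt"
  done
}
```

Run `dump_surefire_stacks` while the suite is wedged, then search the dumps for test class names.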
> > On Thu, Jun 26, 2014 at 3:48 PM, Andrew Purtell <[email protected]>
> > wrote:
> >
> > > Java 7u60 64-bit on an EC2 m3.4xlarge. Just running the unit test suite
> > > in a loop. I don't set any special Maven options in MAVEN_OPTS or
> > > anything like that.
> > >
> > > Historically, failures that occur when the whole suite executes, but not
> > > when individual tests are run, happen because one test does not shut
> > > down in a timely manner, or at all, and a subsequent test might use the
> > > same hardcoded path or port. When that happens, we get a sporadic and
> > > sometimes load-sensitive failure. To complicate matters, each time one
> > > clones the repository on a different host or filesystem, JUnit may pick
> > > up a different test order, influenced by whatever readdir hands back for
> > > each package.
> > >
> > >
> > >
> > >
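One way to reproduce the intermittent failures described here is to run the suite repeatedly and keep the log of every iteration. Below is a minimal sketch. The name `run_repeatedly` and the `run-<n>.log` file names are hypothetical, and the command defaults to plain `mvn test`.

```shell
#!/bin/sh
# Hypothetical helper: run a test command N times and report failing runs.
# Usage: run_repeatedly <iterations> [command ...]  (default: mvn test)
run_repeatedly() {
  iterations=$1
  shift
  if [ "$#" -eq 0 ]; then
    set -- mvn test
  fi
  i=1
  while [ "$i" -le "$iterations" ]; do
    # Keep each run's output so a failing iteration can be inspected later.
    if ! "$@" > "run-$i.log" 2>&1; then
      echo "$i"
    fi
    i=$((i + 1))
  done
}
```

For example, `run_repeatedly 20` prints the numbers of the iterations whose `mvn test` run failed. The logs for those runs are in the matching `run-<n>.log` files.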
> > > On Thu, Jun 26, 2014 at 3:25 PM, Mikhail Antonov <[email protected]>
> > > wrote:
> > >
> > >> Andrew,
> > >>
> > >> Could you share some details: what environment are you running the
> > >> tests on, and at which point do they fail? I'm curious because lately
> > >> I've been seeing weird failures on current master too, which do not
> > >> happen on hadoop-qa: individual tests always pass, but when running the
> > >> whole suite, tests either get stuck and time out (at roughly the same
> > >> point) or fail with an NPE or a PermGen error. I blamed my environment
> > >> at first, but maybe it's something related.
> > >>
> > >> -Mikhail
> > >>
> > >>
> > >>
> > >>
> > >> 2014-06-26 13:39 GMT-07:00 Andrew Purtell <[email protected]>:
> > >>
> > >> > I'm finding that repeated runs of the unit test suite at the head of
> > >> > branch 0.98 intermittently fail. Individual tests do not, so this is
> > >> > likely a lagging shutdown, port/resource conflict, and/or zombie test
> > >> > issue. I am currently bisecting the commits on the 0.98 branch since
> > >> > the last release in the hope of pinning this down to a single change.
> > >> > Depending on how quickly that goes, the RC may or may not happen on
> > >> > Monday. As things stand at the head of the branch, I would not +1 the
> > >> > RC given the release criteria I've been using up to now.
> > >>
> > >
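The bisection described above can be automated with `git bisect run`. That command treats an exit status of 0 as good, 125 as "skip", and other codes from 1 to 127 as bad. Because the failure is intermittent, a single passing run proves little. This sketch therefore assumes it is acceptable to run the suite several times per commit. The helper name `passes_n_times` and the `bisect-run.log` file are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper for `git bisect run`: succeed only if the command
# passes <runs> times in a row. An intermittent failure can still slip
# through, so a "good" verdict is only probabilistic.
passes_n_times() {
  runs=$1
  shift
  attempt=1
  while [ "$attempt" -le "$runs" ]; do
    # Stop at the first failure; exit status 1 tells git bisect "bad".
    "$@" >> bisect-run.log 2>&1 || return 1
    attempt=$((attempt + 1))
  done
  return 0
}
```

Assuming the function is saved as `bisect-helpers.sh`, the bisect would go like this:

1. Run `git bisect start <bad> <good>`.
2. Run `git bisect run sh -c '. ./bisect-helpers.sh && passes_n_times 5 mvn test'`.

With this sketch, a commit that fails to compile is also reported as bad.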
> > --
> > Best regards,
> >
> >    - Andy
> >
> > Problems worthy of attack prove their worth by hitting back. - Piet Hein
> > (via Tom White)
> >
>
>
>
> --
> Thanks,
> Michael Antonov
>



