I see in https://builds.apache.org/computer/ubuntu-2/load-statistics (used for the 0.98 build mentioned by Andrew above) that we have a configuration with 2 executors. This means that Jenkins can run 2 builds in parallel on that machine, and each of these builds will trigger its own set of surefire forks.
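To make the arithmetic concrete, here is a rough sketch. The fork counts are made-up values, not the actual HBase surefire settings, and it assumes every executor on the node is busy with a build that forks the same number of JVMs:

```python
def max_concurrent_forks(executors: int, forks_per_build: int) -> int:
    """Upper bound on surefire JVMs alive at once on one Jenkins node.

    Assumes each executor runs one build and each build starts
    `forks_per_build` forked JVMs (a hypothetical, illustrative value).
    """
    return executors * forks_per_build


# Compare a single-executor node with a two-executor node.
for executors in (1, 2):
    for forks_per_build in (1, 2, 5):
        total = max_concurrent_forks(executors, forks_per_build)
        print(f"executors={executors} forks/build={forks_per_build} "
              f"-> up to {total} forked JVMs")
```

So on a 2-executor node the worst case is twice the number of forked JVMs the build was tuned for, whatever that number is.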
iirc, in the past:
- we were not building on these machines; we were using only the Hadoop pool of machines
- these machines were configured with 1 executor

From what I see, there are two sets of machines:
- H*, for Hadoop projects: H0 (for example) is configured with a single executor.
- ubuntu*, for everybody: ubuntu-2 (for example) is configured with 2 executors.

0.98 and PreCommit-HBASE-Build are configured with:
(ubuntu||Hadoop) && !jenkins-cloud-4GB && !H11

So it depends on where the build lands: lucky = H*, unlucky = ubuntu*.

I don't know who changed this, nor why, but maybe we should not go to ubuntu* machines. Or, if it's possible, we should have a different config for these machines.

On Mon, Jan 19, 2015 at 7:11 PM, Andrew Purtell <[email protected]> wrote:

> The 0.98 build is still showing this problem (latest as of now at
> https://builds.apache.org/job/hbase-0.98/803), so I went ahead and made
> the proposed change, but only to the 0.98 builds. I'll let you know if it
> provides any improvement.
>
>
> On Sun, Jan 18, 2015 at 10:00 AM, Andrew Purtell <[email protected]>
> wrote:
>
> > Forked VMs are being killed in the 0.98 builds. That suggests
> > infrastructure issues.
> >
> > Having only one test execute in a forked runner does mean the finding
> > of a zombie and thread dumps or other state from the runner will
> > identify and characterize a sick test with no unrelated state mixed in.
> >
> >
> > > On Jan 17, 2015, at 7:43 PM, Stack <[email protected]> wrote:
> > >
> > > Agree, try anything to get our blues back. We add back the //ism
> > > after all settles.
> > >
> > > Do you think something has changed in INFRA Andy? Is it more
> > > contended? Or, more likely, is it that we've been committing stuff
> > > that has destabilized builds? We had a good streak of blue there for
> > > a while. It just took some work fixing breakage and watching jenkins
> > > to make sure breakage didn't sneak in, but we've lapsed for sure.
> > >
> > > St.Ack
> > >
> > >> On Sat, Jan 17, 2015 at 9:19 AM, Dima Spivak <[email protected]>
> > >> wrote:
> > >>
> > >> Not running tests in parallel will definitely cut down on Surefire
> > >> flakiness (and in contention that sometimes leads to false failures
> > >> in resource-hungry tests), but it will probably also balloon test
> > >> run times to about two hours. Probably worth it in the short term,
> > >> but we eventually need to do something about some of these heavy
> > >> tests.
> > >>
> > >> -Dima
> > >>
> > >> On Friday, January 16, 2015, Andrew Purtell <[email protected]>
> > >> wrote:
> > >>
> > >>> You might have missed the larger issue Ted.
> > >>>
> > >>>
> > >>>> On Jan 16, 2015, at 4:48 PM, Ted Yu <[email protected]> wrote:
> > >>>>
> > >>>> With HBASE-12874, we should get a green build for branch-1.0
> > >>>>
> > >>>> FYI
> > >>>>
> > >>>> On Fri, Jan 16, 2015 at 12:20 PM, Andrew Purtell
> > >>>> <[email protected]> wrote:
> > >>>>
> > >>>>> See BUILDS-49 tracking issues specifically with 0.98 jobs, but I
> > >>>>> just noticed trunk, branch-1, and branch-1.0 all failed after I
> > >>>>> checked in a shell doc fix due to a timeout or fork failure.
> > >>>>>
> > >>>>> I propose we update all Jenkins jobs to not run tests in
> > >>>>> parallel, i.e. add
> > >>>>> "-Dsurefire.firstPartForkCount=1 -Dsurefire.secondPartForkCount=1"
> > >>>>>
> > >>>>> --
> > >>>>> Best regards,
> > >>>>>
> > >>>>> - Andy
> > >>>>>
> > >>>>> Problems worthy of attack prove their worth by hitting back.
> > >>>>> - Piet Hein (via Tom White)
