I think the argument for keeping concurrency is that it may surface some
unknown problems in the code.

Maybe a middle ground: move the largest offenders into a separate JUnit tag
and run them after the rest of the tests with threads=1. Hopefully this will
make life better for PRs.
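
A minimal, self-contained sketch of that split; the @Flaky marker and the test
method names are hypothetical stand-ins (the real mechanism would be a JUnit
@Category or @Tag plus build-file wiring):

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Hypothetical marker playing the role of a JUnit @Category/@Tag:
// tests carrying it would be skipped in the parallel pass and re-run
// afterwards with threads=1.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Flaky {}

public class SplitDemo {
    // Illustrative test methods; names borrowed from the failure counts above.
    @Flaky
    void testNodeDataChanged() {}

    void testNonExistingOpCode() {}

    public static void main(String[] args) {
        List<String> parallelPass = new ArrayList<>();
        List<String> serialPass = new ArrayList<>();
        for (Method m : SplitDemo.class.getDeclaredMethods()) {
            if (!m.getName().startsWith("test")) continue;
            // Partition: tagged tests go to the later single-threaded pass.
            (m.isAnnotationPresent(Flaky.class) ? serialPass : parallelPass)
                    .add(m.getName());
        }
        System.out.println("parallel pass: " + parallelPass);
        System.out.println("serial pass (threads=1): " + serialPass);
    }
}
```

With real JUnit the same partition would be expressed declaratively, e.g. as
Surefire's `groups`/`excludedGroups` over a JUnit 4 category, run as two
passes.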

On the note of the largest offenders, I've done 44 runs on AWS r3.large with
various thread settings (1, 2, 4, 8).
Failure counts:
      1 testNextConfigAlreadyActive
      1 testNonExistingOpCode
      1 testRaceConditionBetweenLeaderAndAckRequestProcessor
      1 testWatcherDisconnectOnClose
      2 testDoubleElection
      5 testCurrentServersAreObserversInNextConfig
      5 testNormalFollowerRunWithDiff
      7 startSingleServerTest
     18 testNodeDataChanged

Haven't seen testPurgeWhenLogRollingInProgress
or testManyChildWatchersAutoReset failing yet.



On Thu, Oct 18, 2018 at 10:03 PM Michael Han <[email protected]> wrote:

> It's a good idea to reduce concurrency to eliminate flakiness. Looks like
> single-threaded unit tests on trunk are pretty stable
> https://builds.apache.org/job/zookeeper-trunk-single-thread/ (some
> failures are due to C tests). The build time is longer, but not too bad for
> the pre-commit build; for the nightly build, build time should not be a
> concern at all.
>
>
> On Mon, Oct 15, 2018 at 5:50 AM Andor Molnar <[email protected]>
> wrote:
>
> > +1
> >
> >
> >
> > On Mon, Oct 15, 2018 at 1:55 PM, Enrico Olivelli <[email protected]>
> > wrote:
> >
> > > On Mon, Oct 15, 2018 at 12:46 PM Andor Molnar
> > > <[email protected]> wrote:
> > > >
> > > > Thank you guys. This is great help.
> > > >
> > > > I remember your efforts Bogdan; as far as I remember you observed
> > > > thread starvation in multiple runs on Apache Jenkins. Correct me if
> > > > I'm wrong.
> > > >
> > > > I've created an umbrella Jira to capture all flaky test fixing
> > > > efforts here:
> > > > https://issues.apache.org/jira/browse/ZOOKEEPER-3170
> > > >
> > > > All previous flaky-related tickets have been converted to sub-tasks.
> > > > Some of them might not be up to date; please consider reviewing them
> > > > and closing them if possible. Additionally, feel free to create new
> > > > sub-tasks to capture your actual work.
> > > >
> > > > I've already modified the Trunk and branch-3.5 builds to run on 4
> > > > threads for initial testing. It resulted in slightly more stable tests:
> > >
> > > +1
> > >
> > > I have assigned the umbrella issue to you, Andor, as you are driving
> > > this important task. Is that ok?
> > >
> > > thank you
> > >
> > > Enrico
> > >
> > >
> > > >
> > > > Trunk (java 8) - failing 1/4 (since #229) - build time increased by
> > > > 40-45%
> > > > Trunk (java 9) - failing 0/2 (since #993) - ~40%
> > > > Trunk (java 10) - failing 1/2 (since #280) -
> > > > branch-3.5 (java 8) - failing 0/4 (since #1153) - ~35-45%
> > > >
> > > > However, the sample is not big enough and the results are inaccurate,
> > > > so I need more builds. I also need to fix a bug in SSL to get the
> > > > java9/10 builds working on 3.5.
> > > >
> > > > Please let me know if I should revert the changes. The precommit
> > > > build is still running on 8 threads, but I'd like to change that one
> > > > too.
> > > >
> > > > Regards,
> > > > Andor
> > > >
> > > >
> > > >
> > > > > On 2018. Oct 15., at 9:31, Bogdan Kanivets <[email protected]>
> > > wrote:
> > > > >
> > > > > Fangmin,
> > > > >
> > > > > Those are good ideas.
> > > > >
> > > > > FYI, I've started running tests continuously on AWS m1.xlarge.
> > > > > https://github.com/lavacat/zookeeper-tests-lab
> > > > >
> > > > > So far, I've done ~12 runs of trunk. Same common offenders as in
> > > > > the flaky dash: testManyChildWatchersAutoReset and
> > > > > testPurgeWhenLogRollingInProgress.
> > > > > I'll do some more runs, then try to come up with a report.
> > > > >
> > > > > I'm using AWS and not the Apache Jenkins env because of better
> > > > > control/observability.
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On Sun, Oct 14, 2018 at 4:58 PM Fangmin Lv <[email protected]>
> > > wrote:
> > > > >
> > > > >> Internally, we also did some work to reduce the flakiness; here are
> > > > >> the main things we've done:
> > > > >>
> > > > >> * using a retry rule to retry in case the zk client lost its
> > > > >> connection, which could happen if the quorum tests are running in an
> > > > >> unstable environment and a leader election happened
> > > > >> * using random ports instead of sequential ones to avoid port races
> > > > >> when running tests concurrently
> > > > >> * changing tests to avoid using the same test path when
> > > > >> creating/deleting nodes
> > > > >>
> > > > >> These greatly reduced the flakiness internally; we should try them
> > > > >> if we're seeing similar issues on Jenkins.
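
The retry-rule idea above can be sketched in plain Java (all names here are
made up; ZooKeeper's real tests would wrap this in a JUnit TestRule, and the
recoverable exception would be the client's actual connection-loss error):

```java
import java.util.concurrent.Callable;

// Minimal sketch of the retry idea: re-run a test body a few times if it
// fails with a (hypothetical) recoverable connection-loss error.
public class RetrySketch {
    static class ConnectionLossException extends RuntimeException {}

    static <T> T withRetries(int maxAttempts, Callable<T> body) throws Exception {
        Exception last = null;
        for (int i = 0; i < maxAttempts; i++) {
            try {
                return body.call();
            } catch (ConnectionLossException e) {
                last = e; // recoverable: e.g. a leader election on a slow host
            }
        }
        throw last; // exhausted retries: report the last failure
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Fails twice with connection loss, then succeeds on the third try.
        String result = withRetries(3, () -> {
            if (++calls[0] < 3) throw new ConnectionLossException();
            return "ok";
        });
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```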
> > > > >>
> > > > >> Fangmin
> > > > >>
> > > > >> On Sat, Oct 13, 2018 at 10:48 AM Bogdan Kanivets <
> > [email protected]
> > > >
> > > > >> wrote:
> > > > >>
> > > > >>> I've looked into flakiness a couple of months ago (with special
> > > > >>> attention on testManyChildWatchersAutoReset). In my opinion the
> > > > >>> problem is a) and c). Unfortunately I don't have data to back this
> > > > >>> claim.
> > > > >>>
> > > > >>> I don't remember seeing many 'port binding' exceptions, unless the
> > > > >>> 'port assignment' issue manifested as some other exception.
> > > > >>>
> > > > >>> Before decreasing number of threads I think more data should be
> > > > >>> collected/visualized
> > > > >>>
> > > > >>> 1) The flaky dashboard is great, but we should add another report
> > > > >>> that maps 'error causes' to builds/tests
> > > > >>> 2) The flaky dash can be extended to save more history (for example
> > > > >>> like this:
> > > > >>> https://www.chromium.org/developers/testing/flakiness-dashboard)
> > > > >>> 3) PreCommit builds should be included in the dashboard
> > > > >>> 4) We should have a common clean benchmark. For example: take an
> > > > >>> AWS t3.xlarge instance with a set linux distro, jvm, and zk commit
> > > > >>> sha, and run the tests (current 8 threads) for 8 hours with a 1 min
> > > > >>> cooldown.
> > > > >>>
> > > > >>> Due to a recent employment change I got sidetracked, but I really
> > > > >>> want to get to the bottom of this.
> > > > >>> I'm going to set up 4) and report results to this mailing list. I'm
> > > > >>> also willing to work on other items.
> > > > >>>
> > > > >>>
> > > > >>>
> > > > >>>
> > > > >>>
> > > > >>>
> > > > >>> On Sat, Oct 13, 2018 at 4:59 AM Enrico Olivelli <
> > [email protected]
> > > >
> > > > >>> wrote:
> > > > >>>
> > > > >>>> On Fri, Oct 12, 2018 at 11:17 PM Benjamin Reed <[email protected]>
> > > > >>>> wrote:
> > > > >>>>
> > > > >>>>> I think the unique port assignment (d) is more problematic than
> > > > >>>>> it appears. There is a race between finding a free port and
> > > > >>>>> actually grabbing it. I think that contributes to the flakiness.
> > > > >>>>>
> > > > >>>>
> > > > >>>> This is very hard to solve for our test cases, because we need to
> > > > >>>> build configs before starting the groups of servers.
> > > > >>>> For single-server tests it will be easier: you just have to start
> > > > >>>> the server on port zero, get the port, and then create the client
> > > > >>>> configs.
> > > > >>>> I don't know how much it would be worth.
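
A sketch of the port-zero approach for single-server tests; a plain
ServerSocket stands in for the server here, and the connect-string format is
illustrative:

```java
import java.io.IOException;
import java.net.ServerSocket;

// Sketch of the port-zero approach: bind to port 0 so the OS picks a free
// ephemeral port, then read the actual port back to build the client config.
// This avoids the find-then-bind race of scanning for free ports up front.
public class PortZeroSketch {
    public static void main(String[] args) throws IOException {
        try (ServerSocket server = new ServerSocket(0)) {
            int port = server.getLocalPort(); // the port the OS actually gave us
            // Only now do we build the client connect string.
            String connectString = "127.0.0.1:" + port;
            System.out.println("client config would use: " + connectString);
        }
    }
}
```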
> > > > >>>>
> > > > >>>> Enrico
> > > > >>>>
> > > > >>>>
> > > > >>>>> ben
> > > > >>>>> On Fri, Oct 12, 2018 at 8:50 AM Andor Molnar <[email protected]
> >
> > > > >> wrote:
> > > > >>>>>>
> > > > >>>>>> That is a completely valid point. I started to investigate
> > > > >>>>>> flakies for exactly the same reason, if you remember the thread
> > > > >>>>>> that I started a while ago. It was later abandoned,
> > > > >>>>>> unfortunately, because I've run into a few issues:
> > > > >>>>>>
> > > > >>>>>> - We nailed down that in order to release 3.5 stable, we have
> > > > >>>>>> to make sure it's not worse than 3.4 by comparing the builds:
> > > > >>>>>> but these builds are not comparable, because the 3.4 tests run
> > > > >>>>>> single-threaded while 3.5 runs multithreaded, showing problems
> > > > >>>>>> which might also exist on 3.4,
> > > > >>>>>>
> > > > >>>>>> - Neither of them runs the C++ tests for some reason, but
> > > > >>>>>> that's not really an issue here,
> > > > >>>>>>
> > > > >>>>>> - Looks like the tests on 3.5 are just as solid as on 3.4,
> > > > >>>>>> because running them in a dedicated, single-threaded environment
> > > > >>>>>> shows almost all tests succeeding,
> > > > >>>>>>
> > > > >>>>>> - I think the root cause of the failing unit tests could be one
> > > > >>>>>> (or more) of the following:
> > > > >>>>>>        a) Environmental: the Jenkins slave gets overloaded with
> > > > >>>>>> other builds, and multithreaded test running makes things even
> > > > >>>>>> worse: JDK threads are starving and ZK instances (both clients
> > > > >>>>>> and servers) are unable to operate
> > > > >>>>>>        b) Conceptual: ZK unit tests were not designed to run on
> > > > >>>>>> multiple threads: I investigated the unique port assignment
> > > > >>>>>> feature, which is looking good, but there could be other gaps
> > > > >>>>>> which make them unreliable when running simultaneously
> > > > >>>>>>        c) Bad testing: testing ZK in the wrong way, making bad
> > > > >>>>>> assumptions (e.g. not syncing clients), etc.
> > > > >>>>>>        d) A bug in the server.
> > > > >>>>>>
> > > > >>>>>> I feel that finding case d) with these tests is super hard,
> > > > >>>>>> because a test report doesn't give any information on what could
> > > > >>>>>> go wrong with ZooKeeper. Guessing is more or less your only
> > > > >>>>>> option.
> > > > >>>>>>
> > > > >>>>>> Finding c) is a little bit easier; I'm trying to submit patches
> > > > >>>>>> for them and hopefully making some progress.
> > > > >>>>>>
> > > > >>>>>> The huge pain in the arse, though, is a) and b): people
> > > > >>>>>> desperately keep commenting "please retest this" on GitHub to
> > > > >>>>>> get a green build, while testing is heading in a direction that
> > > > >>>>>> hides real problems: I mean people have started not to care
> > > > >>>>>> about a failing build, because "it must be some flaky test
> > > > >>>>>> unrelated to my patch". Which is bad, but the shame is that it's
> > > > >>>>>> true in 90% of cases.
> > > > >>>>>>
> > > > >>>>>> I'm just trying to find some ways - besides fixing c) and d)
> > > > >>>>>> flakies - to get more reliable and more informative Jenkins
> > > > >>>>>> builds. I don't want to make a huge turnaround, but if we can
> > > > >>>>>> get a significantly more reliable build for the price of a
> > > > >>>>>> slightly longer build time, running on 4 threads instead of 8, I
> > > > >>>>>> say let's do it.
> > > > >>>>>>
> > > > >>>>>> As always, any help from the community is more than welcome
> and
> > > > >>>>> appreciated.
> > > > >>>>>>
> > > > >>>>>> Thanks,
> > > > >>>>>> Andor
> > > > >>>>>>
> > > > >>>>>>
> > > > >>>>>>
> > > > >>>>>>
> > > > >>>>>>> On 2018. Oct 12., at 16:52, Patrick Hunt <[email protected]>
> > > > >> wrote:
> > > > >>>>>>>
> > > > >>>>>>> IIRC the number of threads was increased to improve
> > > > >>>>>>> performance. Reducing it is fine, but do we understand why it's
> > > > >>>>>>> failing? Perhaps it's finding real issues as a result of the
> > > > >>>>>>> artificial concurrency/load.
> > > > >>>>>>>
> > > > >>>>>>> Patrick
> > > > >>>>>>>
> > > > >>>>>>> On Fri, Oct 12, 2018 at 7:12 AM Andor Molnar
> > > > >>>>> <[email protected]>
> > > > >>>>>>> wrote:
> > > > >>>>>>>
> > > > >>>>>>>> Thanks for the feedback.
> > > > >>>>>>>> I'm running a few tests now: branch-3.5 on 2 threads and trunk
> > > > >>>>>>>> on 4 threads, to see what the impact on the build time is.
> > > > >>>>>>>>
> > > > >>>>>>>> The GitHub PR job is hard to configure, because its settings
> > > > >>>>>>>> are hard-coded into a shell script in the codebase. I have to
> > > > >>>>>>>> open a PR for that.
> > > > >>>>>>>>
> > > > >>>>>>>> Andor
> > > > >>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>>> On Fri, Oct 12, 2018 at 2:46 PM, Norbert Kalmar <
> > > > >>>>>>>> [email protected]> wrote:
> > > > >>>>>>>>
> > > > >>>>>>>>> +1, running the tests locally with 1 thread always passes
> > > > >>>>>>>>> (well, I ran it about 5 times, but still).
> > > > >>>>>>>>> On the other hand, running them on 8 threads yields similarly
> > > > >>>>>>>>> flaky results as the Apache runs. (Although it is much
> > > > >>>>>>>>> faster, but if we sometimes have to run 6-8-10 times to get a
> > > > >>>>>>>>> green run...)
> > > > >>>>>>>>>
> > > > >>>>>>>>> Norbert
> > > > >>>>>>>>>
> > > > >>>>>>>>> On Fri, Oct 12, 2018 at 2:05 PM Enrico Olivelli <
> > > > >>>> [email protected]
> > > > >>>>>>
> > > > >>>>>>>>> wrote:
> > > > >>>>>>>>>
> > > > >>>>>>>>>> +1
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> Enrico
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> On Fri, Oct 12, 2018 at 1:52 PM Andor Molnar
> > > > >>>>>>>>>> <[email protected]> wrote:
> > > > >>>>>>>>>>
> > > > >>>>>>>>>>> Hi,
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> What do you think of changing the number of threads
> > > > >>>>>>>>>>> running unit tests in Jenkins from the current 8 to 4 or
> > > > >>>>>>>>>>> even 2?
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> Running the unit tests inside the Cloudera environment on
> > > > >>>>>>>>>>> a single thread shows the builds to be much more stable.
> > > > >>>>>>>>>>> That would probably be too slow, but maybe running at least
> > > > >>>>>>>>>>> fewer threads would improve the situation.
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> It's getting very annoying that I cannot get a green
> build
> > on
> > > > >>>>> GitHub
> > > > >>>>>>>>> with
> > > > >>>>>>>>>>> only a few retests.
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> Regards,
> > > > >>>>>>>>>>> Andor
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>> --
> > > > >>>>>>>>>>
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> -- Enrico Olivelli
> > > > >>>>>>>>>>
> > > > >>>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>
> > > > >>>>>
> > > > >>>> --
> > > > >>>>
> > > > >>>>
> > > > >>>> -- Enrico Olivelli
> > > > >>>>
> > > > >>>
> > > > >>
> > > >
> > >
> >
>
