I looked into flakiness a couple of months ago (with special attention
to testManyChildWatchersAutoReset). In my opinion the problems are a)
and c). Unfortunately I don't have data to back this claim.

I don't remember seeing many 'port binding' exceptions - unless the
'port assignment' issue manifested as some other exception.

Before decreasing the number of threads, I think more data should be
collected and visualized:

1) The flaky dashboard is great, but we should add another report that
maps 'error causes' to builds/tests.
2) The flaky dashboard could be extended to save more history (for
example, like this:
https://www.chromium.org/developers/testing/flakiness-dashboard).
3) PreCommit builds should be included in the dashboard.
4) We should have a common, clean benchmark. For example: take an AWS
t3.xlarge instance with a fixed Linux distro, JVM, and ZK commit SHA,
and run the tests (current 8 threads) for 8 hours with a 1-minute
cooldown between runs; see the sketch below.
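
For 4), a rough driver could be as simple as the following. This is
only an illustration: the ant 'test' target is real, but the
test.junit.threads property name is an assumption on my part, so take
both from build.xml.

    import java.time.Duration;
    import java.time.Instant;

    public class FlakyTestLoop {
        public static void main(String[] args) throws Exception {
            Instant deadline = Instant.now().plus(Duration.ofHours(8));
            int runs = 0, failures = 0;
            while (Instant.now().isBefore(deadline)) {
                runs++;
                // Run the full suite; the property name and thread count
                // are assumptions, take them from build.xml.
                Process p = new ProcessBuilder("ant", "clean", "test",
                        "-Dtest.junit.threads=8").inheritIO().start();
                if (p.waitFor() != 0) {
                    failures++;
                }
                System.out.printf("run %d finished, %d failed so far%n",
                        runs, failures);
                Thread.sleep(60_000); // 1 minute cooldown between runs
            }
        }
    }

The logs of the failed runs could then feed the 'error causes' report
from 1).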

Due to a recent employment change I got sidetracked, but I really want
to get to the bottom of this. I'm going to set up 4) and report the
results to this mailing list. I'm also willing to work on the other
items.

On Sat, Oct 13, 2018 at 4:59 AM Enrico Olivelli <[email protected]> wrote:

> On Fri, Oct 12, 2018 at 23:17, Benjamin Reed <[email protected]> wrote:
>
> > i think the unique port assignment (d) is more problematic than it
> > appears. there is a race between finding a free port and actually
> > grabbing it. i think that contributes to the flakiness.
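
On the race Ben describes: the usual find-then-grab pattern looks
roughly like this minimal sketch (plain java.net, not our actual test
harness code). The window between closing the probe socket and binding
the port again later is where a concurrently running test can steal
it, which then surfaces as a BindException.

    import java.net.ServerSocket;

    public class PortRaceDemo {
        // Racy pattern: probe for a free port, close the probe socket,
        // and hope the port is still free when the server binds it later.
        static int findFreePort() throws Exception {
            try (ServerSocket probe = new ServerSocket(0)) {
                return probe.getLocalPort();
            } // the race window opens as soon as the probe is closed
        }

        public static void main(String[] args) throws Exception {
            int port = findFreePort();
            // ... configs get written, servers get started, time passes ...
            // Another test may have grabbed the port in the meantime:
            try (ServerSocket server = new ServerSocket(port)) {
                System.out.println("bound " + server.getLocalPort());
            } // on a loaded Jenkins slave this bind can fail instead
        }
    }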
> >
>
> This is very hard to solve for our test cases, because we need to build
> the configs before starting the groups of servers.
> For single-server tests it would be easier: you just have to start the
> server on port zero, get the assigned port, and then create the client
> configs. I don't know how much it would be worth.
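
The single-server variant Enrico mentions would look roughly like the
sketch below. I'm assuming the ServerCnxnFactory / ZooKeeperServer
bootstrap that the test base classes use, so treat it as an outline
rather than drop-in code. The point is that binding to port 0 lets the
kernel pick a free port atomically at bind time, so there is no
probe-then-grab window at all.

    import java.io.File;
    import java.net.InetSocketAddress;

    import org.apache.zookeeper.server.ServerCnxnFactory;
    import org.apache.zookeeper.server.ZooKeeperServer;

    public class EphemeralPortServer {
        public static void main(String[] args) throws Exception {
            File dataDir = new File("build/test/zk-tmp");
            // Bind to port 0: the kernel picks a free port at bind time.
            ServerCnxnFactory factory =
                ServerCnxnFactory.createFactory(new InetSocketAddress(0), 10);
            factory.startup(new ZooKeeperServer(dataDir, dataDir, 2000));

            // Only now do we learn the real port; the client configs can
            // be built from it after the server is already up.
            String connectString = "127.0.0.1:" + factory.getLocalPort();
            System.out.println("server is at " + connectString);

            factory.shutdown();
        }
    }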
>
> Enrico
>
>
> > ben
> > On Fri, Oct 12, 2018 at 8:50 AM Andor Molnar <[email protected]> wrote:
> > >
> > > That is a completely valid point. I started to investigate flakies for
> > > exactly the same reason, if you remember the thread that I started a
> > > while ago. It was later abandoned unfortunately, because I ran into a
> > > few issues:
> > >
> > > - We nailed down that in order to release 3.5 as stable, we have to
> > > make sure it’s not worse than 3.4 by comparing the builds. But these
> > > builds are not comparable, because the 3.4 tests run single-threaded
> > > while the 3.5 tests run multithreaded, showing problems which might
> > > also exist in 3.4.
> > >
> > > - Neither of them runs the C++ tests for some reason, but that’s not
> > > really an issue here.
> > >
> > > - It looks like the tests on 3.5 are just as solid as on 3.4: running
> > > them in a dedicated, single-threaded environment shows almost all
> > > tests succeeding.
> > >
> > > - I think the root cause of the failing unit tests could be one (or
> > > more) of the following:
> > >         a) Environmental: the Jenkins slave gets overloaded with other
> > > builds, and multithreaded test running makes things even worse:
> > > starving JDK threads and ZK instances (both clients and servers) are
> > > unable to operate.
> > >         b) Conceptual: ZK unit tests were not designed to run on
> > > multiple threads. I investigated the unique port assignment feature,
> > > which is looking good, but there could be other gaps which make the
> > > tests unreliable when running simultaneously.
> > >         c) Bad testing: testing ZK in the wrong way, making bad
> > > assumptions (e.g. not syncing clients; see the sketch below), etc.
> > >         d) Bug in the server.
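
A note on c): the classic example is asserting through one client what
another client just wrote, without calling sync() first. Reads are
served by the server the session is connected to, so in a quorum test
the reading session can lag behind the writer. A minimal sketch of the
sync-before-read pattern, assuming zk2 is a second, already-connected
session (hypothetical helper, not test harness code):

    import java.util.concurrent.CountDownLatch;

    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooKeeper;

    public class SyncBeforeRead {
        // zk2 is assumed connected; another session has written 'path'.
        static byte[] readAfterSync(ZooKeeper zk2, String path)
                throws KeeperException, InterruptedException {
            CountDownLatch synced = new CountDownLatch(1);
            // sync() is asynchronous: wait for its callback so the
            // following read reflects everything committed at the leader.
            zk2.sync(path, (rc, p, ctx) -> synced.countDown(), null);
            synced.await();
            return zk2.getData(path, false, null);
        }
    }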
> > >
> > > I feel that finding case d) with these tests is super hard, because a
> > > test report doesn’t give any information on what could have gone wrong
> > > inside ZooKeeper. Guessing is more or less your only option.
> > >
> > > Finding c) is a little bit easier; I’m trying to submit patches for
> > > those and hopefully making some progress.
> > >
> > > The huge pain in the arse though is a) and b): people desperately keep
> > > commenting “please retest this” on GitHub to get a green build, while
> > > testing drifts toward hiding the real problems. I mean people have
> > > stopped caring about a failing build, because “it must be some flaky
> > > test unrelated to my patch”. Which is bad, but the shame is that it’s
> > > true in 90% of cases.
> > >
> > > I’m just trying to find some ways - besides fixing the c) and d)
> > > flakies - to get more reliable and more informative Jenkins builds. I
> > > don’t want to make a huge turnaround, but I think if we can get a
> > > significantly more reliable build for the price of a slightly longer
> > > build time, running on 4 threads instead of 8, I say let’s do it.
> > >
> > > As always, any help from the community is more than welcome and
> > > appreciated.
> > >
> > > Thanks,
> > > Andor
> > >
> > >
> > >
> > >
> > > > On 2018. Oct 12., at 16:52, Patrick Hunt <[email protected]> wrote:
> > > >
> > > > iirc the number of threads was increased to improve performance.
> > > > Reducing is fine, but do we understand why it's failing? Perhaps it's
> > > > finding real issues as a result of the artificial concurrency/load.
> > > >
> > > > Patrick
> > > >
> > > > On Fri, Oct 12, 2018 at 7:12 AM Andor Molnar <[email protected]>
> > > > wrote:
> > > >
> > > >> Thanks for the feedback.
> > > >> I'm running a few tests now, branch-3.5 on 2 threads and trunk on
> > > >> 4 threads, to see what the impact on the build time is.
> > > >>
> > > >> The GitHub PR job is hard to configure, because its settings are
> > > >> hard-coded into a shell script in the codebase. I have to open a PR
> > > >> for that.
> > > >>
> > > >> Andor
> > > >>
> > > >>
> > > >>
> > > >> On Fri, Oct 12, 2018 at 2:46 PM, Norbert Kalmar <
> > > >> [email protected]> wrote:
> > > >>
> > > >>> +1, running the tests locally with 1 thread always passes (well, I
> > > >>> ran it about 5 times, but still).
> > > >>> On the other hand, running them on 8 threads yields similarly flaky
> > > >>> results as the Apache runs. (Although it is much faster, but if we
> > > >>> sometimes have to run 6-8-10 times to get a green run...)
> > > >>>
> > > >>> Norbert
> > > >>>
> > > >>> On Fri, Oct 12, 2018 at 2:05 PM Enrico Olivelli
> > > >>> <[email protected]> wrote:
> > > >>>
> > > >>>> +1
> > > >>>>
> > > >>>> Enrico
> > > >>>>
> > > >>>>> On Fri, Oct 12, 2018 at 13:52, Andor Molnar <[email protected]>
> > > >>>>> wrote:
> > > >>>>
> > > >>>>> Hi,
> > > >>>>>
> > > >>>>> What do you think of changing the number of threads running the
> > > >>>>> unit tests in Jenkins from the current 8 to 4, or even 2?
> > > >>>>>
> > > >>>>> Running the unit tests inside the Cloudera environment on a
> > > >>>>> single thread shows much more stable builds. That would probably
> > > >>>>> be too slow, but maybe at least running fewer threads would
> > > >>>>> improve the situation.
> > > >>>>>
> > > >>>>> It's getting very annoying that I cannot get a green build on
> > > >>>>> GitHub with only a few retests.
> > > >>>>>
> > > >>>>> Regards,
> > > >>>>> Andor
> > > >>>>>
> > > >>>> --
> > > >>>>
> > > >>>>
> > > >>>> -- Enrico Olivelli
> > > >>>>
> > > >>>
> > > >>
> > >
> >
> --
>
>
> -- Enrico Olivelli
>
