-0.5 (non-binding). I would propose that, if there are pieces of CI that
are slowing down development (and I think we can all agree that there are),
we strip out these problematic pieces and open issues for them. We can then
assign the issues to people and evaluate what should be done to fix or
mitigate the root cause of each problem (be it slow runtime, flakiness,
etc.). This way we will get to a state where the CI is stable, and we'll
have a backlog of issues to fix.

-Kellen

On Mon, Nov 20, 2017 at 2:36 PM, Marco de Abreu <
marco.g.ab...@googlemail.com> wrote:

> Small update regarding the status of the new CI: We've been able to get
> Ubuntu builds and tests up and running.
>
> I’ve received requests to provide exact times for all important stages of
> the CI. As the first runs have completed successfully, I can now provide
> some data. This should give you some guidance on where our bottlenecks are
> and where we can improve. All jobs of the same stage are executed in
> parallel.
>
>
>
> Below are the durations of all Ubuntu builds and tests. All of them (CPU
> and GPU tests) are executed on a G3.8xlarge with 32 vCPUs. These times
> only cover the actual core execution time of each task, *without*
> stashing, cleaning, etc.
>
>
>
> Build:
>
>    - Amalgamation: 3m40s
>    - Amalgamation MIN: 3m49s
>    - *CPU: OpenBLAS: 8m38s*
>    - *GPU: CUDA7.5+cuDNN5: 11m55s*
>    - *GPU: MKLML: 9m42s*
>
> Unit Test:
>
>    - Perl CPU: 17m36s
>    - Perl GPU: 8m31s
>    - *Python2: CPU: 1h10m10s*
>    - Python2: GPU: 14m10s
>    - Python2: MKLML-CPU: 10m25s + 1m43s = 12m08s
>    - Python2: MKLML-GPU: 14m19s
>    - *Python3: CPU: 1h8m26s*
>    - Python3: GPU: 14m58s
>    - Python3: MKLML-CPU: 10m31s (seems like this job is not running the
>    Python2-Train-Tests)
>    - Python3: MKLML-GPU: 13m37s
>    - *R: CPU: 20m6s + 9s + 9m58s = 30m13s*
>    - *R: GPU: 21m50s + 10s + 1m17s = 23m17s*
>    - Scala: CPU: 2m18s + 3m57s = 6m15s
>
>
>
> Integration Test:
>
>    - Caffe GPU: 7m42s
>    - Python GPU: 1m47s
>    - Cpp-package GPU: 59s
>
>
> The next step is to get Windows up and running and improve execution time
> on CPU-tasks.
>
> -Marco
>
> On Mon, Nov 20, 2017 at 9:18 PM, YiZhi Liu <javeli...@gmail.com> wrote:
>
> > +1
> >
> > 2017-11-20 11:47 GMT-08:00 Tianqi Chen <tqc...@cs.washington.edu>:
> > > +1 until new CI is implemented.
> > >
> > > Tianqi
> > >
> > > On Mon, Nov 20, 2017 at 11:11 AM, Eric Xie <j...@apache.org> wrote:
> > >
> > >> A lot of people seem to be confused, so let's clarify the separation of
> > >> roles/responsibilities:
> > >>
> > >> 1. The committers that merge code are responsible for code quality and
> > >> tests passing on master.
> > >>
> > >> 2. The CI / infra maintainers are responsible for keeping CI running
> > >> properly and honestly reporting bugs.
> > >> If a test fails because Jenkins is faulty or slow, you need to fix
> > >> Jenkins. If a test fails because there is a bug in the code, then you
> > >> did a good job catching the bug. It is not your responsibility to fix
> > >> it; you merely need to report it to the committers.
> > >>
> > >> Thanks,
> > >> Junyuan Xie
> > >>
> > >>
> > >> On 2017-11-19 16:07, Gautam <gautamn...@gmail.com> wrote:
> > >> > -1
> > >> >
> > >> > Please see inline.
> > >> >
> > >> > On Nov 19, 2017 12:51 PM, "Eric Xie" <j...@apache.org> wrote:
> > >> >
> > >> > Hi all,
> > >> > I'm starting this thread to vote on turning off protected master.
> > >> > The reasons are:
> > >> >
> > >> > 1. Since we turned on protected master, pending PRs have grown from
> > >> > 40 to 80. It is severely slowing down development.
> > >> >
> > >> >
> > >> >      Turning protection off will give you the ability to merge code
> > >> > that has build failures. How, and more importantly who, will be the
> > >> > best judge of what is wrong? If CI is the problem, then we have
> > >> > people from the infra team who are more or less on call day and night
> > >> > trying to fix it, including me trying to fix the slave at 2 a.m. I
> > >> > have seen committers send PRs and, when one fails, reach out to the
> > >> > infra team immediately without even looking at the reason, which
> > >> > could be just a temporary error. So I don't agree that we should turn
> > >> > off the protected branch just for the sake of speed. We should care
> > >> > more about quality than speed.
> > >> >
> > >> >
> > >> > 2. Committers, not CI, are ultimately responsible for the code they
> > >> > merge. You should only override the CI when you are very confident
> > >> > that CI is the problem, not your code. If it turns out you are wrong,
> > >> > you should fix it ASAP. This is the bare minimum requirement for all
> > >> > committers: BE RESPONSIBLE.
> > >> >
> > >> > How do we make sure that this indeed happens?
> > >> >
> > >> > I'm aware of the argument for using protected master: it makes sure
> > >> > that master is stable.
> > >> >
> > >> > Well, master will be most stable if we stop adding any commits to
> > >> > it. But that's not what we want, is it?
> > >> >
> > >> > No, we are not saying don't add anything to master; we are just
> > >> > saying please don't add bad code to master. And yes, I have seen bad
> > >> > code merged to master when the protected branch was not enabled.
> > >> >
> > >> > Protected master hardly adds any stability. The faulty tests that
> > >> > break master at random got merged into master because they happened
> > >> > to succeed once.
> > >> >
> > >> > That's not true. It covers one important aspect: at least when the
> > >> > code was merged, it had completed the whole build and test cycle.
> > >> > Flaky tests we can certainly track down.
> > >> >
> > >> > Thanks,
> > >> > Junyuan Xie
> > >> >
> > >>
> >
> >
> >
> > --
> > Yizhi Liu
> > DMLC member
> >
>
