Thanks Naveen and Gavin! #1 has been completed and every job has finished its processing.
#2 is the ticket with infra:
https://issues.apache.org/jira/browse/INFRA-17346
I'm now waiting for their response.

-Marco

On Fri, Nov 30, 2018 at 8:25 PM Naveen Swamy <mnnav...@gmail.com> wrote:

> Hi Marco/Gavin,
>
> Thanks for the clarification. I was not aware that it has been tested
> on a separate test environment (this is what I was suggesting, to make
> the changes in a more controlled manner). Last time the change was
> made, many PRs were left dangling and developers had to go trigger
> them, and I triggered them at least 5 times before it succeeded today.
>
> Appreciate all the hard work to make CI better.
>
> -Naveen
>
> On Fri, Nov 30, 2018 at 8:50 AM Gavin M. Bell <gavin.max.b...@gmail.com>
> wrote:
>
> > Hey Folks,
> >
> > Marco has been running this change in dev, with flying colors, for
> > some time. This is not an experiment but a roll out that was
> > announced. We also decided to make this change post the release cut
> > to limit the blast radius from any critical obligations to the
> > community. Marco is accountable for this work and will address any
> > issues that may occur as he has been put on-call. We have, to our
> > best ability, mitigated as much risk as possible and now it is time
> > to pull the trigger. The community will enjoy a bit more visibility
> > and clarity into the test process which will be advantageous, as
> > well as allowing us to extend our infrastructure in a way that
> > affords us more flexibility.
> >
> > No pending PRs will be impacted.
> >
> > Thank you for your support as we evolve this system to better serve
> > the community.
> >
> > -Gavin
> >
> > On Fri, Nov 30, 2018 at 5:23 PM Marco de Abreu
> > <marco.g.ab...@googlemail.com.invalid> wrote:
> >
> > > Hello Naveen, this is not an experiment. Everything has been
> > > tested in our test system and is considered working 100%. This is
> > > not a test but actually the move into production - the merge into
> > > master happened a week ago. We now just have to put all PRs into
> > > the catalogue, which means that all PRs have to be analyzed with
> > > the new pipelines - the only thing that will be noticeable is that
> > > the CI is under higher load.
> > >
> > > The pending PRs will not be impacted. The existing pipeline is
> > > still running in parallel and everything will behave as before.
> > >
> > > -Marco
> > >
> > > On Fri, Nov 30, 2018 at 4:41 PM Naveen Swamy <mnnav...@gmail.com>
> > > wrote:
> > >
> > > > Marco, run your experiments on a branch - set up, test it well
> > > > and then bring it to the master.
> > > >
> > > > > On Nov 30, 2018, at 6:53 AM, Marco de Abreu
> > > > > <marco.g.ab...@googlemail.com.INVALID> wrote:
> > > > >
> > > > > Hello,
> > > > >
> > > > > I'm now moving forward with #1. I will try to get to #3 as
> > > > > soon as possible to reduce parallel jobs in our CI. You might
> > > > > notice some unfinished jobs. I will let you know as soon as
> > > > > this process has been completed. Until then, please bear with
> > > > > me since we have hundreds of jobs to run in order to validate
> > > > > all PRs.
> > > > >
> > > > > Best regards,
> > > > > Marco
> > > > >
> > > > > On Fri, Nov 30, 2018 at 1:36 AM Marco de Abreu
> > > > > <marco.g.ab...@googlemail.com> wrote:
> > > > >
> > > > >> Hello,
> > > > >>
> > > > >> since the release branch has now been cut, I would like to
> > > > >> move forward with the CI improvements for the master branch.
> > > > >> This would include the following actions:
> > > > >> 1. Re-enable the new Jenkins job
> > > > >> 2. Request Apache Infra to move the protected branch check
> > > > >>    from the main pipeline to our new ones
> > > > >> 3. Merge https://github.com/apache/incubator-mxnet/pull/13474 -
> > > > >>    this finalizes the deprecation process
> > > > >>
> > > > >> If nobody objects, I would like to start with #1 soon.
> > > > >> Mentors, could you please assist to create the Apache Infra
> > > > >> ticket? I would then take it from there and talk to Infra.
> > > > >>
> > > > >> Best regards,
> > > > >> Marco
> > > > >>
> > > > >> On Mon, Nov 26, 2018 at 2:47 AM kellen sunderland
> > > > >> <kellen.sunderl...@gmail.com> wrote:
> > > > >>
> > > > >>> Sorry, [1] meant to reference
> > > > >>> https://issues.jenkins-ci.org/browse/JENKINS-37984 .
> > > > >>>
> > > > >>> On Sun, Nov 25, 2018 at 5:41 PM kellen sunderland
> > > > >>> <kellen.sunderl...@gmail.com> wrote:
> > > > >>>
> > > > >>>> Marco and I ran into another urgent issue over the weekend
> > > > >>>> that was causing builds to fail. This issue was unrelated
> > > > >>>> to any feature development work, or other CI fixes applied
> > > > >>>> recently, but it did require quite a bit of work from Marco
> > > > >>>> (and a little from me) to fix.
> > > > >>>>
> > > > >>>> We spent enough time on the problem that it caused us to
> > > > >>>> take a step back and consider how we could both fix issues
> > > > >>>> in CI and support the 1.4 release with the least impact
> > > > >>>> possible on MXNet devs. Marco had planned to make a
> > > > >>>> significant change to the CI to fix a long-standing Jenkins
> > > > >>>> error [1], but we feel that most developers would
> > > > >>>> prioritize having a stable build environment for the next
> > > > >>>> few weeks over having this fix in place.
> > > > >>>>
> > > > >>>> To properly introduce a new CI system the intent was to do
> > > > >>>> a gradual blue/green roll out of the fix. To manage this
> > > > >>>> rollout would have taken operational effort and double
> > > > >>>> compute load as we run systems in parallel. This risks
> > > > >>>> outages due to scaling limits, and we’d rather make this
> > > > >>>> change during a period of low developer activity, i.e.
> > > > >>>> shortly after the 1.4 release.
> > > > >>>>
> > > > >>>> This means that from now until the 1.4 release, in order to
> > > > >>>> reduce complexity MXNet developers should only see a single
> > > > >>>> Jenkins verification check, and a single Travis check.
>
> > --
> > Sincerely,
> > Gavin M. Bell
> >
> > "Never mistake a clear view for a short distance."
> > -Paul Saffo