Thank you, Denis, for taking up this initiative. With respect to "Introduce
per-PR CI bot" and the "[mxnet-ci] run" command: would it make sense to add
"retriggering only failed pipelines" to the scope? For example, users could be
asked to specify the name of the pipeline, or the bot could accept both
"[mxnet-ci] run all" and "[mxnet-ci] run failed".
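
To make the suggestion concrete, here is a minimal sketch of how such a bot
could interpret the comment. Only the "[mxnet-ci] run" prefix comes from the
proposal; the "all" / "failed" / pipeline-name arguments are the variants
suggested above, and the function names and the status dictionary are
hypothetical, not part of any existing bot.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical grammar: "[mxnet-ci] run", optionally followed by "all",
# "failed", or one or more pipeline names, alone on a line of the comment.
COMMAND_RE = re.compile(
    r"^[ \t]*\[mxnet-ci\][ \t]+run(?:[ \t]+(?P<args>\S.*?))?[ \t]*$",
    re.MULTILINE,
)


def parse_command(comment: str) -> Optional[Tuple[str, List[str]]]:
    """Return (mode, pipeline_names), or None if the comment has no command."""
    match = COMMAND_RE.search(comment)
    if match is None:
        return None
    args = (match.group("args") or "").split()
    if not args or args == ["all"]:
        return ("all", [])
    if args == ["failed"]:
        return ("failed", [])
    return ("named", args)


def select_pipelines(comment: str, statuses: Dict[str, str]) -> List[str]:
    """Pick the pipelines to retrigger, given the last known status of each.

    `statuses` maps a pipeline name to "success" or "failure" (assumed shape).
    """
    parsed = parse_command(comment)
    if parsed is None:
        return []
    mode, names = parsed
    if mode == "all":
        return list(statuses)
    if mode == "failed":
        return [name for name, state in statuses.items() if state != "success"]
    # Named mode: ignore names that are not known pipelines.
    return [name for name in names if name in statuses]


if __name__ == "__main__":
    last_run = {"unix-cpu": "success", "unix-gpu": "failure", "windows-gpu": "failure"}
    print(select_pipelines("[mxnet-ci] run failed", last_run))
```

Treating a bare "[mxnet-ci] run" as "run all" keeps the command from the
original proposal working unchanged.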

In the current state, when retriggering all pipelines, it's likely that one of
them will fail. Retriggering only the failed pipelines gives a higher chance of
arriving at a state where all pipelines have succeeded.
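
A rough way to see the difference, as a sketch under simplifying assumptions:
suppose each pipeline fails spuriously with the same probability on every run,
independently of the others. The failure rate and pipeline count below are
made-up illustration values, not measurements from MXNet CI.

```python
def p_green_rerun_all(p_fail: float, n_pipelines: int, attempts: int) -> float:
    """Chance that at least one of `attempts` full runs has every pipeline pass."""
    p_single_run_green = (1.0 - p_fail) ** n_pipelines
    return 1.0 - (1.0 - p_single_run_green) ** attempts


def p_green_rerun_failed(p_fail: float, n_pipelines: int, attempts: int) -> float:
    """Chance that every pipeline has passed at least once within `attempts`
    rounds, when only the pipelines that are still failing are rerun."""
    return (1.0 - p_fail ** attempts) ** n_pipelines


if __name__ == "__main__":
    # Hypothetical numbers: 10 pipelines, each flaky 10% of the time.
    for attempts in (1, 2, 3):
        print(
            attempts,
            round(p_green_rerun_all(0.1, 10, attempts), 3),
            round(p_green_rerun_failed(0.1, 10, attempts), 3),
        )
```

Under these assumptions the two strategies are equal on the first attempt, and
from the second attempt onward rerunning only the failed pipelines reaches an
all-green state with higher probability, because earlier successes are kept.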

On Wed, 2020-02-12 at 10:12 -0800, Davydenko, Denis wrote:
> Hello, MXNet dev community,
> As you all know, the experience with CI infrastructure isn’t ideal in spite of
> its high cost. For this reason, we’re proposing the following changes to
> improve stability, reduce cost, and grant more control to contributors. As we
> work on a refresh of CI, we believe these changes will reduce the pain we all
> suffer when we try to push a PR through the system.
> 
> Following is the list of changes:
> - Fix missing status reports between GH and Jenkins
> - Update Jenkins permission groups to re-trigger builds
> - Introduce per-PR CI bot
> 
> Details:
> 
> - Fix missing status reports
> Currently, once a commit is added to a PR, CI is run on that commit.
> Sometimes, CI run status is missing from the commit in Github despite having
> completed in Jenkins. Example: CI run: 
> http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Funix-cpu/detail/PR-17376/17/pipeline
> , commit status in github (missing unix-cpu, unix-gpu and windows-gpu
> statuses): 
> https://github.com/apache/incubator-mxnet/pull/17376#partial-pull-merging.
> Problem: There seems to be a bug where some status reports are missing on
> Github. The hypothesis is that there is some issue with Github Hooks.
> 
> - Update Jenkins permission groups to re-trigger builds
> Problem: Currently, only MXNet Committers and selected people from AWS have
> the ability to re-trigger CI runs on PRs. This leaves the PR Authors waiting
> for authorized users to re-trigger their PRs for them.
> Solution: Allow these membership categories (Jenkins Admins, MXNet Committers,
> and PR Authors) to re-trigger PR builds.
> 
> - Introduce per-PR CI bot
> Problem: As of today, MXNet CI is automated. It runs every time a commit is
> pushed to your Github PR. This results in a lot of unnecessary CI runs, on top
> of the added cost.
> Solution: Switch to Manual Trigger. Users from authorized groups (1 of the 3
> categories mentioned above) can trigger CI run by adding a simple comment to
> PR: “[mxnet-ci] run”. 
> 
> --
> Thank you,
> 
> AWS MXNet team
> 
>  
> 
