Thank you, Denis, for taking up this initiative. With respect to "Introduce per-PR CI bot" and the "[mxnet-ci] run" command: would it make sense to add "retriggering only failed pipelines" to the scope? For example, users could be asked to specify the name of the pipeline, or there could be separate "[mxnet-ci] run all" and "[mxnet-ci] run failed" commands.
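To make the suggestion concrete, here is a minimal sketch of how a bot might interpret such comments. Only the "[mxnet-ci] run" comment comes from the proposal; the "all" / "failed" / pipeline-name variants, the function names, and the shape of the status dictionary are assumptions for illustration, not an existing bot API.

```python
PREFIX = "[mxnet-ci]"  # prefix taken from the proposal


def parse_ci_command(comment):
    """Return (mode, names) for the first bot command in a PR comment, or None.

    Hypothetical grammar: "[mxnet-ci] run", "[mxnet-ci] run all",
    "[mxnet-ci] run failed", or "[mxnet-ci] run <pipeline> [<pipeline> ...]".
    """
    for line in comment.splitlines():
        words = line.split()
        if len(words) >= 2 and words[0] == PREFIX and words[1] == "run":
            targets = words[2:]
            if not targets or targets == ["all"]:
                return ("all", [])
            if targets == ["failed"]:
                return ("failed", [])
            return ("named", targets)
    return None


def pipelines_to_trigger(command, statuses):
    """Pick pipelines to re-run.

    statuses maps pipeline name -> last result ("success" or anything else);
    this mapping is an assumed input, e.g. collected from the last CI run.
    """
    mode, names = command
    if mode == "all":
        return sorted(statuses)
    if mode == "failed":
        return sorted(name for name, result in statuses.items() if result != "success")
    # Named mode: silently ignore names that are not known pipelines.
    return [name for name in names if name in statuses]


if __name__ == "__main__":
    last_run = {"unix-cpu": "success", "unix-gpu": "failure", "windows-gpu": "failure"}
    command = parse_ci_command("Flaky test again.\n[mxnet-ci] run failed")
    print(pipelines_to_trigger(command, last_run))
```

With this grammar a bare "[mxnet-ci] run" keeps the meaning it has in the proposal (run everything), so the extra variants would be purely additive.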
In the current state, when retriggering all pipelines, it's likely that one of them will fail. Retriggering only the failed pipeline gives a higher chance of arriving at a state where all pipelines have succeeded.

On Wed, 2020-02-12 at 10:12 -0800, Davydenko, Denis wrote:
> Hello, MXNet dev community,
> As you all know, the experience with CI infrastructure isn’t ideal in spite of
> its high cost. For this reason, we’re proposing the following changes to
> improve stability, reduce cost, and grant more control to contributors. As we
> work in a refresh of CI, we believe these changes will reduce the pain we all
> suffer when we try to push a PR through the system.
>
> Following is the list of changes:
> Fix missing status reports between GH and Jenkins
> Update Jenkins permission groups to re-trigger builds
> Introduce per-PR CI bot
> Details:
>
> - Fix missing status reports
> Currently, once commit gets added to PR - the CI is run on that added commit.
> Sometimes, CI run status is missing from the commit in Github despite having
> completed in Jenkins. Example: CI run:
> http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Funix-cpu/detail/PR-17376/17/pipeline
> , commit status in github (missing unix-cpu, unix-gpu and windows-gpu
> statuses):
> https://github.com/apache/incubator-mxnet/pull/17376#partial-pull-merging.
> Problem: There seems to be a bug where some status reports are missing on
> Github. The hypothesis is that there is some issue with Github Hooks.
>
> - Update Jenkins permission groups to re-trigger builds
> Problem: Currently, only MXNet Committers and selected people from AWS have
> the ability to re-trigger CI runs on PRs. This leaves the PR Authors waiting
> for authorized users to re-trigger their PRs for them.
> Solution : Allow these membership categories Jenkins Admins, MXNet Committers,
> and PR Authors to re-trigger PR builds.
>
> - Introduce per-PR CI bot
> Problem: As of date, MXNet CI is automated. It runs every time a commit is
> pushed onto your Github PR. This results in lot of unnecessary CI runs apart
> from added costs.
> Solution: Switch to Manual Trigger. Users from authorized groups (1 of the 3
> categories mentioned above) can trigger CI run by adding a simple comment to
> PR: “[mxnet-ci] run”.
>
> --
> Thank you,
>
> AWS MXNet team