+1 These are good action items that should help alleviate part of the
CI issues.

The following comments are not meant to take away from your proposal.
Move forward, assuming the community agrees.

I'd really like to see particular tests run only when the PR is
touching a related part. While this is more effort, it would make a
major difference. Light research shows that other projects have been
doing this for quite some time, so it wouldn't require inventing
anything new or deep exploration.

I realize there are a lot of interdependencies and it would probably
not work for everything. But, what if we start small?
--> Docs pages (*.rst, *.md, *.html, *.js, *.css): don't trigger most
tests, especially GPU and cross-platform tests.
--> Tutorials that have GPU requirements run their own validation
tests, and tutorials that don't have a GPU requirement don't get tested
on GPUs.
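
To make the docs case concrete, here is a rough sketch of the kind of
filter I have in mind. It is only an illustration under assumptions: the
"sanity" and "website" job names and the select_jobs helper are
hypothetical and are not taken from our Jenkins setup. The extension
list is the one from the docs item above.

```python
from pathlib import PurePosixPath

# Extensions treated as documentation-only (from the docs item above).
DOC_EXTENSIONS = {".rst", ".md", ".html", ".js", ".css"}

# Job names are assumptions for illustration; "sanity" and "website"
# in particular are hypothetical placeholders.
ALL_JOBS = ["sanity", "unix-cpu", "unix-gpu", "windows-gpu", "website"]
DOCS_ONLY_JOBS = ["sanity", "website"]


def is_doc_file(path):
    """Return True if the path has a documentation-only extension."""
    return PurePosixPath(path).suffix.lower() in DOC_EXTENSIONS


def select_jobs(changed_files):
    """Return the CI jobs to run for a list of changed file paths.

    If every changed file is documentation, only the lightweight jobs
    run. Anything else (including an empty list) runs the full set.
    """
    if changed_files and all(is_doc_file(p) for p in changed_files):
        return list(DOCS_ONLY_JOBS)
    return list(ALL_JOBS)
```

The list of changed files could come from something like
`git diff --name-only` against the target branch. Falling back to the
full job set whenever a non-doc file is touched keeps the filter
conservative.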

Cheers,
Aaron



On Wed, Feb 12, 2020 at 10:12 AM Davydenko, Denis
<dzianis.davydze...@gmail.com> wrote:
>
> Hello, MXNet dev community,
> As you all know, the experience with CI infrastructure isn’t ideal in spite 
> of its high cost. For this reason, we’re proposing the following changes to 
> improve stability, reduce cost, and grant more control to contributors. As we 
> work on a refresh of CI, we believe these changes will reduce the pain we all
> suffer when we try to push a PR through the system.
>
> Following is the list of changes:
> - Fix missing status reports between GH and Jenkins
> - Update Jenkins permission groups to re-trigger builds
> - Introduce per-PR CI bot
>
> Details:
>
> - Fix missing status reports
> Currently, once a commit is added to a PR, CI is run on that commit.
> Sometimes, the CI run status is missing from the commit in GitHub despite
> the run having completed in Jenkins. Example: CI run:
> http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Funix-cpu/detail/PR-17376/17/pipeline,
> commit status in GitHub (missing unix-cpu, unix-gpu and windows-gpu
> statuses): 
> https://github.com/apache/incubator-mxnet/pull/17376#partial-pull-merging.
> Problem: There seems to be a bug where some status reports are missing on
> GitHub. The hypothesis is that there is some issue with GitHub hooks.
>
> - Update Jenkins permission groups to re-trigger builds
> Problem: Currently, only MXNet Committers and selected people from AWS have 
> the ability to re-trigger CI runs on PRs. This leaves the PR Authors waiting 
> for authorized users to re-trigger their PRs for them.
> Solution: Allow these membership categories (Jenkins Admins, MXNet
> Committers, and PR Authors) to re-trigger PR builds.
>
> - Introduce per-PR CI bot
> Problem: As of today, MXNet CI is triggered automatically. It runs every
> time a commit is pushed to your GitHub PR. This results in a lot of
> unnecessary CI runs, in addition to added costs.
> Solution: Switch to manual trigger. Users from authorized groups (one of
> the 3 categories mentioned above) can trigger a CI run by adding a simple
> comment to the PR: “[mxnet-ci] run”.
>
> --
> Thank you,
>
> AWS MXNet team
>
>
>
