Hi Denis,

"pipeline" may be the wrong word; "job" may be the correct one. For example,
committers can currently open a job page like 
http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Funix-gpu/detail/PR-17521/5/
 , press "Login" and then the restart button to retrigger only that job,
obtaining 
http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Funix-gpu/detail/PR-17521/6/

This is correctly reported to GitHub, and the status will change from failed to
passed if the new job succeeds.
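
To make the "[mxnet-ci] run all" / "[mxnet-ci] run failed" idea quoted below a
bit more concrete, here is a minimal Python sketch of how a bot might map a PR
comment to the jobs it should retrigger. Everything in it is an assumption for
illustration only: the function name, the command grammar, and the "failed" /
"passed" status strings are hypothetical and do not describe an existing bot.

```python
import re

# Hypothetical grammar: "[mxnet-ci] run", "[mxnet-ci] run all",
# "[mxnet-ci] run failed", or "[mxnet-ci] run <job-name> [<job-name> ...]".
COMMAND_RE = re.compile(r"^\[mxnet-ci\]\s+run(?:\s+(?P<args>.+))?$", re.IGNORECASE)


def jobs_to_trigger(comment, statuses):
    """Return the job names a PR comment asks to run, or None if it has no command.

    `statuses` maps job name -> last known state (assumed here to be
    "passed" or "failed").
    """
    match = None
    for line in comment.splitlines():
        match = COMMAND_RE.match(line.strip())
        if match:
            break
    if match is None:
        return None

    # A bare "[mxnet-ci] run" is treated like "run all".
    args = (match.group("args") or "all").split()
    if args == ["all"]:
        return sorted(statuses)
    if args == ["failed"]:
        return sorted(name for name, state in statuses.items() if state == "failed")
    # Otherwise treat the arguments as job names and ignore unknown ones.
    return [name for name in args if name in statuses]


if __name__ == "__main__":
    example = {"unix-cpu": "passed", "unix-gpu": "failed", "windows-gpu": "failed"}
    print(jobs_to_trigger("[mxnet-ci] run failed", example))
```

With such a mapping, "run failed" would restart only the jobs that last
failed, which is the same effect as pressing the restart button on each failed
job page by hand.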

Best regards
Leonard

On Wed, 2020-02-12 at 20:23 +0000, Davydenko, Denis wrote:
> This might or might not work, because a GH PR is marked as failed or passed
> based on the overall CI run status, not just a few builds from it. But it is a
> good suggestion to try out; we will evaluate whether it could be accomplished.
> Thanks!
> 
> 
> 
> On 2/12/20, 11:05 AM, "Lausen, Leonard" <lau...@amazon.com.INVALID> wrote:
> 
>     Thank you Denis for taking up this initiative. With respect to "Introduce
>     per-PR CI bot" and the "[mxnet-ci] run" command: would it make sense to add
>     "retriggering only failed pipelines" to the scope? For example, users could
>     be asked to specify the name of the pipeline, or have "[mxnet-ci] run all"
>     and "[mxnet-ci] run failed".
>     
>     In the current state, when retriggering all pipelines, it's likely that
>     one of them will fail. Only by retriggering the failed pipeline alone is
>     there a higher chance of arriving at a state where all pipelines have
>     succeeded.
>     
>     On Wed, 2020-02-12 at 10:12 -0800, Davydenko, Denis wrote:
>     > Hello, MXNet dev community,
>     > As you all know, the experience with CI infrastructure isn’t ideal in
>     > spite of its high cost. For this reason, we’re proposing the following
>     > changes to improve stability, reduce cost, and grant more control to
>     > contributors. As we work on a refresh of CI, we believe these changes
>     > will reduce the pain we all suffer when we try to push a PR through the
>     > system.
>     > 
>     > Following is the list of changes:
>     > - Fix missing status reports between GH and Jenkins
>     > - Update Jenkins permission groups to re-trigger builds
>     > - Introduce per-PR CI bot
>     > 
>     > Details:
>     > 
>     > - Fix missing status reports
>     > Currently, once a commit is added to a PR, CI is run on that commit.
>     > Sometimes the CI run status is missing from the commit in GitHub despite
>     > the run having completed in Jenkins. Example: CI run: 
>     > 
> http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Funix-cpu/detail/PR-17376/17/pipeline
>     > , commit status in github (missing unix-cpu, unix-gpu and windows-gpu
>     > statuses): 
>     > 
> https://github.com/apache/incubator-mxnet/pull/17376#partial-pull-merging.
>     > Problem: There seems to be a bug where some status reports are missing
>     > on GitHub. The hypothesis is that there is some issue with GitHub hooks.
>     > 
>     > - Update Jenkins permission groups to re-trigger builds
>     > Problem: Currently, only MXNet Committers and selected people from AWS
>     > have the ability to re-trigger CI runs on PRs. This leaves PR Authors
>     > waiting for authorized users to re-trigger their PRs for them.
>     > Solution: Allow three membership categories (Jenkins Admins, MXNet
>     > Committers, and PR Authors) to re-trigger PR builds.
>     > 
>     > - Introduce per-PR CI bot
>     > Problem: As of today, MXNet CI is triggered automatically. It runs every
>     > time a commit is pushed to your GitHub PR. This results in a lot of
>     > unnecessary CI runs, in addition to added costs.
>     > Solution: Switch to manual triggering. Users from authorized groups (one
>     > of the three categories mentioned above) can trigger a CI run by adding
>     > a simple comment to the PR: “[mxnet-ci] run”. 
>     > 
>     > --
>     > Thank you,
>     > 
>     > AWS MXNet team
>     > 
>     >  
>     > 
>     
> 
