This is an automated email from the ASF dual-hosted git repository.
areusch pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/tvm.git
The following commit(s) were added to refs/heads/main by this push:
new 892ab13112 Move jenkins/ dir into ci/jenkins and spread docs around.
(#11927)
892ab13112 is described below
commit 892ab131129da96e142bcdee48e41757560e686b
Author: Andrew Reusch <[email protected]>
AuthorDate: Tue Jun 28 16:05:36 2022 -0700
Move jenkins/ dir into ci/jenkins and spread docs around. (#11927)
---
Jenkinsfile | 2 +-
ci/README.md | 97 +++++++++++++++++++++
ci/jenkins/.gitignore | 1 +
{jenkins => ci/jenkins}/Build.groovy.j2 | 0
{jenkins => ci/jenkins}/Deploy.groovy.j2 | 0
{jenkins => ci/jenkins}/DockerBuild.groovy.j2 | 0
{jenkins => ci/jenkins}/Jenkinsfile.j2 | 14 +--
{jenkins => ci/jenkins}/Lint.groovy.j2 | 0
ci/jenkins/Makefile | 27 ++++++
{jenkins => ci/jenkins}/Prepare.groovy.j2 | 0
{jenkins => ci/jenkins}/README.md | 117 +++++--------------------
{jenkins => ci/jenkins}/Test.groovy.j2 | 0
{jenkins => ci/jenkins}/generate.py | 8 +-
{jenkins => ci/jenkins}/macros.j2 | 0
{jenkins => ci/jenkins}/requirements.txt | 0
docs/contribute/ci.rst | 119 ++++++++++++++++++++++++--
docs/contribute/code_guide.rst | 17 ++++
tests/scripts/open_docker_update_pr.py | 4 +-
tests/scripts/task_lint.sh | 3 +-
19 files changed, 288 insertions(+), 121 deletions(-)
diff --git a/Jenkinsfile b/Jenkinsfile
index 3f82ff1840..07c7f0c44a 100755
--- a/Jenkinsfile
+++ b/Jenkinsfile
@@ -45,7 +45,7 @@
// 'python3 jenkins/generate.py'
// Note: This timestamp is here to ensure that updates to the Jenkinsfile are
// always rebased on main before merging:
-// Generated at 2022-06-22T10:07:00.173803
+// Generated at 2022-06-27T17:30:37.779354
import org.jenkinsci.plugins.pipeline.modeldefinition.Utils
// NOTE: these lines are scanned by docker/dev_common.sh. Please update the
regex as needed. -->
diff --git a/ci/README.md b/ci/README.md
new file mode 100644
index 0000000000..a5cb39016b
--- /dev/null
+++ b/ci/README.md
@@ -0,0 +1,97 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements. See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership. The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License. You may obtain a copy of the License at -->
+
+<!--- http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied. See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+# Apache TVM Continuous Integration (CI)
+
+## Overview
+
+TVM's Continuous Integration is responsible for verifying the code in
`apache/tvm` and testing PRs
+before they merge to inform TVM contributors and committers. These jobs are
essential to keeping the
+TVM project in a healthy state and preventing breakages. CI in TVM is broken
into these pieces:
+ - Lint scripts in [`tests/lint`](../tests/lint).
+ - The tests themselves, all of which live underneath [`tests`](../tests).
+ - Definitions of test suites, with each suite defined as a separate `task_`
script in
+ [`tests/scripts`](../tests/scripts).
+ - The Linux test sequence (in [`Jenkinsfile`](../Jenkinsfile)), which lints
and builds TVM and runs test
+ suites using Docker on Linux.
+ - The Windows and Mac test sequences (in
[`.github/actions`](../.github/actions)).
+ - GitHub Actions that support the code review process (in
[`.github/actions`](../.github/actions)).
+ - Tools to reproduce the CI locally (in `tests/scripts`).
+ - Infrastructure-as-Code that configures the cloud services that provide
Jenkins for the TVM CI (in the
+ [`tlc-pack/ci`](https://github.com/tlc-pack/ci) repo).
+
+## CI Documentation Index
+
+The CI documentation belongs with the implementation it describes. To make
that concrete, the
+documentation is split like so:
+1. An overview of the CI is in this file.
+2. User-facing documentation lives in `apache/tvm`'s `docs/contribute` sub-directory and is served on the
+ [TVM docs site](https://tvm.apache.org/docs/contribute/ci.html).
+3. Documentation of the tools that run TVM's various regression tests locally, and of the test suites,
+ is in this sub-directory.
+4. Documentation of the cloud services and their configuration lives in the
+ [`tlc-pack/ci`](https://github.com/tlc-pack/ci) repo.
+
+## Jenkins
+
+Jenkins runs all of the Linux-based TVM CI-enabled regression tests. This
includes tests against accelerated hardware such as GPUs. It excludes those
regression tests that run against hardware not available in the cloud (those
tests aren't currently exercised in TVM CI). The tests run by Jenkins represent
most of the merge-blocking tests (and passing Jenkins should mostly correlate
with passing the remaining Windows/Mac builds).
+
+## GitHub Actions
+
+GitHub Actions is used to run Windows jobs, MacOS jobs, and various on-GitHub
automations. These are defined in [`.github/workflows`](../.github/workflows/).
These automations include bots to:
+* [cc people based on subscribed
teams/topics](https://github.com/apache/tvm/issues/10317)
+* [allow non-committers to merge approved / CI passing
PRs](https://discuss.tvm.apache.org/t/rfc-allow-merging-via-pr-comments/12220)
+* [add cc-ed people as reviewers on
GitHub](https://discuss.tvm.apache.org/t/rfc-remove-codeowners/12095)
+* [ping languishing PRs after no activity for a week (currently opt-in
only)](https://github.com/apache/tvm/issues/9983)
+* [push a `last-successful` branch to GitHub with the last `main` commit that
passed CI](https://github.com/apache/tvm/tree/last-successful)
+
+https://github.com/apache/tvm/actions has the logs for each of these workflows. Note that when debugging these workflows, changes made in PRs from forked repositories won't be reflected in the PR. Such changes should be tested in the forked repository first and linked in the PR body.
+
+## Docker Images
+
+Each CI job runs most of its work inside a Docker container, built from files
+in the [`docker/`](../docker) folder. These
+files are built nightly in Jenkins via the [docker-images-ci](https://ci.tlcpack.ai/job/docker-images-ci/) job.
+The images for these containers are hosted in the [tlcpack Docker Hub](https://hub.docker.com/u/tlcpack)
+and referenced in the [`Jenkinsfile.j2`](Jenkinsfile.j2). These can be
inspected and run
+locally via standard Docker commands.
+
+### `ci-docker-staging`
+
+The [ci-docker-staging](https://github.com/apache/tvm/tree/ci-docker-staging)
+branch is used to test updates to Docker images and `Jenkinsfile` changes. When
+running a build for a normal PR from a forked repository, Jenkins uses the code
+from the PR except for the `Jenkinsfile` itself, which comes from the base
branch.
+When branches are built, the `Jenkinsfile` in the branch is used, so a
committer
+with write access must push PRs to a branch in apache/tvm to properly test
+`Jenkinsfile` changes. If your PR makes changes to the `Jenkinsfile`, make sure
+to @ a [committer](../CONTRIBUTORS.md)
+and ask them to push your PR as a branch to test the changes.
+
+# Jenkins CI
+
+TVM uses Jenkins for running Linux continuous integration (CI) tests on
+[branches](https://ci.tlcpack.ai/job/tvm/) and
+[pull requests](https://ci.tlcpack.ai/job/tvm/view/change-requests/) through a
+build configuration specified in a [`Jenkinsfile`](../Jenkinsfile).
+Other jobs, for Windows and MacOS, run in GitHub Actions.
+
+## `Jenkinsfile`
+
+The template files in this directory are used to generate the
[`Jenkinsfile`](../Jenkinsfile) used by Jenkins to run CI jobs for each commit
to PRs and branches.
+
+To regenerate the `Jenkinsfile`, run `make` in the `ci/jenkins` dir.
diff --git a/ci/jenkins/.gitignore b/ci/jenkins/.gitignore
new file mode 100644
index 0000000000..187a72392c
--- /dev/null
+++ b/ci/jenkins/.gitignore
@@ -0,0 +1 @@
+/_venv
\ No newline at end of file
diff --git a/jenkins/Build.groovy.j2 b/ci/jenkins/Build.groovy.j2
similarity index 100%
rename from jenkins/Build.groovy.j2
rename to ci/jenkins/Build.groovy.j2
diff --git a/jenkins/Deploy.groovy.j2 b/ci/jenkins/Deploy.groovy.j2
similarity index 100%
rename from jenkins/Deploy.groovy.j2
rename to ci/jenkins/Deploy.groovy.j2
diff --git a/jenkins/DockerBuild.groovy.j2 b/ci/jenkins/DockerBuild.groovy.j2
similarity index 100%
rename from jenkins/DockerBuild.groovy.j2
rename to ci/jenkins/DockerBuild.groovy.j2
diff --git a/jenkins/Jenkinsfile.j2 b/ci/jenkins/Jenkinsfile.j2
similarity index 93%
rename from jenkins/Jenkinsfile.j2
rename to ci/jenkins/Jenkinsfile.j2
index 0a83549da1..6f2f6a4370 100644
--- a/jenkins/Jenkinsfile.j2
+++ b/ci/jenkins/Jenkinsfile.j2
@@ -48,7 +48,7 @@
// Generated at {{ generated_time }}
import org.jenkinsci.plugins.pipeline.modeldefinition.Utils
-{% import 'jenkins/macros.j2' as m with context -%}
+{% import 'ci/jenkins/macros.j2' as m with context -%}
// NOTE: these lines are scanned by docker/dev_common.sh. Please update the
regex as needed. -->
ci_lint = 'tlcpack/ci-lint:20220513-055910-fa834f67e'
@@ -106,12 +106,12 @@ s3_prefix =
"tvm-jenkins-artifacts-prod/tvm/${env.BRANCH_NAME}/${env.BUILD_NUMBE
// General note: Jenkins has limits on the size of a method (or top level code)
// that are pretty strict, so most usage of groovy methods in these templates
// are purely to satisfy the JVM
-{% include "jenkins/Prepare.groovy.j2" %}
-{% include "jenkins/DockerBuild.groovy.j2" %}
-{% include "jenkins/Lint.groovy.j2" %}
-{% include "jenkins/Build.groovy.j2" %}
-{% include "jenkins/Test.groovy.j2" %}
-{% include "jenkins/Deploy.groovy.j2" %}
+{% include "ci/jenkins/Prepare.groovy.j2" %}
+{% include "ci/jenkins/DockerBuild.groovy.j2" %}
+{% include "ci/jenkins/Lint.groovy.j2" %}
+{% include "ci/jenkins/Build.groovy.j2" %}
+{% include "ci/jenkins/Test.groovy.j2" %}
+{% include "ci/jenkins/Deploy.groovy.j2" %}
cancel_previous_build()
diff --git a/jenkins/Lint.groovy.j2 b/ci/jenkins/Lint.groovy.j2
similarity index 100%
rename from jenkins/Lint.groovy.j2
rename to ci/jenkins/Lint.groovy.j2
diff --git a/ci/jenkins/Makefile b/ci/jenkins/Makefile
new file mode 100644
index 0000000000..5c9e0ac540
--- /dev/null
+++ b/ci/jenkins/Makefile
@@ -0,0 +1,27 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+_venv: requirements.txt
+ rm -rf _venv
+ python3 -mvenv _venv
+ _venv/bin/pip3 install -r requirements.txt
+
+all: _venv
+ _venv/bin/python3 generate.py
+
+.PHONY: all venv
+.DEFAULT_GOAL=all
diff --git a/jenkins/Prepare.groovy.j2 b/ci/jenkins/Prepare.groovy.j2
similarity index 100%
rename from jenkins/Prepare.groovy.j2
rename to ci/jenkins/Prepare.groovy.j2
diff --git a/jenkins/README.md b/ci/jenkins/README.md
similarity index 62%
rename from jenkins/README.md
rename to ci/jenkins/README.md
index d06672518a..d2a29838b6 100644
--- a/jenkins/README.md
+++ b/ci/jenkins/README.md
@@ -17,7 +17,11 @@
# TVM CI
-TVM runs CI jobs on every commit to an open pull request and to branches in
the apache/tvm repo (such as `main`). These jobs are essential to keeping the
TVM project in a healthy state and preventing breakages. Jenkins does most of
the work in running the TVM tests, though some smaller jobs are also run on
GitHub Actions.
+TVM runs CI jobs on every commit to an open pull request and to branches in
the apache/tvm repo (such as `main`). These jobs are essential to keeping the
TVM project in a healthy state and preventing breakages.
+
+## Jenkins
+
+Jenkins runs all of the Linux-based TVM CI-enabled regression tests. This
includes tests against accelerated hardware such as GPUs. It excludes those
regression tests that run against hardware not available in the cloud (those
tests aren't currently exercised in TVM CI). The tests run by Jenkins represent
most of the merge-blocking tests (and passing Jenkins should mostly correlate
with passing the remaining Windows/Mac builds).
## GitHub Actions
@@ -33,17 +37,20 @@ https://github.com/apache/tvm/actions has the logs for each
of these workflows.
## Keeping CI Green
-Developers rely on the TVM CI to get signal on their PRs before merging.
-Occasionally breakages slip through and break `main`, which in turn causes
-the same error to show up on an PR that is based on the broken commit(s).
Broken
-commits can be identified [through
GitHub](https://github.com/apache/tvm/commits/main>)
-via the commit status icon or via
[Jenkins](https://ci.tlcpack.ai/blue/organizations/jenkins/tvm/activity?branch=main>).
-In these situations it is possible to either revert the offending commit or
-submit a forward fix to address the issue. It is up to the committer and commit
-author which option to choose, keeping in mind that a broken CI affects all TVM
-developers and should be fixed as soon as possible.
+Developers rely on the TVM CI to get signal on their PRs before merging.
Occasionally breakages
+slip through and break `main`, which in turn causes the same error to show up
on an unrelated PR
+that is based on the broken commit(s). Broken commits can be identified
[through
+GitHub](https://github.com/apache/tvm/commits/main) via the commit status icon or via
+[Jenkins](https://ci.tlcpack.ai/blue/organizations/jenkins/tvm/activity?branch=main). In these
+situations it is possible to either revert the offending commit or submit a
forward fix to address
+the issue. It is up to the committer and commit author which option to choose.
A broken CI affects
+all TVM developers and should be fixed as soon as possible, while a revert may
be especially painful
+for the author of the offending PR when that PR is large.
-Some tests are also flaky and fail for reasons unrelated to the PR. The [CI
monitoring rotation](https://github.com/apache/tvm/wiki/CI-Monitoring-Runbook)
watches for these failures and disables tests as necessary. It is the
responsibility of those who wrote the test to ultimately fix and re-enable the
test.
+Some tests are also flaky and occasionally fail for reasons unrelated to the
PR. The [CI monitoring
+rotation](https://github.com/apache/tvm/wiki/CI-Monitoring-Runbook) watches
for these failures and
+disables tests as necessary. It is the responsibility of those who wrote the
test to ultimately fix
+and re-enable the test.
## Dealing with Flakiness
@@ -85,7 +92,7 @@ a name, hash, and path in S3, using the `workflow_dispatch`
event on
The sha256 must match the file or it will not be uploaded. The upload path is
user-defined so it can be any path (no trailing or leading slashes allowed) but
be careful not to collide with existing resources on accident.
-
+
## Skipping CI
For reverts and trivial forward fixes, adding `[skip ci]` to the revert's
@@ -153,88 +160,4 @@ _venv/bin/python3 jenkins/generate.py
# Infrastructure
-Jenkins runs in AWS on an EC2 instance fronted by an ELB which makes it
available at https://ci.tlcpack.ai. These definitions are declared via
Terraform in the
[tlc-pack/ci-terraform](https://github.com/tlc-pack/ci-terraform) repository.
The Terraform code references custom AMIs built in
[tlc-pack/ci-packer](https://github.com/tlc-pack/ci-packer).
[tlc-pack/ci](https://github.com/tlc-pack/ci) contains Ansible scripts to
deploy the Jenkins head node and set it up to interact with AWS.
-
-The Jenkins head node has a number of autoscaling groups with labels that are
used to run jobs (e.g. `CPU`, `GPU` or `ARM`) via the [EC2
Fleet](https://plugins.jenkins.io/ec2-fleet/) plugin.
-
-## Deploying
-
-Deploying Jenkins can disrupt developers so it must be done with care. Jobs
that are in-flight will be cancelled and must be manually restarted. Follow the
instructions [here](https://github.com/tlc-pack/ci/issues/10) to run a deploy.
-
-## Monitoring
-
-Dashboards of CI data can be found:
-* within Jenkins at https://ci.tlcpack.ai/monitoring (HTTP / JVM stats)
-* at https://monitoring.tlcpack.ai (job status, worker status)
-
-## CI Diagram
-
-This details the individual parts that interact in TVM's CI. For details on
operations, see https://github.com/tlc-pack/ci.
-
-```mermaid
-graph TD
- Commit --> GitHub
- GitHub --> |`push` webhook| WebhookServer(Webhook Server)
- JobExecutor(Job Executor)
- WebhookServer --> JobExecutor
- JobExecutor --> EC2Fleet(EC2 Fleet Plugin)
- EC2Fleet --> |capacity request| EC2(EC2 Autoscaler)
- JobExecutor --> WorkerEC2Instance
- Docker --> |build cache, artifacts| S3
- WorkerEC2Instance --> Docker
- Docker --> |docker pull| G(Docker Hub)
- Docker --> |docker push / pull| ECR
- Docker --> |Execute jobs| CIScripts(CI Scripts)
- RepoCITerraform(ci-terraform repo) --> |terraform| ECR
- RepoCITerraform(ci-terraform repo) --> |terraform| EC2
- RepoCITerraform(ci-terraform repo) --> |terraform| S3
- RepoCI(ci repo) --> |configuration via Ansible| WorkerEC2Instance
- RepoCIPacker(ci-packer) --> |AMIs| EC2
- Monitoring_Scrapers(Jenkins Scraper) --> Monitoring_DB(Postrgres)
- Grafana --> Monitoring_DB
- GitHub --> Windows
- GitHub --> MacOS
-
- Developers --> |check PR status|JenkinsUI(Jenkins Web UI)
- Monitoring_Scrapers --> |fetch job data| JenkinsUI
- Developers --> |git push| Commit
- Developers --> |create PR| GitHub
-
- subgraph Jenkins Head Node
- WebhookServer
- JobExecutor
- EC2Fleet
- JenkinsUI
- end
-
- subgraph GitHub Actions
- Windows
- MacOS
- end
-
- subgraph Configuration / Terraform
- RepoCITerraform
- RepoCI
- RepoCIPacker
- end
-
- subgraph Monitoring
- Monitoring_DB
- Grafana
- Monitoring_Scrapers
- end
-
- subgraph AWS
- subgraph Jenkins Workers
- WorkerEC2Instance(Worker EC2 Instance)
- subgraph "Worker EC2 Instance"
- Docker
- CIScripts
- end
- end
- EC2
- ECR
- S3
- end
-
-```
+While all TVM tests are contained within the apache/tvm repository, the
infrastructure used to run the tests is donated by the TVM Community. To
encourage collaboration, the configuration for TVM's CI infrastructure is
stored in a public GitHub repository. TVM community members are encouraged to
contribute improvements. The configuration, along with documentation of TVM's
CI infrastructure, is in the [tlc-pack/ci](https://github.com/tlc-pack/ci) repo.
diff --git a/jenkins/Test.groovy.j2 b/ci/jenkins/Test.groovy.j2
similarity index 100%
rename from jenkins/Test.groovy.j2
rename to ci/jenkins/Test.groovy.j2
diff --git a/jenkins/generate.py b/ci/jenkins/generate.py
similarity index 96%
rename from jenkins/generate.py
rename to ci/jenkins/generate.py
index ba7f165925..686e44e14d 100644
--- a/jenkins/generate.py
+++ b/ci/jenkins/generate.py
@@ -25,8 +25,8 @@ import textwrap
from pathlib import Path
-REPO_ROOT = Path(__file__).resolve().parent.parent
-JENKINSFILE_TEMPLATE = REPO_ROOT / "jenkins" / "Jenkinsfile.j2"
+REPO_ROOT = Path(__file__).resolve().parent.parent.parent
+JENKINSFILE_TEMPLATE = REPO_ROOT / "ci" / "jenkins" / "Jenkinsfile.j2"
JENKINSFILE = REPO_ROOT / "Jenkinsfile"
@@ -111,10 +111,10 @@ if __name__ == "__main__":
Newly generated Jenkinsfile did not match the one on disk! If
you have made
edits to the Jenkinsfile, move them to
'jenkins/Jenkinsfile.j2' and
regenerate the Jenkinsfile from the template with
-
+
python3 -m pip install -r jenkins/requirements.txt
python3 jenkins/generate.py
-
+
Diffed changes:
"""
).strip()
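
Note on the `REPO_ROOT` change above: `generate.py` now sits two directories below the repository root instead of one, so one more `.parent` hop is needed. A minimal sketch of that path arithmetic follows. The checkout location `/work/tvm` is made up for illustration and is not taken from the commit.

```python
from pathlib import PurePosixPath

# Hypothetical checkout location, used only for illustration.
script = PurePosixPath("/work/tvm/ci/jenkins/generate.py")

# Old layout (jenkins/generate.py): two .parent hops reached the repo root.
# New layout (ci/jenkins/generate.py): three hops are needed.
repo_root = script.parent.parent.parent
template = repo_root / "ci" / "jenkins" / "Jenkinsfile.j2"

# parents[2] is an equivalent spelling of .parent.parent.parent.
same_root = script.parents[2]
```

The commit keeps the chained `.parent` form; `parents[2]` is shown only as an equivalent, not as what the script uses.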
diff --git a/jenkins/macros.j2 b/ci/jenkins/macros.j2
similarity index 100%
rename from jenkins/macros.j2
rename to ci/jenkins/macros.j2
diff --git a/jenkins/requirements.txt b/ci/jenkins/requirements.txt
similarity index 100%
rename from jenkins/requirements.txt
rename to ci/jenkins/requirements.txt
diff --git a/docs/contribute/ci.rst b/docs/contribute/ci.rst
index 0cc1bf9dd9..9a2876220f 100644
--- a/docs/contribute/ci.rst
+++ b/docs/contribute/ci.rst
@@ -23,14 +23,21 @@ Using TVM's CI
.. contents::
:local:
-TVM uses Jenkins for running Linux continuous integration (CI) tests on
-`branches <https://ci.tlcpack.ai/job/tvm/>`_ and
+TVM primarily uses Jenkins for running Linux continuous integration (CI) tests
on
+`branches <https://ci.tlcpack.ai/job/tvm/>`_ and
`pull requests <https://ci.tlcpack.ai/job/tvm/view/change-requests/>`_ through
a
build configuration specified in a `Jenkinsfile
<https://github.com/apache/tvm/blob/main/Jenkinsfile>`_.
-Non-critical jobs run in GitHub Actions for Windows and MacOS jobs.
+Jenkins is the only CI step that is codified to block merging. TVM is also
tested minimally
+against Windows and MacOS using GitHub Actions.
+
+This page describes how contributors and committers can use TVM's CI to verify
their code. You can
+read more about the design of TVM CI in the
+
+For Contributors
+----------------
A standard CI run looks something like this viewed in `Jenkins' BlueOcean
viewer <https://ci.tlcpack.ai/blue/organizations/jenkins/tvm/activity>`_.
-CI runs usually take several hours to complete and pull requests (PRs) cannot
be merged before CI
+CI runs usually take a couple of hours to complete and pull requests (PRs) cannot be merged before CI
has successfully completed. To diagnose failing steps, click through to the
failing
pipeline stage then to the failing step to see the output logs.
@@ -40,12 +47,12 @@ pipeline stage then to the failing step to see the output
logs.
Debugging Failures
-******************
+^^^^^^^^^^^^^^^^^^
When CI fails for some reason, there are several methods to diagnose the issue.
Jenkins Logs
-------------
+""""""""""""
.. |pytest| replace:: ``pytest``
.. _pytest: https://docs.pytest.org/en/6.2.x/
@@ -59,13 +66,109 @@ the failing job to view the logs. Note:
need to scroll up to view the actual failure.
Reproduce Failures
-------------------
+""""""""""""""""""
Most TVM Python tests run under |pytest|_ and can be run as described in
:ref:`pr-testing`.
Reporting Issues
-****************
+^^^^^^^^^^^^^^^^
Issues with CI should be `reported on GitHub
<https://github.com/apache/tvm/issues/new?assignees=&labels=&template=ci-problem.md&title=%5BCI+Problem%5D+>`_
with a link to the relevant jobs, commits, or PRs.
+
+
+
+For Maintainers
+---------------
+
+This section discusses processes run by TVM Maintainers.
+
+
+Procedures for Keeping CI Green
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+This section talks about common procedures used to keep CI passing.
+
+Broken CI due to Simultaneous Merge
+"""""""""""""""""""""""""""""""""""
+
+Developers rely on the TVM CI to get signal on their PRs before merging.
Occasionally, two
+different PRs can pass CI individually but break ``main`` when both land.
This in turn causes an
+error to show up on an unrelated PR that is based on the broken commit(s).
Broken commits can be
+identified `through GitHub <https://github.com/apache/tvm/commits/main>`_ via
the commit status icon
+or via `Jenkins
<https://ci.tlcpack.ai/blue/organizations/jenkins/tvm/activity?branch=main>`_.
+
+In these situations it is ultimately the responsibility of the TVM Committer
who merged the PR to
+fix CI (others are encouraged to help). Typical responses to this situation are:
+
+1. Revert the offending commit.
+2. Submit a forward fix to address the issue.
+
+It is up to the committer and commit author which option to choose. A broken
CI affects all TVM
+developers and should be fixed as soon as possible, while a revert may be
especially painful for the
+author of the offending PR when that PR is large.
+
+
+Dealing with Flakiness
+^^^^^^^^^^^^^^^^^^^^^^
+
+If you notice a failure on your PR that seems unrelated to your change, you should
+search `recent GitHub issues related to flaky tests <https://github.com/apache/tvm/issues?q=is%3Aissue+%5BCI+Problem%5D+Flaky+>`_ and
+`file a new issue <https://github.com/apache/tvm/issues/new?assignees=&labels=&template=ci-problem.md&title=%5BCI+Problem%5D+>`_
+if you don't see any reports of the failure. If a certain test or class of tests affects
+several PRs or commits on ``main`` with flaky failures, the test should be disabled via
+`pytest's @xfail decorator <https://docs.pytest.org/en/6.2.x/skipping.html#xfail-mark-test-functions-as-expected-to-fail>`_ with `strict=False <https://docs.pytest.org/en/6.2.x/skipping.html#strict-parameter>`_ and the relevant issue linked in the
+disabling PR.
+
+.. code-block:: python
+
+    @pytest.mark.xfail(strict=False, reason="Flaky test: https://github.com/apache/tvm/issues/1234")
+    def test_something_flaky():
+        pass
+
+Then submit a PR as usual
+
+.. code-block:: bash
+
+ git add <test file>
+ git commit -m'[skip ci][ci] Disable flaky test: ``<test_name>``
+
+ See #<issue number>
+ '
+ gh pr create
+
+
+Skipping CI
+^^^^^^^^^^^
+
+For reverts and trivial forward fixes, adding ``[skip ci]`` to the revert's
+PR title will cause CI to shortcut and only run lint. Committers should
+take care that they only merge CI-skipped PRs to fix a failure on ``main`` and
+not in cases where the submitter wants to shortcut CI to merge a change faster.
+The PR title is checked when the build is first run (specifically, during the lint
+step), so changes to the title after that step has run do not affect CI; the job must
+be re-triggered by another ``git push`` for them to take effect.
+
+.. code-block:: bash
+
+ # Revert HEAD commit, make sure to insert '[skip ci]' at the beginning of
+ # the commit subject
+ git revert HEAD
+ git checkout -b my_fix
+ # After you have pushed your branch, create a PR as usual.
+ git push my_repo
+ # Example: Skip CI on a branch with an existing PR
+ # Adding this commit to an existing branch will cause a new CI run where
+ # Jenkins is skipped
+ git commit --allow-empty --message "[skip ci] Trigger skipped CI"
+ git push my_repo
+
+
+
+CI Monitoring Rotation
+^^^^^^^^^^^^^^^^^^^^^^
+
+Some tests are also flaky and occasionally fail for reasons unrelated to the
PR. The
+`CI monitoring rotation
<https://github.com/apache/tvm/wiki/CI-Monitoring-Runbook>`_ watches for these
failures and
+disables tests as necessary. It is the responsibility of those who wrote the
test to ultimately fix
+and re-enable the test.
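
Note on the "Skipping CI" section above: the documented rule is that a `[skip ci]` marker in the PR title makes CI run only lint. A minimal sketch of such a title check follows. It assumes the marker is matched at the start of the title, ignoring case and surrounding whitespace. The function name `title_requests_skip` is hypothetical, and the actual check in TVM's lint step may differ.

```python
def title_requests_skip(pr_title: str) -> bool:
    """Return True when the title begins with the '[skip ci]' marker.

    Hypothetical helper for illustration; not taken from the TVM scripts.
    """
    return pr_title.strip().lower().startswith("[skip ci]")


# A revert PR that asks for the shortcut, and a normal PR that does not.
revert_title = "[skip ci] Revert a change that broke main"
normal_title = "Add a new operator"
skip_revert = title_requests_skip(revert_title)
skip_normal = title_requests_skip(normal_title)
```

Because the title is read once during the lint step, editing it later has no effect until another push starts a new build.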
diff --git a/docs/contribute/code_guide.rst b/docs/contribute/code_guide.rst
index 3849b795f6..d404ba6379 100644
--- a/docs/contribute/code_guide.rst
+++ b/docs/contribute/code_guide.rst
@@ -139,6 +139,23 @@ If you want your test to run over a variety of targets,
use the :py:func:`tvm.te
will run ``test_mytest`` with ``target="llvm"``, ``target="cuda"``, and few
others. This also ensures that your test is run on the correct hardware by the
CI. If you only want to test against a couple targets use
``@tvm.testing.parametrize_targets("target_1", "target_2")``. If you want to
test on a single target, use the associated decorator from
:py:func:`tvm.testing`. For example, CUDA tests use the
``@tvm.testing.requires_cuda`` decorator.
+
+Network Resources
+-----------------
+
+In CI, downloading files from the Internet is a big source of flaky test
failures (e.g. remote
+server can go down or be slow), so try to avoid using the network at all
during tests. In some cases
+this isn't a reasonable proposition (e.g. the docs tutorials which need to
download models).
+
+In these cases you can re-host files in S3 for fast access in CI. A committer
can upload a file,
+specified by a name, hash, and path in S3, using the ``workflow_dispatch`` event on `the
+upload_ci_resource.yml GitHub Actions workflow
+<https://github.com/apache/tvm/actions/workflows/upload_ci_resource.yml>`_.
The sha256 must match
+the file or it will not be uploaded. The upload path is user-defined so it can
be any path (no
+trailing or leading slashes allowed), but be careful not to collide with existing resources by
+accident.
+
+
Handle Integer Constant Expression
----------------------------------
We often need to handle constant integer expressions in TVM. Before we do so,
the first question we want to ask is that is it really necessary to get a
constant integer. If symbolic expression also works and let the logic flow, we
should use symbolic expression as much as possible. So the generated code works
for shapes that are not known ahead of time.
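
Note on the "Network Resources" section added above: the upload is rejected unless the supplied sha256 matches the file, so it helps to compute the digest locally first. A minimal sketch using only the Python standard library follows. The helper name `sha256_of_file` is hypothetical and is not part of the upload workflow.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex sha256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # Read fixed-size chunks so large model files need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The resulting hex string is the kind of value the workflow's hash input is described as expecting; how the workflow itself verifies it is not shown in this commit.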
diff --git a/tests/scripts/open_docker_update_pr.py
b/tests/scripts/open_docker_update_pr.py
index 2f85a50461..f583f00d5c 100755
--- a/tests/scripts/open_docker_update_pr.py
+++ b/tests/scripts/open_docker_update_pr.py
@@ -28,9 +28,9 @@ from git_utils import git, parse_remote, GitHubRepo
from cmd_utils import REPO_ROOT, init_log, Sh
from should_rebuild_docker import docker_api
-JENKINSFILE = REPO_ROOT / "jenkins" / "Jenkinsfile.j2"
+JENKINSFILE = REPO_ROOT / "ci" / "jenkins" / "Jenkinsfile.j2"
GENERATED_JENKINSFILE = REPO_ROOT / "Jenkinsfile"
-GENERATE_SCRIPT = REPO_ROOT / "jenkins" / "generate.py"
+GENERATE_SCRIPT = REPO_ROOT / "ci" / "jenkins" / "generate.py"
GITHUB_TOKEN = os.environ["GITHUB_TOKEN"]
BRANCH = "nightly-docker-update"
diff --git a/tests/scripts/task_lint.sh b/tests/scripts/task_lint.sh
index 80cfc00ff7..a05f7ca36b 100755
--- a/tests/scripts/task_lint.sh
+++ b/tests/scripts/task_lint.sh
@@ -32,7 +32,7 @@ function shard1 {
tests/scripts/task_convert_scripts_to_python.sh
echo "Check Jenkinsfile generation"
- python3 jenkins/generate.py --check
+ python3 ci/jenkins/generate.py --check
echo "Checking file types..."
python3 tests/lint/check_file_type.py
@@ -90,4 +90,3 @@ else
shard1
shard2
fi
-
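
Note on the `generate.py --check` call in `task_lint.sh` above: the error message in `generate.py` indicates that check mode compares the freshly generated Jenkinsfile with the one on disk and prints the differences. A minimal sketch of that kind of comparison, using `difflib` from the standard library, follows. The function name `check_generated` and the file labels are assumptions for illustration; this is not the code in `generate.py`.

```python
import difflib


def check_generated(on_disk: str, generated: str) -> str:
    """Return a unified diff of the two texts; an empty string means they match."""
    diff = difflib.unified_diff(
        on_disk.splitlines(keepends=True),
        generated.splitlines(keepends=True),
        fromfile="Jenkinsfile (on disk)",
        tofile="Jenkinsfile (generated)",
    )
    return "".join(diff)


# Identical inputs produce no diff; an extra generated line shows up as '+'.
no_diff = check_generated("stage('Lint')\n", "stage('Lint')\n")
some_diff = check_generated("stage('Lint')\n", "stage('Lint')\nstage('Build')\n")
```

A lint step built this way would fail the build whenever the returned string is non-empty.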