lhotari opened a new pull request #14819:
URL: https://github.com/apache/pulsar/pull/14819


   ### Motivation
   
   Improve Pulsar CI:
   
   - Reduce GitHub Action Runner resource consumption of Pulsar PR builds
     - Currently, Pulsar GitHub Actions workflows are consuming the majority of 
the shared pool of resources allocated for github.com/apache projects
     - Running the GitHub Actions workflows for a single PR to Pulsar consumes 
about 18-20 hours of GitHub Actions Runner VM time. This is too much.
   
   - Reduce lead times for Pull Request feedback by speeding up builds
     - Speeds up Pulsar development
     - Improves developer productivity since waiting times are reduced
     - Since PR feedback is faster, developers can be comfortable submitting 
more granular pull requests.
     - When the development cycle is faster, it is easier to keep the pull
       request queue short. When PRs are handled quickly, there are fewer
       chances for pull requests to diverge from the master branch, which
       reduces merge conflicts and the time wasted resolving them.
   
   - Better usability and access to test reports
     - Less time is spent looking for the reason why a build failed
   
   ### Modifications
   
   - The design goal has been to keep the build content the same as before
     the refactoring. The same tests are run, but more efficiently. This
     refactoring doesn't change how test retries are handled.
   
   - Combine most of the Pulsar CI workflows into a single workflow called 
"Pulsar CI"
     - Only the workflows that benefit from the aggregation have been included.
     - The modifications reuse binary artifacts within the workflow, which
       reduces resource consumption.
       - Pulsar core modules jar files are built once and reused.
       - Pulsar docker images are built once and reused
       - GitHub Actions cache is used to share the files. The capacity of 
GitHub Actions cache is 10GB which is scoped to the developer who opens the 
pull request. This means that there's plenty of disk space for PR builds (10GB 
for each developer). 
   
   - Integration tests are categorized into "integration tests" and "system 
tests"
     - A slimmer docker image `apachepulsar/java-test-image:latest` is used to 
run the integration tests that don't depend on Pulsar Python client, Tiered 
storage drivers, Pulsar SQL or Pulsar Connectors.
     - The previous `apachepulsar/pulsar-test-latest-version:latest` image is 
used to run the integration tests that are categorized as "system tests".
     - The benefit of this split is that the java-test-image builds in about 6 
minutes and can start the downstream integration test jobs after this. This 
results in faster developer feedback.
   
   - For debugging builds, there's configuration for exposing ssh shell access 
to each Build VM to the user who triggered the build ("github actor"). The ssh 
access is authenticated with the SSH key that the user has registered in 
GitHub. 
     - ssh access is only active in a developer's own fork. It is not enabled
       in `apache/pulsar` because of security concerns.
     - A developer can open a PR to their own fork (for example with a single 
command with GH cli `gh pr create --repo=githubusername/pulsar --base master 
--head "$(git branch --show-current)" -f`) to run the build with ssh access 
enabled.
     - ssh access is active for the duration of the build. If the build fails, 
the build waits 5 minutes for a developer to connect to investigate the 
problem. (this behavior is not enabled in `apache/pulsar`)
   
   The SSH shell access feature makes it easier to debug CI issues that
   can't be resolved with the information in the GitHub Actions UI. This is an
   important capability to have available whenever there are problems. As
   described above, activating the feature requires running the build in a
   developer's personal fork of the pulsar repository.
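
   As a minimal sketch of the fork-only restriction described above (not the
   actual workflow configuration in this PR), a build script could gate the
   ssh step on the repository name. `GITHUB_REPOSITORY` is an environment
   variable that GitHub Actions sets to `owner/name`; the function name
   `ssh_debug_allowed` is hypothetical.

   ```shell
   # Sketch only: allow ssh debugging everywhere except the upstream repo.
   # GITHUB_REPOSITORY is provided by GitHub Actions; the function name and
   # the optional argument are assumptions made for illustration.
   ssh_debug_allowed() {
     repo="${1:-${GITHUB_REPOSITORY:-}}"
     # An unknown repository is treated as "not allowed".
     [ -n "$repo" ] && [ "$repo" != "apache/pulsar" ]
   }

   if ssh_debug_allowed "githubusername/pulsar"; then
     echo "ssh access enabled"
   else
     echo "ssh access disabled"
   fi
   ```

   Passing the repository as an argument keeps the check easy to try locally.
   In a workflow the argument can be omitted so that `GITHUB_REPOSITORY` is
   used instead.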
   
   - Fix the configuration in `.github/actions/tune-runner-vm/action.yml`,
     which was broken by PR #13252.
     - This makes the Linux kernel's vm swappiness setting effectively `1` for
       all cgroups.
     - Helps prevent swapping when the VM is running low on memory.
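
   To illustrate what "effectively `1` for all cgroups" can mean, here is a
   rough sketch. It is not the content of `action.yml`. It assumes cgroup v1,
   where each memory cgroup exposes a `memory.swappiness` file, and the
   function name `set_cgroup_swappiness` and its overridable root argument are
   hypothetical.

   ```shell
   # Sketch only: write swappiness=1 into every cgroup below a root directory.
   # Assumes cgroup v1, where each memory cgroup has a memory.swappiness file.
   # The function name and the overridable root argument are hypothetical.
   set_cgroup_swappiness() {
     root="${1:-/sys/fs/cgroup/memory}"
     find "$root" -type f -name memory.swappiness | while read -r file; do
       # Keep going if a single file cannot be written (e.g. no privileges).
       echo 1 > "$file" || echo "could not update $file" >&2
     done
   }

   # Demo against a scratch directory so the sketch runs without root access.
   demo_root="$(mktemp -d)"
   mkdir -p "$demo_root/docker/example"
   echo 60 > "$demo_root/memory.swappiness"
   echo 60 > "$demo_root/docker/example/memory.swappiness"
   set_cgroup_swappiness "$demo_root"
   cat "$demo_root/docker/example/memory.swappiness"
   rm -rf "$demo_root"
   ```

   On a real runner the function would need root privileges and would
   typically be combined with the global setting, for example
   `sudo sysctl -w vm.swappiness=1`.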
   
   - Improve test reporting by using https://github.com/dorny/test-reporter ,
     which renders the JUnit XML files in the GitHub Actions UI. Because of a
     GitHub Actions limitation, the test reports get attached to the wrong
     workflow, which reduces usability since the reports are harder to find.
   
   - Improve test reporting by adding warning annotations about the test 
statistics.
     - These are not really warnings, but GitHub Actions doesn't seem to allow
       info annotations from shell scripts.
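
   A minimal sketch of how a shell script can emit such an annotation, using
   the GitHub Actions `::warning::` workflow command. The function name
   `report_test_stats`, its arguments and the message format are hypothetical
   and are not taken from this PR.

   ```shell
   # Sketch only: publish test statistics as a GitHub Actions annotation.
   # "::warning::" is a GitHub Actions workflow command; the function name,
   # arguments and message format are assumptions made for illustration.
   report_test_stats() {
     printf '::warning::Tests run: %s, Failures: %s, Skipped: %s\n' "$1" "$2" "$3"
   }

   # Example values only, not real results.
   report_test_stats 1250 3 17
   ```

   When this runs inside a workflow step, GitHub Actions reads the line from
   standard output and shows it as an annotation on the run.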
   
   - Use the built-in GitHub Actions feature to cancel duplicate build jobs:
   ```
   concurrency:
     group: ${{ github.workflow }}-${{ github.ref }}
     cancel-in-progress: true
   ```
     - A new push to a PR triggers a new build, and this feature cancels the
       previous build, which is now obsolete.
     - This solution might be more effective than the current solution for
       cancelling duplicate jobs.
   
   
   ### Additional Context
   
   The work in this PR was mainly done last year while working on a 
proof-of-concept of the GitHub Actions refactoring.
   There's a Google document [[Discuss] PIP Changes to GitHub Actions based 
Pulsar 
CI](https://docs.google.com/document/d/1FNEWD3COdnNGMiryO9qBUW_83qtzAhqjDI5wwmPD-YE/edit#heading=h.f53rkcu20sry)
 which describes details about some technical solutions. There's also an [email 
thread on the dev mailing 
list](https://lists.apache.org/thread/ra2fcf7b973448bb51e00aceaeed06433e8d886270b0f0db0c80d4e0c@%3Cdev.pulsar.apache.org%3E).
   
   The showstopper a year ago was the inability to re-run a single failed job
   in a larger workflow.
   GitHub has since delivered this feature, so no showstoppers remain.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.


