Dear Pulsar community members,

Currently, the Pulsar GitHub Actions workflows are consuming the majority
of the shared pool of resources allocated for github.com/apache projects.
Other Apache projects have been impacted and there is a demand to improve
the Pulsar CI
<https://github.com/apache/pulsar/pull/9159#issuecomment-766915396> asap.

In GitHub Actions, the unit of resource consumption is the time that a
Runner is occupied. I measured the workflow runs triggered by a single Pull
Request (in my personal fork), and these were the run durations:
Workflow name                               Duration
CI - Build - MacOS                           0:17:23
CI - Go Functions style check                0:02:38
CI - Unit - Brokers - Other                  0:15:40
CI - Unit - Brokers - Client Impl            0:16:28
CI - Misc                                    0:16:51
CI - Unit - Proxy                            0:14:23
CI - Go Functions Tests                      0:22:08
CI - CPP, Python Tests                       0:23:30
CI - Unit                                    0:42:11
CI - Integration - Sql                       1:00:13
CI - Integration - Tiered JCloud             1:00:18
CI - Integration - Tiered FileSystem         1:00:13
CI - Integration - Function State            1:00:12
CI - Integration - Cli                       1:10:22
CI - Integration - Transaction               1:16:34
CI - Integration - Process                   1:11:23
CI - Shade - Test                            1:15:45
CI - Unit - Brokers - Client Api             0:26:13
CI - Unit - Brokers - Broker Group 2         0:35:05
CI - Integration - Standalone                0:45:29
CI - Integration - Messaging                 1:00:23
CI - Integration - Thread                    1:00:19
CI - Integration - Backwards Compatibility   1:00:19
CI - Integration - Schema                    1:00:19
CI - Unit - Brokers - Broker Group 1         2:02:31
TOTAL                                       19:36:50

*In this case, the total resource consumption of GitHub Actions Runners is
19 hours 36 minutes 50 seconds for a single pull request to apache/pulsar.*
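
For anyone who wants to recheck the total, here is a minimal sketch that
sums the durations listed above. The helper names (parse_duration,
format_duration) are my own for illustration and are not part of any Pulsar
tooling; the values are copied by hand from the table.

```python
from datetime import timedelta

# Durations copied from the table above (H:MM:SS), in the same order.
DURATIONS = [
    "0:17:23", "0:02:38", "0:15:40", "0:16:28", "0:16:51",
    "0:14:23", "0:22:08", "0:23:30", "0:42:11", "1:00:13",
    "1:00:18", "1:00:13", "1:00:12", "1:10:22", "1:16:34",
    "1:11:23", "1:15:45", "0:26:13", "0:35:05", "0:45:29",
    "1:00:23", "1:00:19", "1:00:19", "1:00:19", "2:02:31",
]


def parse_duration(text: str) -> timedelta:
    """Parse an H:MM:SS string into a timedelta."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def format_duration(delta: timedelta) -> str:
    """Format a timedelta as H:MM:SS without rolling hours into days."""
    total_seconds = int(delta.total_seconds())
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


# Sum all workflow durations, starting from a zero timedelta.
total = sum((parse_duration(d) for d in DURATIONS), timedelta())
print(format_duration(total))  # 19:36:50
```

The same script can be pointed at the durations of any other pull request
to compare Runner time before and after a CI change.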

Since utilization of the GitHub Actions Runner resource pool is very high,
the build queue grows and takes a long time to process.

I have been looking for ways to improve the Pulsar CI for the last 3
months. During this period I worked on a few experiments. The learnings
from the past experiments are documented at a high level in the following
draft PIP document.

*The draft PIP "Changes to GitHub Actions based Pulsar CI" document is a
Google doc:*
https://docs.google.com/document/d/1FNEWD3COdnNGMiryO9qBUW_83qtzAhqjDI5wwmPD-YE/edit?usp=sharing

*Please participate* so that we can adjust the plan based on the feedback
asap. If there's already a similar effort ongoing, I hope we can join
efforts.

*Let's fix Pulsar CI!*

BR, Lari
