(cloudberry) branch main updated: Improve CI reliability and developer productivity through test scheduling optimizations, mirror stability fixes, and a new artifact reuse feature. (#1379)

espino Fri, 17 Oct 2025 18:13:57 -0700

This is an automated email from the ASF dual-hosted git repository.

espino pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/cloudberry.git



The following commit(s) were added to refs/heads/main by this push:
     new d6b001ea91b Improve CI reliability and developer productivity through test scheduling optimizations, mirror stability fixes, and a new artifact reuse feature. (#1379)
d6b001ea91b is described below

commit d6b001ea91b0108fd834371d89a637b99328c502
Author: Ed Espino <[email protected]>
AuthorDate: Tue Oct 7 08:41:17 2025 -0700

    Improve CI reliability and developer productivity through test scheduling optimizations, mirror stability fixes, and a new artifact reuse feature. (#1379)
    
    * Document disk-intensive test placement in greenplum_schedule
    
    Add comment explaining why autovacuum-template0-segment and profile tests
    are positioned early in the test schedule. These tests consume significant
    disk space through WAL generation, XID consumption, and autovacuum operations.
    Running them early when ~20GB disk space is available (vs ~10GB later) helps
    avoid disk exhaustion issues during test execution.
    
    * Fix Rocky Linux mirror instability in CI
    
    Add repository metadata refresh and retry logic to handle transient
    mirror failures during RPM installation. This addresses frequent 404
    errors from Rocky Linux mirrors that cause CI failures.
    
    Changes:
    - Run 'dnf clean all' and 'dnf makecache --refresh' before installation
    - Add '--setopt=retries=10' to dnf install command
    - Apply fix to both rpm-install-test and test jobs
    
    This improves CI reliability without changing functionality.
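
The refresh-and-retry sequence listed above can be sketched as a small shell function. The function name `install_rpm_with_refresh` and the `DNF_BIN` override are hypothetical, introduced only so the sequence can be exercised without a real package manager; the `dnf` subcommands and the `retries=10` option are the ones named in the change list.

```shell
# Hypothetical wrapper; DNF_BIN exists only so the steps can be stubbed in a dry run.
DNF_BIN="${DNF_BIN:-dnf}"

install_rpm_with_refresh() {
  local rpm_file="$1"

  # Discard cached repository metadata that may point at stale mirror paths.
  "${DNF_BIN}" clean all || return 1

  # Rebuild the cache; fall back to a plain makecache if the forced refresh fails.
  "${DNF_BIN}" makecache --refresh || "${DNF_BIN}" makecache || return 1

  # Let dnf retry each download up to 10 times before giving up.
  "${DNF_BIN}" install -y --setopt=retries=10 "${rpm_file}"
}
```

In the workflow itself the equivalent commands run inline in the install step rather than through a wrapper.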
    
    * Add artifact reuse feature for faster test iteration
    
    Enable reusing build artifacts from previous workflow runs to speed up
    test iteration by ~50-70 minutes. This is useful for debugging test
    failures without rebuilding.
    
    Changes:
    - Add 'reuse_artifacts_from_run_id' workflow input parameter
    - Skip build job when reusing artifacts from specified run
    - Skip rpm-install-test job when reusing artifacts
    - Update artifact download steps to support cross-run downloads
    - Add proper job conditionals to handle skipped build job
    
    Usage:
      Manually trigger workflow and specify a previous run ID in the
      'reuse_artifacts_from_run_id' input field. Leave empty to build fresh.
    
    This maintains backward compatibility - default behavior unchanged.
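
The job gating described above can be modelled in a few lines of shell. This is an illustrative model of the workflow conditionals, not code from the workflow: the function names are hypothetical, the result strings mirror the values GitHub Actions reports for a job (`success`, `skipped`, `failure`), and the cancellation check is left out for brevity.

```shell
# Illustrative model of the job conditionals; all function names are hypothetical.

# The build job runs only when no previous run ID was supplied.
should_run_build() {
  [ -z "$1" ]
}

# Test jobs run when the build succeeded or was skipped (artifact reuse).
should_run_tests() {
  [ "$1" = "success" ] || [ "$1" = "skipped" ]
}

# The RPM install test additionally requires a fresh build (empty run ID).
should_run_rpm_install_test() {
  should_run_tests "$1" && [ -z "$2" ]
}

# Artifact downloads use the supplied run ID, else the current run's ID.
resolve_artifact_run_id() {
  printf '%s\n' "${1:-$2}"
}
```

With an empty run ID every job behaves as before, which is why the default behavior is unchanged.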
    
    * Add GitHub Actions workflow documentation for developers
    
    Create comprehensive documentation for GitHub Actions workflows, focusing
    on features that help developers iterate faster when debugging CI issues.
    
    Key sections:
    - Manual workflow triggers and input parameters
    - Artifact reuse feature with step-by-step guide
    - Running workflows in forked repositories
    - Troubleshooting common issues
    
    This documentation enables developers to:
    - Reuse build artifacts to save ~50-70 minutes per test iteration
    - Run CI validation in their forks before submitting PRs
    - Understand available workflow options and test selections
    - Debug test failures more efficiently
    
    * Pin Rocky Linux repos to stable 9.x release
    
    Use --releasever=9 to pin dnf to stable Rocky Linux 9.x repos instead
    of bleeding-edge point releases (e.g., 9.6) that may not be fully synced
    across all mirrors.
    
    Rocky Linux maintains binary compatibility within major versions, so
    pinning to 9 ensures we get stable, widely-mirrored packages while
    remaining compatible with the 9.6 container OS.
    
    This complements the earlier retry/refresh logic by addressing the root
    cause: new point releases have metadata sync lag across the mirror network.
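
As a rough illustration of the pinning idea, the major version can be derived from a point release string and passed to `--releasever`. The commit hardcodes `9`; the `major_releasever` and `pinned_install_cmd` helpers and the `9.6` input below are hypothetical and only show the relationship between the two values.

```shell
# Hypothetical helper: strip the point release so dnf targets the major-version repos.
major_releasever() {
  printf '%s\n' "${1%%.*}"
}

# Print the install command line with the release version pinned to the major version.
pinned_install_cmd() {
  printf 'dnf install -y --setopt=retries=10 --releasever=%s %s\n' \
    "$(major_releasever "$1")" "$2"
}
```

A value that is already a bare major version, such as `9`, passes through unchanged.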
    
    * Move all autovacuum tests to early execution
    
    Move autovacuum and autovacuum-segment tests alongside
    autovacuum-template0-segment to run early in the schedule when more
    disk space is available.
    
    All three autovacuum tests are disk-intensive and benefit from running
    when ~20GB is available rather than later when space may be constrained.
    This grouping also improves test organization by keeping related tests
    together.
    
    * Clarify secrets configuration in workflow documentation
    
    Update README to clarify that no manual secret configuration is required
    for normal development workflows:
    
    - GITHUB_TOKEN is automatically provided by GitHub
    - Only used for artifact reuse feature (downloading previous run artifacts)
    - DockerHub secrets only needed for custom container image builds
      (advanced/maintainer use case)
    
    This removes confusion about required setup steps for fork users.
---
 .github/workflows/README.md            | 258 +++++++++++++++++++++++++++++++++
 .github/workflows/build-cloudberry.yml |  39 ++++-
 src/test/regress/greenplum_schedule    |  16 +-
 3 files changed, 303 insertions(+), 10 deletions(-)

diff --git a/.github/workflows/README.md b/.github/workflows/README.md
new file mode 100644
index 00000000000..ae1651742e0
--- /dev/null
+++ b/.github/workflows/README.md
@@ -0,0 +1,258 @@
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one
+  or more contributor license agreements.  See the NOTICE file
+  distributed with this work for additional information
+  regarding copyright ownership.  The ASF licenses this file
+  to you under the Apache License, Version 2.0 (the
+  "License"); you may not use this file except in compliance
+  with the License.  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing,
+  software distributed under the License is distributed on an
+  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  KIND, either express or implied.  See the License for the
+  specific language governing permissions and limitations
+  under the License.
+-->
+
+# GitHub Actions Workflows
+
+This directory contains GitHub Actions workflows for Apache Cloudberry CI/CD.
+
+## Table of Contents
+
+- [Available Workflows](#available-workflows)
+- [Manual Workflow Triggers](#manual-workflow-triggers)
+- [Artifact Reuse for Faster Testing](#artifact-reuse-for-faster-testing)
+- [Running Workflows in Forked Repositories](#running-workflows-in-forked-repositories)
+
+## Available Workflows
+
+| Workflow | Purpose | Trigger |
+|----------|---------|---------|
+| `build-cloudberry.yml` | Main CI: build, test, create RPMs | Push, PR, Manual |
+| `build-dbg-cloudberry.yml` | Debug build with assertions enabled | Push, PR, Manual |
+| `apache-rat-audit.yml` | License header compliance check | Push, PR |
+| `coverity.yml` | Static code analysis with Coverity | Weekly, Manual |
+| `sonarqube.yml` | Code quality analysis with SonarQube | Push to main |
+| `docker-cbdb-build-containers.yml` | Build Docker images for CI | Manual |
+| `docker-cbdb-test-containers.yml` | Build test Docker images | Manual |
+
+## Manual Workflow Triggers
+
+Many workflows support manual triggering via `workflow_dispatch`, allowing developers to run CI jobs on-demand.
+
+### How to Manually Trigger a Workflow
+
+1. Navigate to the **Actions** tab in GitHub
+2. Select the workflow from the left sidebar (e.g., "Build and Test Cloudberry")
+3. Click **Run workflow** button (top right)
+4. Select your branch
+5. Configure input parameters (if available)
+6. Click **Run workflow**
+
+### Workflow Input Parameters
+
+#### `build-cloudberry.yml` - Main CI
+
+| Parameter | Description | Default | Example |
+|-----------|-------------|---------|---------|
+| `test_selection` | Comma-separated list of tests to run, or "all" | `all` | `ic-good-opt-off,ic-contrib` |
+| `reuse_artifacts_from_run_id` | Run ID to reuse build artifacts from (see below) | _(empty)_ | `12345678901` |
+
+**Available test selections:**
+- `all` - Run all test suites
+- `ic-good-opt-off` - Installcheck with optimizer off
+- `ic-good-opt-on` - Installcheck with optimizer on
+- `ic-contrib` - Contrib extension tests
+- `ic-resgroup` - Resource group tests
+- `ic-resgroup-v2` - Resource group v2 tests
+- `ic-resgroup-v2-memory-accounting` - Resource group memory tests
+- `ic-singlenode` - Single-node mode tests
+- `make-installcheck-world` - Full test suite
+- And more... (see workflow for complete list)
+
+## Artifact Reuse for Faster Testing
+
+When debugging test failures, rebuilding Cloudberry (~50-70 minutes) on every iteration is inefficient. The artifact reuse feature allows you to reuse build artifacts from a previous successful run.
+
+### How It Works
+
+1. Build artifacts (RPMs, source tarballs) from a previous workflow run are downloaded
+2. Build job is skipped (saves ~45-60 minutes)
+3. RPM installation test is skipped (saves ~5-10 minutes)
+4. Test jobs run with the reused artifacts
+5. You can iterate on test configurations without rebuilding
+
+### Step-by-Step Guide
+
+#### 1. Find the Run ID
+
+After a successful build (even if tests failed), get the run ID:
+
+**Option A: From GitHub Actions UI**
+- Go to **Actions** tab → Click on a completed workflow run
+- The URL will be: `https://github.com/apache/cloudberry/actions/runs/12345678901`
+- The run ID is `12345678901`
+
+**Option B: From GitHub CLI**
+```bash
+# List recent workflow runs
+gh run list --workflow=build-cloudberry.yml --limit 5
+
+# Get run ID from specific branch
+gh run list --workflow=build-cloudberry.yml --branch=my-feature --limit 1
+```
+
+#### 2. Trigger New Run with Artifact Reuse
+
+**Via GitHub UI:**
+1. Go to **Actions** → **Build and Test Cloudberry**
+2. Click **Run workflow**
+3. Enter the run ID in **"Reuse build artifacts from a previous run ID"**
+4. Optionally customize **test_selection**
+5. Click **Run workflow**
+
+**Via GitHub CLI:**
+```bash
+# Reuse artifacts from run 12345678901, run only specific tests
+gh workflow run build-cloudberry.yml \
+  --field reuse_artifacts_from_run_id=12345678901 \
+  --field test_selection=ic-good-opt-off
+```
+
+#### 3. Monitor Test Execution
+
+- Build job will be skipped (shows as "Skipped" in Actions UI)
+- RPM Install Test will be skipped
+- Test jobs will run with artifacts from the specified run ID
+- Total time: ~15-30 minutes (vs ~65-100 minutes for full build+test)
+
+### Use Cases
+
+**Debugging a specific test failure:**
+```bash
+# Run 1: Full build + all tests (finds test failure in ic-good-opt-off)
+gh workflow run build-cloudberry.yml
+
+# Get the run ID from output
+RUN_ID=$(gh run list --workflow=build-cloudberry.yml --limit 1 --json databaseId --jq '.[0].databaseId')
+
+# Run 2: Reuse artifacts, run only failing test
+gh workflow run build-cloudberry.yml \
+  --field reuse_artifacts_from_run_id=$RUN_ID \
+  --field test_selection=ic-good-opt-off
+```
+
+**Testing different configurations:**
+```bash
+# Test with optimizer off, then on, using same build
+gh workflow run build-cloudberry.yml \
+  --field reuse_artifacts_from_run_id=$RUN_ID \
+  --field test_selection=ic-good-opt-off
+
+gh workflow run build-cloudberry.yml \
+  --field reuse_artifacts_from_run_id=$RUN_ID \
+  --field test_selection=ic-good-opt-on
+```
+
+### Limitations
+
+- Artifacts expire after 90 days (GitHub default retention)
+- Run ID must be from the same repository (or accessible fork)
+- Artifacts must include both RPM and source build artifacts
+- Cannot reuse artifacts across different OS/architecture combinations
+- Changes to source code require a fresh build
+
+## Running Workflows in Forked Repositories
+
+GitHub Actions workflows are enabled in forks, allowing you to validate changes before submitting a Pull Request.
+
+### Initial Setup (One-Time)
+
+1. **Fork the repository** to your GitHub account
+
+2. **Enable GitHub Actions** in your fork:
+   - Go to your fork's **Actions** tab
+   - Click **"I understand my workflows, go ahead and enable them"**
+
+**Secrets Configuration:**
+
+No manual secret configuration is required for the main build and test workflows.
+
+- `GITHUB_TOKEN` is automatically provided by GitHub and used when downloading artifacts from previous runs (artifact reuse feature)
+- DockerHub secrets (`DOCKERHUB_USER`, `DOCKERHUB_TOKEN`) are only required for building custom container images (advanced/maintainer use case, not needed for typical development)
+
+### Workflow Behavior in Forks
+
+- ✅ **Automated triggers work**: Push and PR events trigger workflows
+- ✅ **Manual triggers work**: `workflow_dispatch` is fully functional
+- ✅ **Artifact reuse works**: Can reuse artifacts from previous runs in your fork
+- ⚠️ **Cross-fork artifact reuse**: Not supported (security restriction)
+- ⚠️ **Some features may be limited**: Certain features requiring organization-level secrets may not work
+
+### Best Practices for Fork Development
+
+1. **Test locally first** when possible (faster iteration)
+2. **Use manual triggers** to avoid burning GitHub Actions minutes unnecessarily
+3. **Use artifact reuse** to iterate on test failures efficiently
+4. **Push to feature branches** to trigger automated CI
+5. **Review Actions tab** to ensure workflows completed successfully before opening PR
+
+### Example Fork Workflow
+
+```bash
+# 1. Create feature branch in fork
+git checkout -b fix-test-failure
+
+# 2. Make changes and push to fork
+git commit -am "Fix test failure"
+git push origin fix-test-failure
+
+# 3. CI runs automatically on push
+
+# 4. If tests fail, iterate using artifact reuse
+# Get run ID from your fork's Actions tab
+gh workflow run build-cloudberry.yml \
+  --field reuse_artifacts_from_run_id=12345678901 \
+  --field test_selection=ic-good-opt-off
+
+# 5. Once tests pass, open PR to upstream
+gh pr create --web
+```
+
+## Troubleshooting
+
+### "Build job was skipped but tests failed to start"
+
+**Cause:** Artifacts from specified run ID not found or expired
+
+**Solution:**
+- Verify the run ID is correct
+- Check that run completed successfully (built artifacts)
+- Run a fresh build if artifacts expired (>90 days)
+
+### "Workflow not found in fork"
+
+**Cause:** GitHub Actions not enabled in fork
+
+**Solution:**
+- Go to fork's **Actions** tab
+- Click to enable workflows
+
+### "Resource not accessible by integration"
+
+**Cause:** Workflow trying to access artifacts from different repository
+
+**Solution:**
+- Can only reuse artifacts from same repository
+- Run a fresh build in your fork first, then reuse those artifacts
+
+## Additional Resources
+
+- [GitHub Actions Documentation](https://docs.github.com/en/actions)
+- [Cloudberry Contributing Guide](../../CONTRIBUTING.md)
+- [Cloudberry Build Guide](../../deploy/build/README.md)
+- [DevOps Scripts](../../devops/README.md)
diff --git a/.github/workflows/build-cloudberry.yml b/.github/workflows/build-cloudberry.yml
index fecd44a9637..04d5e827b6e 100644
--- a/.github/workflows/build-cloudberry.yml
+++ b/.github/workflows/build-cloudberry.yml
@@ -113,6 +113,11 @@ on:
         required: false
         default: 'all'
         type: string
+      reuse_artifacts_from_run_id:
+        description: 'Reuse build artifacts from a previous run ID (leave empty to build fresh)'
+        required: false
+        default: ''
+        type: string
 
 concurrency:
   group: ${{ github.workflow }}-${{ github.ref }}
@@ -412,6 +417,7 @@ jobs:
     needs: [check-skip]
     runs-on: ubuntu-22.04
     timeout-minutes: 120
+    if: github.event.inputs.reuse_artifacts_from_run_id == ''
     outputs:
       build_timestamp: ${{ steps.set_timestamp.outputs.timestamp }}
 
@@ -687,6 +693,10 @@ jobs:
   rpm-install-test:
     name: RPM Install Test Apache Cloudberry
     needs: [check-skip, build]
+    if: |
+      !cancelled() &&
+      (needs.build.result == 'success' || needs.build.result == 'skipped') &&
+      github.event.inputs.reuse_artifacts_from_run_id == ''
     runs-on: ubuntu-22.04
     timeout-minutes: 120
 
@@ -710,6 +720,8 @@ jobs:
           name: apache-cloudberry-db-incubating-rpm-build-artifacts
           path: ${{ github.workspace }}/rpm_build_artifacts
           merge-multiple: false
+          run-id: ${{ github.event.inputs.reuse_artifacts_from_run_id || github.run_id }}
+          github-token: ${{ secrets.GITHUB_TOKEN }}
 
       - name: Cloudberry Environment Initialization
         if: needs.check-skip.outputs.should_skip != 'true'
@@ -814,12 +826,18 @@ jobs:
             echo "Version: ${RPM_VERSION}"
             echo "Release: ${RPM_RELEASE}"
 
+            # Refresh repository metadata to avoid mirror issues
+            echo "Refreshing repository metadata..."
+            dnf clean all
+            dnf makecache --refresh || dnf makecache
+
             # Clean install location
             rm -rf /usr/local/cloudberry-db
 
-            # Install RPM
+            # Install RPM with retry logic for mirror issues
+            # Use --releasever=9 to pin to stable Rocky Linux 9 repos (not bleeding-edge 9.6)
             echo "Starting installation..."
-            if ! time dnf install -y "${RPM_FILE}"; then
+            if ! time dnf install -y --setopt=retries=10 --releasever=9 "${RPM_FILE}"; then
               echo "::error::RPM installation failed"
               exit 1
             fi
@@ -858,6 +876,9 @@ jobs:
   test:
     name: ${{ matrix.test }}
     needs: [check-skip, build, prepare-test-matrix]
+    if: |
+      !cancelled() &&
+      (needs.build.result == 'success' || needs.build.result == 'skipped')
     runs-on: ubuntu-22.04
     timeout-minutes: 120
     # actionlint-allow matrix[*].pg_settings
@@ -1087,6 +1108,8 @@ jobs:
           name: apache-cloudberry-db-incubating-rpm-build-artifacts
           path: ${{ github.workspace }}/rpm_build_artifacts
           merge-multiple: false
+          run-id: ${{ github.event.inputs.reuse_artifacts_from_run_id || github.run_id }}
+          github-token: ${{ secrets.GITHUB_TOKEN }}
 
       - name: Download Cloudberry Source build artifacts
         if: needs.check-skip.outputs.should_skip != 'true'
@@ -1095,6 +1118,8 @@ jobs:
           name: apache-cloudberry-db-incubating-source-build-artifacts
           path: ${{ github.workspace }}/source_build_artifacts
           merge-multiple: false
+          run-id: ${{ github.event.inputs.reuse_artifacts_from_run_id || github.run_id }}
+          github-token: ${{ secrets.GITHUB_TOKEN }}
 
       - name: Verify downloaded artifacts
         if: needs.check-skip.outputs.should_skip != 'true'
@@ -1186,12 +1211,18 @@ jobs:
             echo "Version: ${RPM_VERSION}"
             echo "Release: ${RPM_RELEASE}"
 
+            # Refresh repository metadata to avoid mirror issues
+            echo "Refreshing repository metadata..."
+            dnf clean all
+            dnf makecache --refresh || dnf makecache
+
             # Clean install location
             rm -rf /usr/local/cloudberry-db
 
-            # Install RPM
+            # Install RPM with retry logic for mirror issues
+            # Use --releasever=9 to pin to stable Rocky Linux 9 repos (not bleeding-edge 9.6)
             echo "Starting installation..."
-            if ! time dnf install -y "${RPM_FILE}"; then
+            if ! time dnf install -y --setopt=retries=10 --releasever=9 "${RPM_FILE}"; then
               echo "::error::RPM installation failed"
               exit 1
             fi
diff --git a/src/test/regress/greenplum_schedule b/src/test/regress/greenplum_schedule
index ecf37e73029..039e8d7e9c4 100755
--- a/src/test/regress/greenplum_schedule
+++ b/src/test/regress/greenplum_schedule
@@ -15,6 +15,16 @@
 #   hitting max_connections limit on segments.
 #
 
+# Run disk-intensive tests early when maximum disk space is available.
+# These tests consume significant disk space through WAL generation, XID consumption,
+# and autovacuum operations. Running them early helps avoid disk exhaustion issues.
+test: autovacuum
+test: autovacuum-segment
+test: autovacuum-template0-segment
+
+# check profile feature
+test: profile
+
 # test for builtin namespace pg_ext_aux
 test: pg_ext_aux
 
@@ -321,9 +331,6 @@ test: oid_wraparound
 # hence it should be run in isolation.
 test: fts_recovery_in_progress
 ignore: mirror_replay
-test: autovacuum
-test: autovacuum-segment
-test: autovacuum-template0-segment
 
 # gpexpand introduce the partial tables, check them if they can run correctly
 test: gangsize gang_reuse
@@ -334,9 +341,6 @@ test: run_utility_gpexpand_phase1
 # check correct error message when create extension error on segment
 test: create_extension_fail
 
-# check profile feature
-test: profile
-
 # check offload entry root slice to QE feature
 test: offload_entry_to_qe
 


