This is an automated email from the ASF dual-hosted git repository.

zhengruifeng pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new 4dbc9d7d2ec3 [SPARK-57144][INFRA] Unify Coursier cache to a single key across all jobs
4dbc9d7d2ec3 is described below

commit 4dbc9d7d2ec38b5fc09fbe8e02ec04a133bffb50
Author: Ruifeng Zheng <[email protected]>
AuthorDate: Thu Jun 4 21:47:44 2026 +0800

    [SPARK-57144][INFRA] Unify Coursier cache to a single key across all jobs
    
    ### What changes were proposed in this pull request?
    
    Replace 8 distinct per-job Coursier cache keys with a single 
`coursier-<hash>` key in `.github/workflows/build_and_test.yml` and 
`python_hosted_runner_test.yml`:
    
    - **`precompile`** and **`build`** (Scala test matrix): `actions/cache@v5`; 
both can write `coursier-<hash>`. `precompile` is the primary writer (it runs 
first and resolves the full dependency superset via all `-P` profiles). `build` 
is the fallback writer: when `precompile` is absent or its save fails, the first 
`build` matrix entry seeds the cache. When `precompile` did save it, `build` gets 
an exact key hit and GHA automatically skips the post-save (caches are immutable).
    - **All other consumers** (`pyspark` ×9, `sparkr`, `lint`, `docs`, 
`tpcds-1g`, `docker-integration-tests`, `k8s-integration-tests`): converted to 
`actions/cache/restore@v5`, which is restore-only and never writes. `tpcds-1g` 
in particular only fires when SQL code changes and is skipped on the vast 
majority of runs, so its own Coursier cache entry would typically be 
LRU-evicted before the next run anyway.
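
The writer / restore-only split described above can be modeled in a few lines. This is a minimal sketch, not GitHub's implementation: `CacheStore`, `run_job`, and `simulate` are hypothetical names, jobs run one after another here (real matrix jobs overlap), and the only behavior assumed is what this commit message states, namely that an exact key hit skips the post-save and restore-only jobs never save.

```python
from dataclasses import dataclass, field


@dataclass
class CacheStore:
    """Toy stand-in for the repository cache: key -> job that saved it."""
    entries: dict = field(default_factory=dict)

    def restore(self, key: str) -> bool:
        # True on an exact key hit.
        return key in self.entries

    def save(self, key: str, job: str) -> None:
        # Entries are immutable: an existing key is never overwritten.
        self.entries.setdefault(key, job)


def run_job(store: CacheStore, job: str, key: str, can_write: bool) -> None:
    hit = store.restore(key)
    # Post-save step: only writer jobs save, and only after a miss.
    if can_write and not hit:
        store.save(key, job)


def simulate(jobs: list[tuple[str, bool]], key: str = "coursier-<hash>") -> dict:
    store = CacheStore()
    for job, can_write in jobs:
        run_job(store, job, key, can_write)
    return store.entries


# precompile runs first and seeds the cache; build then gets an exact hit.
normal = simulate([("precompile", True), ("build", True),
                   ("pyspark", False), ("docs", False)])
# precompile absent: the first build matrix entry acts as the fallback writer.
fallback = simulate([("build", True), ("build", True), ("tpcds-1g", False)])

print(normal)    # one entry, saved by precompile
print(fallback)  # one entry, saved by the first build job
```

In both scenarios the store ends up with exactly one entry for the key, which is the property the single-key design relies on.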
    
    ### Why are the changes needed?
    
    **1. Same-commit duplicates — ~0.01% apart by bytes.**
    
    Per-job keys let every consumer job re-save its own copy of effectively the 
same content. Measured on master:
    
    ```
    precompile-coursier-4f7e6f95   1,469,711,354 B   current superset
    25-hadoop3-coursier-4f7e6f95   1,469,562,712 B   145 KB different  ← 0.01% apart
    precompile-coursier-03ca361a   1,624,890,072 B   previous-hash superset (~90% same)
    ────────────────────────────────────────────────────────────────────
    total:                        ~4.56 GB            distinct content: ~1.47 GB
    ```
    
    The delta is only 145 KB because Coursier doesn't prune: on a cold run the 
test-matrix job restores the precompile superset via restore-key, runs tests 
(which resolve nothing beyond it), and its post-step re-saves an almost 
byte-for-byte copy under its own key. The per-module keys are not holding 
different dependency sets; they are holding copies of the same superset.
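
The figures in the measurement table can be re-derived directly from the byte counts. The sketch below only repeats that arithmetic; the variable names are illustrative, and "KB" in the table is read as KiB (1024 bytes), which is an assumption.

```python
# Sizes in bytes, copied from the measurement table above.
sizes = {
    "precompile-coursier-4f7e6f95": 1_469_711_354,
    "25-hadoop3-coursier-4f7e6f95": 1_469_562_712,
    "precompile-coursier-03ca361a": 1_624_890_072,
}

current = sizes["precompile-coursier-4f7e6f95"]
duplicate = sizes["25-hadoop3-coursier-4f7e6f95"]

delta_bytes = current - duplicate          # bytes that differ between the two copies
delta_kib = delta_bytes / 1024             # assumes the table's "KB" means KiB
delta_pct = 100 * delta_bytes / current    # relative difference in percent
total_gb = sum(sizes.values()) / 1e9       # everything stored, in decimal GB

print(f"delta: {delta_bytes} B = {delta_kib:.0f} KiB = {delta_pct:.2f}%")
print(f"total stored: {total_gb:.2f} GB")
```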
    
    **2. Repo-wide 10 GB budget consumed by duplicates.**
    
    Duplicates from just two branches left no room for any other branch:
    
    ```
    branch-4.x:  tpcds-coursier               1895 MB
                 21-hadoop3-coursier           1437 MB
                 docker-integration-coursier   1437 MB   → 4770 MB
    
    master:      precompile-coursier (hash A)  1549 MB
                 precompile-coursier (hash B)  1401 MB
                 25-hadoop3-coursier (hash B)  1401 MB   → 4351 MB
    
    total: ~9.1 GB used, 10 GB budget
    ```
    
    Old maintenance branches (branch-4.0, 4.1, 4.2, 3.5) had their caches 
evicted before their next scheduled CI run and were always cold.
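
The budget pressure can be checked with a short calculation. This sketch assumes the 10 GB budget is 10,000 MB (decimal units; that reading is an assumption) and uses the rounded MB figures from the table, so the branch-4.x sum comes out 1 MB below the 4770 MB shown there.

```python
BUDGET_MB = 10_000  # 10 GB budget, read as decimal megabytes (assumption)

# Cache entries per branch, in MB, copied from the table above.
entries_mb = {
    "branch-4.x": {
        "tpcds-coursier": 1895,
        "21-hadoop3-coursier": 1437,
        "docker-integration-coursier": 1437,
    },
    "master": {
        "precompile-coursier (hash A)": 1549,
        "precompile-coursier (hash B)": 1401,
        "25-hadoop3-coursier (hash B)": 1401,
    },
}

per_branch = {branch: sum(e.values()) for branch, e in entries_mb.items()}
used = sum(per_branch.values())
headroom = BUDGET_MB - used
smallest_entry = min(size for e in entries_mb.values() for size in e.values())

print(per_branch)
print(f"used: {used} MB, headroom: {headroom} MB")
# No further entry fits without evicting one of the existing six.
print(headroom < smallest_entry)
```

The remaining headroom is smaller than any single Coursier entry, which matches the observation that maintenance branches could not keep a cache alive.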
    
    **3. Dep-upgrade burst amplifies the problem.**
    
    `pom.xml`/`plugins.sbt` are touched ~5–6 times per month on average, but 
upgrades cluster: on 2026-05-28 alone, 5 dependency upgrades merged in a single 
day (rocksdbjni, joda-time, gson, Jetty, zstd-jni). Each commit rolls the hash, 
so 5 consecutive CI runs each start with a cold Coursier cache. Under the old 
design each cold run raced to create ~5 new ~1.4 GB entries (~7 GB), 
immediately overflowing the budget and evicting the previous run's still-warm 
caches. Under the new design ea [...]
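
To illustrate why each dependency upgrade rolls the key, here is a rough sketch of a content-derived cache key. It is loosely modeled on the two-level SHA-256 scheme that the GitHub Actions documentation describes for `hashFiles`; the `cache_key` helper and the file contents are hypothetical, and the digest will not match what GitHub actually computes.

```python
import hashlib
import tempfile
from pathlib import Path


def cache_key(files: list[Path], prefix: str = "coursier-") -> str:
    """Hash each file, then hash the concatenated per-file digests."""
    outer = hashlib.sha256()
    for path in sorted(files):
        outer.update(hashlib.sha256(path.read_bytes()).digest())
    return prefix + outer.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    pom = Path(tmp) / "pom.xml"
    plugins = Path(tmp) / "plugins.sbt"
    pom.write_text("<version>1.0</version>")  # placeholder, not Spark's pom
    plugins.write_text("// placeholder")

    key_before = cache_key([pom, plugins])
    key_unchanged = cache_key([pom, plugins])  # same inputs, same key

    pom.write_text("<version>1.1</version>")   # simulate a dependency upgrade
    key_after = cache_key([pom, plugins])

print(key_before == key_unchanged)  # True: untouched files keep the key stable
print(key_before == key_after)      # False: any edit produces a new key
```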
    
    **Summary:** with one writer per branch the per-branch footprint drops from 
~4.5 GB to ~1.4 GB, fitting ~6 branches in the 10 GB budget simultaneously, and 
a burst of dep-upgrade commits no longer triggers a cascade of mutual evictions.
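
The summary numbers follow from simple division. The sketch below uses only the approximate sizes quoted in this message; the 1.6 GB upper case is taken from the largest precompile entry measured above, and treating it as the worst case is an assumption.

```python
import math

BUDGET_GB = 10.0
OLD_PER_BRANCH_GB = 4.5   # per-branch footprint before this change
COLD_RUN_ENTRIES = 5      # entries a cold run raced to create under the old design
ENTRY_GB = 1.4            # approximate size of one Coursier entry

# Old design: one cold run alone writes most of the budget.
old_cold_run_gb = COLD_RUN_ENTRIES * ENTRY_GB

# How many branches fit, before and after.
branches_old = math.floor(BUDGET_GB / OLD_PER_BRANCH_GB)
branches_new = {size: math.floor(BUDGET_GB / size) for size in (1.4, 1.6)}

print(f"old cold run: ~{old_cold_run_gb:.0f} GB of new entries")
print(f"branches that fit (old): {branches_old}")
print(f"branches that fit (new): {branches_new}")
```

With entries between 1.4 GB and 1.6 GB the budget holds 6 to 7 branches, which is consistent with the ~6 quoted above once some headroom is left for other caches.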
    
    ### Does this PR introduce _any_ user-facing change?
    
    No. CI-only.
    
    ### How was this patch tested?
    
    YAML validates with `python3 -c "import yaml; yaml.safe_load(...)"`.
    
    The correctness of the one-writer design relies on two GHA cache guarantees 
verified in prior CI runs:
    1. Caches are immutable: an exact key hit skips the post-save step (`Cache 
hit occurred on the primary key …, not saving cache`), so multiple jobs using 
`actions/cache@v5` with the same key don't produce duplicates when the cache 
already exists.
    2. The `precompile` job builds with every profile (`-Phadoop-3 -Pyarn 
-Pspark-ganglia-lgpl -Phadoop-cloud -Phive -Pkubernetes -Pjvm-profiler 
-Pkinesis-asl -Phive-thriftserver -Pdocker-integration-tests -Pvolcano`), so 
its `~/.cache/coursier` is a superset of every consumer job's closure.
    
    ### Was this patch authored or co-authored using generative AI tooling?
    
    Generated-by: Claude Code (claude-sonnet-4-6)
    
    Closes #56201 from zhengruifeng/unify-coursier-ci-cache-opt-dev6.
    
    Authored-by: Ruifeng Zheng <[email protected]>
    Signed-off-by: Ruifeng Zheng <[email protected]>
---
 .github/workflows/build_and_test.yml            | 77 +++++++++++--------------
 .github/workflows/python_hosted_runner_test.yml |  4 +-
 2 files changed, 37 insertions(+), 44 deletions(-)

diff --git a/.github/workflows/build_and_test.yml b/.github/workflows/build_and_test.yml
index cda9636a92e4..b36655c390f0 100644
--- a/.github/workflows/build_and_test.yml
+++ b/.github/workflows/build_and_test.yml
@@ -381,15 +381,14 @@ jobs:
         key: build-${{ hashFiles('**/pom.xml', 'project/build.properties', 'build/mvn', 'build/sbt', 'build/sbt-launch-lib.bash', 'build/spark-build-info') }}
         restore-keys: |
           build-
-    - name: Cache Coursier local repository
-      uses: actions/cache@v5
+    - name: Restore Coursier local repository
+      uses: actions/cache/restore@v5
       with:
         path: ~/.cache/coursier
-        key: ${{ matrix.java }}-${{ matrix.hadoop }}-coursier-${{ hashFiles('**/pom.xml', '**/plugins.sbt') }}
+        key: ${{ runner.os }}-${{ matrix.java }}-${{ matrix.hadoop }}-coursier-${{ hashFiles('**/pom.xml', '**/plugins.sbt') }}
         restore-keys: |
-          ${{ matrix.java }}-${{ matrix.hadoop }}-coursier-
-          precompile-coursier-${{ hashFiles('**/pom.xml', '**/plugins.sbt') }}
-          precompile-coursier-
+          ${{ runner.os }}-${{ matrix.java }}-${{ matrix.hadoop }}-coursier-
+          ${{ runner.os }}-coursier-
     - name: Free up disk space
       run: |
         if [ -f ./dev/free_disk_space ]; then
@@ -620,9 +619,9 @@ jobs:
       uses: actions/cache@v5
       with:
         path: ~/.cache/coursier
-        key: precompile-coursier-${{ hashFiles('**/pom.xml', '**/plugins.sbt') }}
+        key: ${{ runner.os }}-coursier-${{ hashFiles('**/pom.xml', '**/plugins.sbt') }}
         restore-keys: |
-          precompile-coursier-
+          ${{ runner.os }}-coursier-
     - name: Install Java ${{ inputs.java }}
       uses: actions/setup-java@v5
       with:
@@ -745,13 +744,13 @@ jobs:
         key: build-${{ hashFiles('**/pom.xml', 'project/build.properties', 'build/mvn', 'build/sbt', 'build/sbt-launch-lib.bash', 'build/spark-build-info') }}
         restore-keys: |
           build-
-    - name: Cache Coursier local repository
-      uses: actions/cache@v5
+    - name: Restore Coursier local repository
+      uses: actions/cache/restore@v5
       with:
         path: ~/.cache/coursier
-        key: pyspark-coursier-${{ hashFiles('**/pom.xml', '**/plugins.sbt') }}
+        key: ${{ runner.os }}-coursier-${{ hashFiles('**/pom.xml', '**/plugins.sbt') }}
         restore-keys: |
-          pyspark-coursier-
+          ${{ runner.os }}-coursier-
     - name: Free up disk space
       shell: 'script -q -e -c "bash {0}"'
       run: ./dev/free_disk_space_container
@@ -936,13 +935,13 @@ jobs:
         key: build-${{ hashFiles('**/pom.xml', 'project/build.properties', 'build/mvn', 'build/sbt', 'build/sbt-launch-lib.bash', 'build/spark-build-info') }}
         restore-keys: |
           build-
-    - name: Cache Coursier local repository
-      uses: actions/cache@v5
+    - name: Restore Coursier local repository
+      uses: actions/cache/restore@v5
       with:
         path: ~/.cache/coursier
-        key: sparkr-coursier-${{ hashFiles('**/pom.xml', '**/plugins.sbt') }}
+        key: ${{ runner.os }}-coursier-${{ hashFiles('**/pom.xml', '**/plugins.sbt') }}
         restore-keys: |
-          sparkr-coursier-
+          ${{ runner.os }}-coursier-
     - name: Free up disk space
       run: ./dev/free_disk_space_container
     - name: Install Java ${{ inputs.java }}
@@ -1087,13 +1086,13 @@ jobs:
         key: build-${{ hashFiles('**/pom.xml', 'project/build.properties', 'build/mvn', 'build/sbt', 'build/sbt-launch-lib.bash', 'build/spark-build-info') }}
         restore-keys: |
           build-
-    - name: Cache Coursier local repository
-      uses: actions/cache@v5
+    - name: Restore Coursier local repository
+      uses: actions/cache/restore@v5
       with:
         path: ~/.cache/coursier
-        key: docs-coursier-${{ hashFiles('**/pom.xml', '**/plugins.sbt') }}
+        key: ${{ runner.os }}-coursier-${{ hashFiles('**/pom.xml', '**/plugins.sbt') }}
         restore-keys: |
-          docs-coursier-
+          ${{ runner.os }}-coursier-
     - name: Cache Maven local repository
       uses: actions/cache@v5
       with:
@@ -1286,13 +1285,13 @@ jobs:
         key: build-${{ hashFiles('**/pom.xml', 'project/build.properties', 'build/mvn', 'build/sbt', 'build/sbt-launch-lib.bash', 'build/spark-build-info') }}
         restore-keys: |
           build-
-    - name: Cache Coursier local repository
-      uses: actions/cache@v5
+    - name: Restore Coursier local repository
+      uses: actions/cache/restore@v5
       with:
         path: ~/.cache/coursier
-        key: docs-coursier-${{ hashFiles('**/pom.xml', '**/plugins.sbt') }}
+        key: ${{ runner.os }}-coursier-${{ hashFiles('**/pom.xml', '**/plugins.sbt') }}
         restore-keys: |
-          docs-coursier-
+          ${{ runner.os }}-coursier-
     - name: Cache Maven local repository
       uses: actions/cache@v5
       with:
@@ -1481,15 +1480,13 @@ jobs:
         key: build-${{ hashFiles('**/pom.xml', 'project/build.properties', 'build/mvn', 'build/sbt', 'build/sbt-launch-lib.bash', 'build/spark-build-info') }}
         restore-keys: |
           build-
-    - name: Cache Coursier local repository
-      uses: actions/cache@v5
+    - name: Restore Coursier local repository
+      uses: actions/cache/restore@v5
       with:
         path: ~/.cache/coursier
-        key: tpcds-coursier-${{ hashFiles('**/pom.xml', '**/plugins.sbt') }}
+        key: ${{ runner.os }}-coursier-${{ hashFiles('**/pom.xml', '**/plugins.sbt') }}
         restore-keys: |
-          tpcds-coursier-
-          precompile-coursier-${{ hashFiles('**/pom.xml', '**/plugins.sbt') }}
-          precompile-coursier-
+          ${{ runner.os }}-coursier-
     - name: Install Java ${{ inputs.java }}
       uses: actions/setup-java@v5
       with:
@@ -1615,15 +1612,13 @@ jobs:
         key: build-${{ hashFiles('**/pom.xml', 'project/build.properties', 'build/mvn', 'build/sbt', 'build/sbt-launch-lib.bash', 'build/spark-build-info') }}
         restore-keys: |
           build-
-    - name: Cache Coursier local repository
-      uses: actions/cache@v5
+    - name: Restore Coursier local repository
+      uses: actions/cache/restore@v5
       with:
         path: ~/.cache/coursier
-        key: docker-integration-coursier-${{ hashFiles('**/pom.xml', '**/plugins.sbt') }}
+        key: ${{ runner.os }}-coursier-${{ hashFiles('**/pom.xml', '**/plugins.sbt') }}
         restore-keys: |
-          docker-integration-coursier-
-          precompile-coursier-${{ hashFiles('**/pom.xml', '**/plugins.sbt') }}
-          precompile-coursier-
+          ${{ runner.os }}-coursier-
     - name: Install Java ${{ inputs.java }}
       uses: actions/setup-java@v5
       with:
@@ -1703,15 +1698,13 @@ jobs:
           key: build-${{ hashFiles('**/pom.xml', 'project/build.properties', 'build/mvn', 'build/sbt', 'build/sbt-launch-lib.bash', 'build/spark-build-info') }}
           restore-keys: |
             build-
-      - name: Cache Coursier local repository
-        uses: actions/cache@v5
+      - name: Restore Coursier local repository
+        uses: actions/cache/restore@v5
         with:
           path: ~/.cache/coursier
-          key: k8s-integration-coursier-${{ hashFiles('**/pom.xml', '**/plugins.sbt') }}
+          key: ${{ runner.os }}-coursier-${{ hashFiles('**/pom.xml', '**/plugins.sbt') }}
           restore-keys: |
-            k8s-integration-coursier-
-            precompile-coursier-${{ hashFiles('**/pom.xml', '**/plugins.sbt') }}
-            precompile-coursier-
+            ${{ runner.os }}-coursier-
       - name: Free up disk space
         run: |
           if [ -f ./dev/free_disk_space ]; then
diff --git a/.github/workflows/python_hosted_runner_test.yml b/.github/workflows/python_hosted_runner_test.yml
index eb0430bfe6c2..e29b89708bea 100644
--- a/.github/workflows/python_hosted_runner_test.yml
+++ b/.github/workflows/python_hosted_runner_test.yml
@@ -137,9 +137,9 @@ jobs:
         uses: actions/cache@v5
         with:
           path: ~/.cache/coursier
-          key: pyspark-coursier-${{ hashFiles('**/pom.xml', '**/plugins.sbt') }}
+          key: ${{ runner.os }}-coursier-${{ hashFiles('**/pom.xml', '**/plugins.sbt') }}
           restore-keys: |
-            pyspark-coursier-
+            ${{ runner.os }}-coursier-
       - name: Install Java ${{ matrix.java }}
         uses: actions/setup-java@v5
         with:

