(spark) branch master updated: [SPARK-56721][INFRA] Add master server vs 4.0 client CI for all PRs

gaogaotiantian Wed, 03 Jun 2026 15:14:31 -0700

This is an automated email from the ASF dual-hosted git repository.

gaogaotiantian pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git



The following commit(s) were added to refs/heads/master by this push:
     new 7a664859526d [SPARK-56721][INFRA] Add master server vs 4.0 client CI for all PRs
7a664859526d is described below

commit 7a664859526d1a59a848cbb1f1d4dcc571b3b49f
Author: Tian Gao <[email protected]>
AuthorDate: Wed Jun 3 15:13:16 2026 -0700

    [SPARK-56721][INFRA] Add master server vs 4.0 client CI for all PRs
    
    ### What changes were proposed in this pull request?
    
    Add a CI job that runs the branch-4.0 Spark Connect Python tests (the 4.0 client) against a Spark Connect server built from master, to check backward compatibility.
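    
    In short: the precondition job sets the `pyspark-connect-old-client` flag from `$pyspark` (and to `false` in one branch of its repository check). In the matrix, the new `exclude` entry relies on a GitHub Actions `&&` expression yielding an operand: when the flag is `'true'` it evaluates to `false` and matches nothing, otherwise it yields `'pyspark-connect-old-client'` and removes that entry. Within the job, the generic "Run tests" step is skipped for this module, and "Run tests for old client" runs only when `inputs.branch` is `master`. A minimal bash model of that logic, assuming these semantics; `exclude_value` and `test_step_for` are hypothetical names that appear nowhere in the workflow:
    
    ```shell
    # Hypothetical model of the gating; these functions do not exist in the workflow.
    
    # Models the matrix `exclude` entry:
    #   required["pyspark-connect-old-client"] != 'true' && 'pyspark-connect-old-client'
    # Prints the value that would be substituted for `modules`.
    exclude_value() {
      local required_flag="$1"  # value of the "pyspark-connect-old-client" flag
      if [[ "$required_flag" != "true" ]]; then
        echo "pyspark-connect-old-client"  # matches the matrix entry, so it is excluded
      else
        echo "false"                       # matches no entry, so nothing is excluded
      fi
    }
    
    # Models which test step runs for one matrix entry: "default", "old-client", or "none".
    test_step_for() {
      local modules="$1" branch="$2"
      if [[ "$modules" != "pyspark-connect-old-client" ]]; then
        echo "default"     # the existing "Run tests" step
      elif [[ "$branch" == "master" ]]; then
        echo "old-client"  # the new "Run tests for old client" step
      else
        echo "none"        # both test steps are skipped
      fi
    }
    ```
    
    For example, `exclude_value false` prints `pyspark-connect-old-client` (the job is dropped), while `test_step_for pyspark-connect-old-client master` prints `old-client`.
    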
    
    ### Why are the changes needed?
    
    We have broken the old client many times. The compatibility check currently runs only as a scheduled job, so such breakages are often found a few days later. We should catch them as soon as possible, on the PR that introduces them.
    
    ### Does this PR introduce _any_ user-facing change?
    
    No.
    
    ### How was this patch tested?
    
    The CI run on this PR shows whether the new job works.
    
    ### Was this patch authored or co-authored using generative AI tooling?
    
    Part of it was written by Claude Code (Opus 4.7).
    
    Closes #55677 from gaogaotiantian/add-old-client-check.
    
    Authored-by: Tian Gao <[email protected]>
    Signed-off-by: Tian Gao <[email protected]>
---
 .github/workflows/build_and_test.yml | 47 ++++++++++++++++++++++++++++++++++++
 1 file changed, 47 insertions(+)

diff --git a/.github/workflows/build_and_test.yml b/.github/workflows/build_and_test.yml
index 66c3f89ebbab..860ef27447f9 100644
--- a/.github/workflows/build_and_test.yml
+++ b/.github/workflows/build_and_test.yml
@@ -105,6 +105,7 @@ jobs:
           pyspark=`./dev/is-changed.py -m $pyspark_modules`
           pandas=`./dev/is-changed.py -m $pyspark_pandas_modules`
           pyspark_install=`./dev/is-changed.py -m pyspark-install`
+          pyspark_connect_old_client="$pyspark"
           if [[ "${{ github.repository }}" != 'apache/spark' ]]; then
             yarn=`./dev/is-changed.py -m yarn`
             kubernetes=`./dev/is-changed.py -m kubernetes`
@@ -139,6 +140,7 @@ jobs:
             java25=true
           else
             pyspark_install=false
+            pyspark_connect_old_client=false
             pandas=false
             yarn=false
             kubernetes=false
@@ -160,6 +162,7 @@ jobs:
               \"pyspark\": \"$pyspark\",
               \"pyspark-pandas\": \"$pandas\",
               \"pyspark-install\": \"$pyspark_install\",
+              \"pyspark-connect-old-client\": \"$pyspark_connect_old_client\",
               \"sparkr\": \"$sparkr\",
               \"tpcds-1g\": \"$tpcds\",
               \"docker-integration-tests\": \"$docker\",
@@ -670,6 +673,8 @@ jobs:
             pyspark-streaming, pyspark-structured-streaming, pyspark-structured-streaming-connect
           - >-
             pyspark-connect
+          - >-
+            pyspark-connect-old-client
           - >-
             pyspark-install
           - >-
@@ -688,6 +693,7 @@ jobs:
           - modules: ${{ fromJson(needs.precondition.outputs.required).pyspark != 'true' && 'pyspark-mllib, pyspark-ml, pyspark-ml-connect' }}
           - modules: ${{ fromJson(needs.precondition.outputs.required).pyspark != 'true' && 'pyspark-streaming, pyspark-structured-streaming, pyspark-structured-streaming-connect' }}
           - modules: ${{ fromJson(needs.precondition.outputs.required).pyspark != 'true' && 'pyspark-connect' }}
+          - modules: ${{ fromJson(needs.precondition.outputs.required).pyspark-connect-old-client != 'true' &&  'pyspark-connect-old-client'}}
           # pyspark-install is very slow so we only run it when it's changed or explicity requested
           - modules: ${{ fromJson(needs.precondition.outputs.required).pyspark-install != 'true' && 'pyspark-install' }}
           # Always run if pyspark-pandas == 'true', even infra-image is skip (such as non-master job)
@@ -782,6 +788,7 @@ jobs:
     # Run the tests.
     - name: Run tests
       env: ${{ fromJSON(inputs.envs) }}
+      if: ${{ matrix.modules != 'pyspark-connect-old-client' }}
       shell: 'script -q -e -c "bash {0}"'
       run: |
         if [ "${{ steps.extract-precompiled.outcome }}" = "success" ]; then
@@ -798,6 +805,46 @@ jobs:
           # For branch-3.5 and below, it uses the default Python versions.
           ./dev/run-tests --parallelism 1 --modules "$MODULES_TO_TEST"
         fi
+    - name: Run tests for old client
+      env:
+        SPARK_TESTING: 1
+        SPARK_SKIP_CONNECT_COMPAT_TESTS: 1
+        SPARK_CONNECT_TESTING_REMOTE: sc://localhost
+      if: ${{ matrix.modules == 'pyspark-connect-old-client' && inputs.branch == 'master' }}
+      run: |
+        # Build Spark
+        if [ "${{ steps.extract-precompiled.outcome }}" = "success" ]; then
+          echo "Reusing precompiled artifact, skipping local SBT build."
+        else
+          ./build/sbt -Phive Test/package
+        fi
+
+        # Make less noisy
+        cp conf/log4j2.properties.template conf/log4j2.properties
+        sed -i 's/rootLogger.level = info/rootLogger.level = warn/g' conf/log4j2.properties
+
+        # Start a Spark Connect server for local
+        PYTHONPATH="python/lib/pyspark.zip:python/lib/py4j-0.10.9.9-src.zip:$PYTHONPATH" ./sbin/start-connect-server.sh \
+          --driver-java-options "-Dlog4j.configurationFile=file:$GITHUB_WORKSPACE/conf/log4j2.properties" \
+          --jars "`find connector/protobuf/target -name spark-protobuf-*SNAPSHOT.jar`,`find connector/avro/target -name spark-avro*SNAPSHOT.jar`" \
+          --conf spark.sql.execution.arrow.pyspark.validateSchema.enabled=false \
+          --conf spark.sql.execution.pandas.convertToArrowArraySafely=false
+
+        # Checkout to branch-4.0 to use the tests in branch-4.0.
+        cd ..
+        git clone --single-branch --branch branch-4.0 $GITHUB_SERVER_URL/$GITHUB_REPOSITORY spark-4.0
+        cd spark-4.0
+        # Merge in apache/spark's branch-4.0 so CI runs against the latest upstream tests,
+        # while still incorporating any changes the contributor made on their fork's branch-4.0.
+        git fetch https://github.com/apache/spark.git branch-4.0
+        git -c user.name='Apache Spark Test Account' -c user.email='[email protected]' \
+            merge --no-edit FETCH_HEAD
+
+        # Several tests related to catalog requires to run them sequencially, e.g., writing a table in a listener.
+        # Run branch-4.0 tests
+        ./python/run-tests --parallelism=1 --python-executables=python3 --modules pyspark-connect
+        # None of tests are dependent on each other in Pandas API on Spark so run them in parallel
+        ./python/run-tests --parallelism=1 --python-executables=python3 --modules pyspark-pandas-connect,pyspark-pandas-slow-connect
     - name: Upload coverage to Codecov
       if: fromJSON(inputs.envs).PYSPARK_CODECOV == 'true'
       uses: codecov/codecov-action@75cd11691c0faa626561e295848008c8a7dddffe # v5
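
The `--jars` value in the old-client step joins two backtick `find` results with a comma, which assumes each `find` returns exactly one path: zero matches would leave a stray comma, and several matches would put a newline into the list. A hedged sketch of a stricter variant, using a hypothetical `join_jars` helper that is not part of Spark's scripts:

```shell
# Hypothetical helper (not in Spark): join_jars DIR PATTERN [DIR PATTERN ...]
# Prints every jar matching each PATTERN under its DIR, comma-separated,
# and fails if any pattern matches nothing.
join_jars() {
  local out="" sep="" jar dir pattern found
  while [[ $# -ge 2 ]]; do
    dir="$1"; pattern="$2"; shift 2
    found=0
    # sort keeps the order deterministic when a pattern matches several files
    while IFS= read -r jar; do
      out+="${sep}${jar}"
      sep=","
      found=1
    done < <(find "$dir" -name "$pattern" | sort)
    if [[ $found -eq 0 ]]; then
      echo "no jar matching '$pattern' under $dir" >&2
      return 1
    fi
  done
  printf '%s\n' "$out"
}
```

For example, `join_jars connector/protobuf/target 'spark-protobuf-*SNAPSHOT.jar' connector/avro/target 'spark-avro*SNAPSHOT.jar'` would produce a comma-separated list suitable for `--jars`. Quoting the patterns also keeps the shell from glob-expanding them before `find` sees them.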


