This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-3.5 by this push:
     new ff691fa611f0 [SPARK-48116][INFRA][3.5] Run `pyspark-pandas*` only in PR builder and Daily Python CIs

ff691fa611f0 is described below

commit ff691fa611f0c8a7f0ff626179bced2b48ef9b7d
Author: Dongjoon Hyun <dh...@apple.com>
AuthorDate: Wed May 8 13:45:55 2024 -0700

    [SPARK-48116][INFRA][3.5] Run `pyspark-pandas*` only in PR builder and Daily Python CIs

    ### What changes were proposed in this pull request?

    This PR aims to run the `pyspark-pandas*` tests of `branch-3.5` only in the PR builder and the Daily Python CIs. In other words, only the commit builder will skip them by default. Note that PR builders do not consume ASF resources, and they provide a lot of test coverage every day.

    The `branch-3.5` Python Daily CI runs all Python tests, including `pyspark-pandas`, as shown here:

    https://github.com/apache/spark/blob/21548a8cc5c527d4416a276a852f967b4410bd4b/.github/workflows/build_branch35_python.yml#L43-L44

    ### Why are the changes needed?

    To reduce GitHub Actions usage and comply with the ASF INFRA policy:
    - https://infra.apache.org/github-actions-policy.html

    > All workflows MUST have a job concurrency level less than or equal to 20. This means a workflow cannot have more than 20 jobs running at the same time across all matrices.

    Although `pandas` is an **optional** package in PySpark, it is essential for PySpark users, and we have **6 test pipelines** that require significant resources. We need to keep the job concurrency level `less than or equal to 20` while preserving as much test coverage as possible.

    https://github.com/apache/spark/blob/a762f3175fcdb7b069faa0c2bfce93d295cb1f10/dev/requirements.txt#L4-L7

    - pyspark-pandas
    - pyspark-pandas-slow
    - pyspark-pandas-connect
    - pyspark-pandas-slow-connect

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Manual review.

    ### Was this patch authored or co-authored using generative AI tooling?

    No.
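The gating logic this patch adds to `build_and_test.yml` can be sketched as standalone shell. In this sketch, `repository` and `pyspark` are stand-in variables (assumptions for illustration) for the workflow's `${{ github.repository }}` context and the `./dev/is-changed.py` result:

```shell
# Stand-ins for the workflow context (illustrative values only):
repository="apache/spark"   # ${{ github.repository }} in the commit builder
pyspark=true                # ./dev/is-changed.py result for the pyspark modules

# In a fork (PR builder), pyspark-pandas follows the regular pyspark flag;
# in the apache/spark commit builder, it is forced off.
if [ "$repository" != "apache/spark" ]; then
  pandas=$pyspark
else
  pandas=false
fi
echo "pyspark-pandas=$pandas"
```

With the commit-builder values above, `pandas` resolves to `false`, so the pandas pipelines are skipped; in a fork the first branch is taken and they follow the regular `pyspark` flag.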
Closes #46482 from dongjoon-hyun/SPARK-48116-3.5.

Authored-by: Dongjoon Hyun <dh...@apple.com>
Signed-off-by: Dongjoon Hyun <dh...@apple.com>
---
 .github/workflows/build_and_test.yml | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/.github/workflows/build_and_test.yml b/.github/workflows/build_and_test.yml
index 9c3dc95d0f66..679c51bb0941 100644
--- a/.github/workflows/build_and_test.yml
+++ b/.github/workflows/build_and_test.yml
@@ -82,6 +82,11 @@ jobs:
           pyspark=true; sparkr=true; tpcds=true; docker=true;
           pyspark_modules=`cd dev && python -c "import sparktestsupport.modules as m; print(','.join(m.name for m in m.all_modules if m.name.startswith('pyspark')))"`
           pyspark=`./dev/is-changed.py -m $pyspark_modules`
+          if [ "${{ github.repository != 'apache/spark' }}" ]; then
+            pandas=$pyspark
+          else
+            pandas=false
+          fi
           sparkr=`./dev/is-changed.py -m sparkr`
           tpcds=`./dev/is-changed.py -m sql`
           docker=`./dev/is-changed.py -m docker-integration-tests`
@@ -90,6 +95,7 @@ jobs:
             {
               \"build\": \"$build\",
               \"pyspark\": \"$pyspark\",
+              \"pyspark-pandas\": \"$pandas\",
               \"sparkr\": \"$sparkr\",
               \"tpcds-1g\": \"$tpcds\",
               \"docker-integration-tests\": \"$docker\",
@@ -361,6 +367,14 @@ jobs:
           pyspark-pandas-connect
         - >-
           pyspark-pandas-slow-connect
+        exclude:
+          # Always run if pyspark-pandas == 'true', even infra-image is skip (such as non-master job)
+          # In practice, the build will run in individual PR, but not against the individual commit
+          # in Apache Spark repository.
+          - modules: ${{ fromJson(needs.precondition.outputs.required).pyspark-pandas != 'true' && 'pyspark-pandas' }}
+          - modules: ${{ fromJson(needs.precondition.outputs.required).pyspark-pandas != 'true' && 'pyspark-pandas-slow' }}
+          - modules: ${{ fromJson(needs.precondition.outputs.required).pyspark-pandas != 'true' && 'pyspark-pandas-connect' }}
+          - modules: ${{ fromJson(needs.precondition.outputs.required).pyspark-pandas != 'true' && 'pyspark-pandas-slow-connect' }}
     env:
       MODULES_TO_TEST: ${{ matrix.modules }}
       HADOOP_PROFILE: ${{ inputs.hadoop }}

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
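The matrix `exclude` entries in the diff rely on GitHub Actions expression semantics: `A && B` evaluates to `B` when `A` is truthy and to `A` (here `false`) otherwise, so each entry resolves either to a real module name, which is then excluded from the matrix, or to `false`, which matches no module and excludes nothing. A minimal shell sketch of that evaluation for one entry, using a hypothetical `evaluate_exclude` helper (not part of the patch):

```shell
# Hypothetical helper mirroring how one exclude entry resolves.
# $1 is the precondition's pyspark-pandas flag ("true" or "false").
evaluate_exclude() {
  if [ "$1" != "true" ]; then
    echo "pyspark-pandas"   # expression yields the module name: job is excluded
  else
    echo "false"            # expression yields false: matches nothing, job runs
  fi
}

evaluate_exclude false   # pandas jobs not requested -> exclude the module
evaluate_exclude true    # pandas jobs requested -> exclude nothing
```

This is why the four pandas pipelines stay in the matrix for PR builders and Daily Python CIs (where the flag is `'true'`) but drop out of the commit builder.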