(spark) branch master updated: [SPARK-50712][INFRA][PS][TESTS] Add a daily build for Pandas API on Spark with old dependencies

ruifengz Thu, 02 Jan 2025 00:46:11 -0800

This is an automated email from the ASF dual-hosted git repository.

ruifengz pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git



The following commit(s) were added to refs/heads/master by this push:
     new c374c00432bd [SPARK-50712][INFRA][PS][TESTS] Add a daily build for Pandas API on Spark with old dependencies
c374c00432bd is described below

commit c374c00432bd28b98b7623481b7985aa7bc7624a
Author: Ruifeng Zheng <[email protected]>
AuthorDate: Thu Jan 2 16:44:21 2025 +0800

    [SPARK-50712][INFRA][PS][TESTS] Add a daily build for Pandas API on Spark with old dependencies
    
    ### What changes were proposed in this pull request?
    Add a daily build for Pandas API on Spark with old dependencies
    
    ### Why are the changes needed?
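    Pandas API on Spark (PS) requires a newer minimum version of pandas than the rest of PySpark, so it needs its own daily build against its oldest supported dependencies.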
    
    ### Does this PR introduce _any_ user-facing change?
    No, infra-only
    
    ### How was this patch tested?
    PR builder with
    ```
    default: '{"PYSPARK_IMAGE_TO_TEST": "python-ps-minimum", "PYTHON_TO_TEST": "python3.9"}'
    
    default: '{"pyspark": "true", "pyspark-pandas": "true"}'
    ```
    
    https://github.com/zhengruifeng/spark/runs/35054863846
    
    ### Was this patch authored or co-authored using generative AI tooling?
    
    Closes #49343 from zhengruifeng/infra_ps_mini.
    
    Authored-by: Ruifeng Zheng <[email protected]>
    Signed-off-by: Ruifeng Zheng <[email protected]>
---
 .github/workflows/build_infra_images_cache.yml     | 13 ++++
 .github/workflows/build_python_ps_minimum.yml      | 47 +++++++++++++
 dev/spark-test-image/python-ps-minimum/Dockerfile  | 81 ++++++++++++++++++++++
 python/pyspark/pandas/tests/io/test_io.py          |  8 ++-
 .../pandas/tests/io/test_series_conversion.py      |  2 +
 5 files changed, 150 insertions(+), 1 deletion(-)

diff --git a/.github/workflows/build_infra_images_cache.yml b/.github/workflows/build_infra_images_cache.yml
index 565bb8c7d6e6..ac139147beb9 100644
--- a/.github/workflows/build_infra_images_cache.yml
+++ b/.github/workflows/build_infra_images_cache.yml
@@ -122,6 +122,19 @@ jobs:
       - name: Image digest (PySpark with old dependencies)
         if: hashFiles('dev/spark-test-image/python-minimum/Dockerfile') != ''
         run: echo ${{ steps.docker_build_pyspark_python_minimum.outputs.digest }}
+      - name: Build and push (PySpark PS with old dependencies)
+        if: hashFiles('dev/spark-test-image/python-ps-minimum/Dockerfile') != ''
+        id: docker_build_pyspark_python_ps_minimum
+        uses: docker/build-push-action@v6
+        with:
+          context: ./dev/spark-test-image/python-ps-minimum/
+          push: true
+          tags: ghcr.io/apache/spark/apache-spark-github-action-image-pyspark-python-ps-minimum-cache:${{ github.ref_name }}-static
+          cache-from: type=registry,ref=ghcr.io/apache/spark/apache-spark-github-action-image-pyspark-python-ps-minimum-cache:${{ github.ref_name }}
+          cache-to: type=registry,ref=ghcr.io/apache/spark/apache-spark-github-action-image-pyspark-python-ps-minimum-cache:${{ github.ref_name }},mode=max
+      - name: Image digest (PySpark PS with old dependencies)
+        if: hashFiles('dev/spark-test-image/python-ps-minimum/Dockerfile') != ''
+        run: echo ${{ steps.docker_build_pyspark_python_ps_minimum.outputs.digest }}
       - name: Build and push (PySpark with PyPy 3.10)
         if: hashFiles('dev/spark-test-image/pypy-310/Dockerfile') != ''
         id: docker_build_pyspark_pypy_310
diff --git a/.github/workflows/build_python_ps_minimum.yml b/.github/workflows/build_python_ps_minimum.yml
new file mode 100644
index 000000000000..742d578e2741
--- /dev/null
+++ b/.github/workflows/build_python_ps_minimum.yml
@@ -0,0 +1,47 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+
+name: "Build / Python-only (master, Python PS with old dependencies)"
+
+on:
+  schedule:
+    - cron: '0 10 * * *'
+  workflow_dispatch:
+
+jobs:
+  run-build:
+    permissions:
+      packages: write
+    name: Run
+    uses: ./.github/workflows/build_and_test.yml
+    if: github.repository == 'apache/spark'
+    with:
+      java: 17
+      branch: master
+      hadoop: hadoop3
+      envs: >-
+        {
+          "PYSPARK_IMAGE_TO_TEST": "python-ps-minimum",
+          "PYTHON_TO_TEST": "python3.9"
+        }
+      jobs: >-
+        {
+          "pyspark": "true",
+          "pyspark-pandas": "true"
+        }
diff --git a/dev/spark-test-image/python-ps-minimum/Dockerfile b/dev/spark-test-image/python-ps-minimum/Dockerfile
new file mode 100644
index 000000000000..913da06c551c
--- /dev/null
+++ b/dev/spark-test-image/python-ps-minimum/Dockerfile
@@ -0,0 +1,81 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# Image for building and testing Spark branches. Based on Ubuntu 22.04.
+# See also in https://hub.docker.com/_/ubuntu
+FROM ubuntu:jammy-20240911.1
+LABEL org.opencontainers.image.authors="Apache Spark project <[email protected]>"
+LABEL org.opencontainers.image.licenses="Apache-2.0"
+LABEL org.opencontainers.image.ref.name="Apache Spark Infra Image For Pandas API on Spark with old dependencies"
+# Overwrite this label to avoid exposing the underlying Ubuntu OS version label
+LABEL org.opencontainers.image.version=""
+
+ENV FULL_REFRESH_DATE=20250102
+
+ENV DEBIAN_FRONTEND=noninteractive
+ENV DEBCONF_NONINTERACTIVE_SEEN=true
+
+RUN apt-get update && apt-get install -y \
+    build-essential \
+    ca-certificates \
+    curl \
+    gfortran \
+    git \
+    gnupg \
+    libcurl4-openssl-dev \
+    libfontconfig1-dev \
+    libfreetype6-dev \
+    libfribidi-dev \
+    libgit2-dev \
+    libharfbuzz-dev \
+    libjpeg-dev \
+    liblapack-dev \
+    libopenblas-dev \
+    libpng-dev \
+    libpython3-dev \
+    libssl-dev \
+    libtiff5-dev \
+    libxml2-dev \
+    openjdk-17-jdk-headless \
+    pkg-config \
+    qpdf \
+    tzdata \
+    software-properties-common \
+    wget \
+    zlib1g-dev
+
+
+# Should keep the installation consistent with https://apache.github.io/spark/api/python/getting_started/install.html
+
+# Install Python 3.9
+RUN add-apt-repository ppa:deadsnakes/ppa
+RUN apt-get update && apt-get install -y \
+    python3.9 \
+    python3.9-distutils \
+    && apt-get autoremove --purge -y \
+    && apt-get clean \
+    && rm -rf /var/lib/apt/lists/*
+
+
+ARG BASIC_PIP_PKGS="pyarrow==11.0.0 pandas==2.2.0 six==1.16.0 numpy scipy coverage unittest-xml-reporting"
+# Python deps for Spark Connect
+ARG CONNECT_PIP_PKGS="grpcio==1.67.0 grpcio-status==1.67.0 googleapis-common-protos==1.65.0 graphviz==0.20 protobuf"
+
+# Install Python 3.9 packages
+RUN curl -sS https://bootstrap.pypa.io/get-pip.py | python3.9
+RUN python3.9 -m pip install --force $BASIC_PIP_PKGS $CONNECT_PIP_PKGS && \
+    python3.9 -m pip cache purge
diff --git a/python/pyspark/pandas/tests/io/test_io.py b/python/pyspark/pandas/tests/io/test_io.py
index 6fbdc366dd76..da5817b86b98 100644
--- a/python/pyspark/pandas/tests/io/test_io.py
+++ b/python/pyspark/pandas/tests/io/test_io.py
@@ -24,7 +24,12 @@ import pandas as pd
 from pyspark import pandas as ps
 from pyspark.testing.pandasutils import PandasOnSparkTestCase
 from pyspark.testing.sqlutils import SQLTestUtils
-from pyspark.testing.utils import have_tabulate, tabulate_requirement_message
+from pyspark.testing.utils import (
+    have_jinja2,
+    jinja2_requirement_message,
+    have_tabulate,
+    tabulate_requirement_message,
+)
 
 
 # This file contains test cases for 'Serialization / IO / Conversion'
@@ -91,6 +96,7 @@ class FrameIOMixin:
         psdf = ps.DataFrame.from_dict(data, orient="index", columns=["A", "B", "C", "D"])
         self.assert_eq(pdf, psdf)
 
+    @unittest.skipIf(not have_jinja2, jinja2_requirement_message)
     def test_style(self):
         # Currently, the `style` function returns a pandas object `Styler` as it is,
         # processing only the number of rows declared in `compute.max_rows`.
diff --git a/python/pyspark/pandas/tests/io/test_series_conversion.py b/python/pyspark/pandas/tests/io/test_series_conversion.py
index 2ae40e92b489..06d923816633 100644
--- a/python/pyspark/pandas/tests/io/test_series_conversion.py
+++ b/python/pyspark/pandas/tests/io/test_series_conversion.py
@@ -23,6 +23,7 @@ import pandas as pd
 from pyspark import pandas as ps
 from pyspark.testing.pandasutils import PandasOnSparkTestCase
 from pyspark.testing.sqlutils import SQLTestUtils
+from pyspark.testing.utils import have_jinja2, jinja2_requirement_message
 
 
 class SeriesConversionTestsMixin:
@@ -48,6 +49,7 @@ class SeriesConversionTestsMixin:
             psser.to_clipboard(sep=",", index=False), pser.to_clipboard(sep=",", index=False)
         )
 
+    @unittest.skipIf(not have_jinja2, jinja2_requirement_message)
     def test_to_latex(self):
         pser = self.pser
         psser = self.psser
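
For readers unfamiliar with the skip pattern used in the two test changes above, here is a minimal, self-contained sketch. It does not use PySpark: `have_jinja2` and `jinja2_requirement_message` are local stand-ins that only mimic the names imported from `pyspark.testing.utils`, and `StyleTests` and `run_suite` are hypothetical names introduced for illustration. The idea is that a test depending on an optional package is reported as skipped, not failed, when the minimal image does not install that package.

```python
import importlib.util
import unittest

# Local stand-ins (assumption): these mimic the names imported from
# pyspark.testing.utils in the diff, but are defined here so the sketch
# runs without PySpark installed.
have_jinja2 = importlib.util.find_spec("jinja2") is not None
jinja2_requirement_message = None if have_jinja2 else "No module named 'jinja2'"


class StyleTests(unittest.TestCase):  # hypothetical test case
    @unittest.skipIf(not have_jinja2, jinja2_requirement_message)
    def test_style(self):
        # Only executed when jinja2 is importable.
        import jinja2

        self.assertEqual(jinja2.Template("{{ x }}").render(x=1), "1")


def run_suite():
    """Run StyleTests and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(StyleTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    result = run_suite()
    print("skipped:", len(result.skipped))
```

In an environment without `jinja2`, the run still counts as successful and the test appears in `result.skipped` with the requirement message as its reason.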


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
