(spark) branch branch-3.5 updated: [SPARK-56763][BUILD] Branch 3.5 restore Python 3.8 & R in CI (Continuation of Sarutak's PR)

holden Tue, 02 Jun 2026 09:58:38 -0700

This is an automated email from the ASF dual-hosted git repository.

asf-gitbox-commits pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git



The following commit(s) were added to refs/heads/branch-3.5 by this push:
     new b67b83ff8773 [SPARK-56763][BUILD] Branch 3.5 restore Python 3.8 & R in 
CI (Continuation of Sarutak's PR)
b67b83ff8773 is described below

commit b67b83ff87731c295864a901c87fa79356af1dce
Author: Holden Karau <[email protected]>
AuthorDate: Tue Jun 2 09:58:19 2026 -0700

    [SPARK-56763][BUILD] Branch 3.5 restore Python 3.8 & R in CI (Continuation 
of Sarutak's PR)
    
    ### What changes were proposed in this pull request?
    
    This is a rebase of https://github.com/apache/spark/pull/55740/changes on 
the PPA and docker fix
    
    This re-enables R doc build and Py3.8
    
    For type testing to continue to work in Py3.8 it changes how we fall back 
on torch import failure given the lack of ongoing 3.8 support by torch..
    
    ### Why are the changes needed?
    
    Our R version floats and various things have changed in 4.4 which has 
broken CI, similarily many of our dependencies float which broke MyPy type 
checking in Python.
    
    Note: I plan to follow up with a seperate PR to pin our R version (in this 
branch) back to 4.3 but for now lets fix it (we can also pin to 4.4 if people 
prefer but I do want to pin the R version eventually).
    
    ### Does this PR introduce _any_ user-facing change?
    No.
    
    ### How was this patch tested?
    - `Base image build` workflow passes on GitHub Actions.
    - `docker build dev/infra` succeeds locally.
    
    ### Was this patch authored or co-authored using generative AI tooling?
    Kiro CLI / Opus 4.6
    
    Closes #55886 from 
holdenk/SPARK-56763-sarutak-3.5-restore-additional-functionality-r2.
    
    Lead-authored-by: Holden Karau <[email protected]>
    Co-authored-by: Kousuke Saruta <[email protected]>
    Co-authored-by: Holden Karau <[email protected]>
    Signed-off-by: Holden Karau <[email protected]>
---
 .github/workflows/build_and_test.yml               |  3 ++-
 R/pkg/R/utils.R                                    |  5 +++++
 R/run-tests.sh                                     |  7 +++---
 dev/infra/Dockerfile                               | 26 ++++++++--------------
 python/mypy.ini                                    | 15 +++++++++++++
 python/pyspark/ml/connect/classification.py        |  8 +++++--
 .../ml/tests/connect/test_connect_tuning.py        |  3 ++-
 .../connect/test_legacy_mode_classification.py     |  1 +
 .../ml/tests/connect/test_legacy_mode_pipeline.py  |  2 ++
 .../ml/tests/connect/test_legacy_mode_tuning.py    |  1 +
 .../tests/connect/test_parity_torch_distributor.py |  2 +-
 python/pyspark/ml/torch/data.py                    |  5 ++++-
 .../pandas/tests/computation/test_apply_func.py    |  5 +++--
 .../pandas/tests/plot/test_frame_plot_plotly.py    |  3 +++
 .../pandas/tests/plot/test_series_plot_plotly.py   |  3 +++
 python/pyspark/pandas/typedef/typehints.py         | 15 ++++++++++++-
 python/pyspark/sql/connect/plan.py                 |  4 ++--
 python/run-tests.py                                |  2 +-
 18 files changed, 78 insertions(+), 32 deletions(-)

diff --git a/.github/workflows/build_and_test.yml 
b/.github/workflows/build_and_test.yml
index 5213c5277b23..7fa5bdd72974 100644
--- a/.github/workflows/build_and_test.yml
+++ b/.github/workflows/build_and_test.yml
@@ -100,7 +100,7 @@ jobs:
               \"build\": \"$build\",
               \"pyspark\": \"$pyspark\",
               \"pyspark-pandas\": \"$pandas\",
-              \"sparkr\": \"false\",
+              \"sparkr\": \"$sparkr\",
               \"tpcds-1g\": \"$tpcds\",
               \"docker-integration-tests\": \"$docker\",
               \"scala-213\": \"$build\",
@@ -712,6 +712,7 @@ jobs:
         apt-get update -y
         apt-get install -y ruby ruby-dev
         Rscript -e "install.packages(c('remotes', 'testthat', 'knitr', 
'rmarkdown', 'markdown', 'e1071', 'roxygen2', 'ggplot2', 'mvtnorm', 'statmod'), 
repos='https://cloud.r-project.org/')"
+        Rscript -e "remotes::install_version('ragg', version='1.2.5', 
repos='https://cloud.r-project.org')"
         Rscript -e "remotes::install_version('pkgdown', version='2.0.1', 
repos='https://cloud.r-project.org')"
         Rscript -e "remotes::install_version('preferably', version='0.4', 
repos='https://cloud.r-project.org')"
         gem install bundler -v 2.4.22
diff --git a/R/pkg/R/utils.R b/R/pkg/R/utils.R
index 2fe8817fdb38..69d44cec8ab7 100644
--- a/R/pkg/R/utils.R
+++ b/R/pkg/R/utils.R
@@ -546,6 +546,11 @@ processClosure <- function(node, oldEnv, defVars, 
checkedFuncs, newEnv) {
                        error = function(e) { FALSE })) {
             obj <- get(nodeChar, envir = func.env, inherits = FALSE)
             if (is.function(obj)) {
+              if (is.primitive(obj)) {
+                # Primitive functions have no closure to clean.
+                assign(nodeChar, obj, envir = newEnv)
+                break
+              }
               # If the node is a function call.
               funcList <- mget(nodeChar, envir = checkedFuncs, inherits = F,
                                ifnotfound = list(list(NULL)))[[1]]
diff --git a/R/run-tests.sh b/R/run-tests.sh
index 90a60eda0387..20442ca89117 100755
--- a/R/run-tests.sh
+++ b/R/run-tests.sh
@@ -58,10 +58,11 @@ if [[ $FAILED != 0 || $NUM_TEST_WARNING != 0 ]]; then
     echo -en "\033[0m"  # No color
     exit -1
 else
-    # We have 2 NOTEs: for RoxygenNote and one in Jenkins only "No repository 
set"
+    # We have 3 NOTEs: for RoxygenNote, one in Jenkins only "No repository 
set",
+    # and "Lost braces" in Rd files due to R 4.4+ stricter checkRd
     # For non-latest version branches, one WARNING for package version
-    if [[ ($NUM_CRAN_WARNING != 0 || $NUM_CRAN_ERROR != 0 || $NUM_CRAN_NOTES 
-gt 2) &&
-          ($HAS_PACKAGE_VERSION_WARN != 1 || $NUM_CRAN_WARNING != 1 || 
$NUM_CRAN_ERROR != 0 || $NUM_CRAN_NOTES -gt 1) ]]; then
+    if [[ ($NUM_CRAN_WARNING != 0 || $NUM_CRAN_ERROR != 0 || $NUM_CRAN_NOTES 
-gt 3) &&
+          ($HAS_PACKAGE_VERSION_WARN != 1 || $NUM_CRAN_WARNING != 1 || 
$NUM_CRAN_ERROR != 0 || $NUM_CRAN_NOTES -gt 2) ]]; then
       cat $CRAN_CHECK_LOG_FILE
       echo -en "\033[31m"  # Red
       echo "Had CRAN check errors; see logs."
diff --git a/dev/infra/Dockerfile b/dev/infra/Dockerfile
index 42637942fa09..0d6b052e1677 100644
--- a/dev/infra/Dockerfile
+++ b/dev/infra/Dockerfile
@@ -19,10 +19,9 @@
 # See also in https://hub.docker.com/_/ubuntu
 FROM ubuntu:jammy
 
+ENV FULL_REFRESH_DATE 20260514
 SHELL ["/bin/bash", "-o", "pipefail", "-c"]
 
-ENV FULL_REFRESH_DATE 20260420
-
 ENV DEBIAN_FRONTEND noninteractive
 ENV DEBCONF_NONINTERACTIVE_SEEN true
 
@@ -32,15 +31,15 @@ ENV PATH "$PATH:/usr/local/bin"
 
 RUN timeout 5 bash -c 'exec 3<>/dev/tcp/archive.ubuntu.com/80 && printf "HEAD 
/ubuntu/ HTTP/1.1\r\nHost: archive.ubuntu.com\r\nConnection: close\r\n\r\n" >&3 
&& IFS= read -r s <&3 && [[ "$s" =~ ^HTTP/.*[[:space:]](2|3)[0-9][0-9] ]]' || 
find /etc/apt -type f \( -name '*.list' -o -name '*.sources' \) -exec sed 
-i.bak -e 's|archive\.ubuntu\.com|mirror.fcix.net|g' -e 
's|security\.ubuntu\.com|mirror.fcix.net|g' {} +
 RUN apt-get clean && apt-get update
-RUN PKGS="software-properties-common git libxml2-dev pkg-config curl wget 
openjdk-8-jdk libpython3-dev python3-pip python3-setuptools build-essential 
gfortran libopenblas-dev liblapack-dev gpg gpg-agent software-properties-common 
gcc g++ make libc6-dev libffi-dev libcurl4-openssl-dev libssl-dev openssl 
zlib1g-dev libbz2-dev libreadline-dev libsqlite3-dev liblzma-dev tk-dev 
uuid-dev pandoc libuv1-dev libuv1"; $APT_INSTALL $PKGS || (apt-get update && 
$APT_INSTALL $PKGS)
-RUN update-alternatives --set java 
/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java
+RUN PKGS="software-properties-common git libxml2-dev libxslt-dev pkg-config 
curl wget openjdk-8-jdk libpython3-dev python3-pip python3-setuptools 
build-essential gfortran libopenblas-dev liblapack-dev gpg gpg-agent 
software-properties-common gcc g++ make libc6-dev libffi-dev 
libcurl4-openssl-dev libssl-dev openssl zlib1g-dev libbz2-dev libreadline-dev 
libsqlite3-dev liblzma-dev tk-dev uuid-dev pandoc libuv1-dev libuv1"; 
$APT_INSTALL $PKGS || (apt-get update && $APT_INSTALL $PKGS)
+RUN update-alternatives --set java /usr/lib/jvm/java-8-openjdk-$(dpkg 
--print-architecture)/jre/bin/java
 
 # We also want Python 3.8 since that's the oldest supported version for Spark 
3.5
 # Also ubuntu is under a DDoS so retry adding, and finally fallback to 
python.org 3.8 release
 RUN ( \
     (add-apt-repository -y ppa:deadsnakes/ppa || add-apt-repository -y 
ppa:deadsnakes/ppa) && \
     (apt-get update || apt-get update) && \
-    PKGS="python3.8 python3.9 python3.9-venv python3.8-venv"; ($APT_INSTALL 
$PKGS || apt-get update && $APT_INSTALL $PKGS) \
+    PKGS="python3.8 python3.8-dev python3.9 python3.9-venv python3.8-venv"; 
($APT_INSTALL $PKGS || apt-get update && $APT_INSTALL $PKGS) \
     ) || \
     (PYTHON_VERSION=3.8.20; \
     curl -O 
https://www.python.org/ftp/python/${PYTHON_VERSION}/Python-${PYTHON_VERSION}.tgz
 && \
@@ -83,12 +82,8 @@ RUN Rscript -e " \
     "
 
 # See more in SPARK-39959, roxygen2 < 7.2.1
-RUN Rscript -e "remotes::install_version('pkgload',  version = '1.3.2', repos 
= 'https://cloud.r-project.org'); \
-    remotes::install_version('pkgbuild', version = '1.4.0', repos = 
'https://cloud.r-project.org'); \
-    remotes::install_version('desc',     version = '1.4.2', repos = 
'https://cloud.r-project.org'); \
-    remotes::install_version('rlang',    version = '1.1.1', repos = 
'https://cloud.r-project.org'); \
-    remotes::install_version('cli',      version = '3.6.1', repos = 
'https://cloud.r-project.org'); \
-    remotes::install_version('purrr',    version = '1.0.1', repos = 
'https://cloud.r-project.org')"
+# Let roxygen2's deps float to current so they compile against R 4.6; pin only 
roxygen2 itself.
+RUN Rscript -e "install.packages(c('pkgload', 'pkgbuild', 'desc', 'rlang', 
'cli', 'purrr'), repos='https://cloud.r-project.org/')"
 RUN Rscript -e "remotes::install_version('roxygen2', version='7.2.0', 
repos='https://cloud.r-project.org')"
 
 # Sanity check the R install
@@ -106,15 +101,12 @@ ENV R_LIBS_SITE 
"/usr/local/lib/R/site-library:${R_LIBS_SITE}:/usr/lib/R/library
 RUN python3.8 -m pip install setuptools virtualenv
 RUN python3.9 -m pip install setuptools virtualenv
 
-RUN python3.8 -m pip  install --only-binary=pandas numpy pandas 'scipy<1.9' 
coverage 'matplotlib==3.7.2' 'mypy==0.982'
-RUN python3.9 -m pip install 'numpy==1.25.1' 'pyarrow==12.0.1' 'pandas<=2.0.3' 
'scipy<=1.10' unittest-xml-reporting 'plotly>=4.8' 'mlflow>=2.3.1' coverage 
'matplotlib==3.7.2' openpyxl 'memory-profiler==0.60.0' 'scikit-learn==1.1.*' 
'blinker==1.4' 'mypy==0.982'
+RUN python3.9 -m pip install 'numpy==1.25.1' 'pyarrow==12.0.1' 'pandas<=2.0.3' 
scipy unittest-xml-reporting 'plotly<6.0' 'mlflow>=2.3.1' coverage 
'matplotlib==3.7.2' openpyxl 'memory-profiler==0.60.0' 'scikit-learn==1.1.*' 
'Flask==1.1.2' 'Werkzeug==2.1.2'
+RUN python3.8 -m pip install 'numpy' 'pyarrow==12.0.1' 'pandas<=2.0.3' 
'scipy<=1.10' unittest-xml-reporting 'plotly>=4.8' 'mlflow>=2.3.1' coverage 
'matplotlib==3.7.2' openpyxl 'memory-profiler==0.60.0' 'scikit-learn==1.1.*' 
'blinker==1.4' 'mypy==0.982' 'beniget==0.4.1' 'pyproject-metadata==0.8.1'
 
 # Add Python deps for Spark Connect.
 RUN python3.9 -m pip install 'grpcio>=1.48,<1.57' 'grpcio-status>=1.48,<1.57' 
'protobuf==3.20.3' 'googleapis-common-protos==1.56.4'
+RUN python3.8 -m pip install 'grpcio>=1.48,<1.57' 'grpcio-status>=1.48,<1.57' 
'protobuf==3.20.3' 'googleapis-common-protos==1.56.4'
 
 # Add torch as a testing dependency for TorchDistributor
 RUN python3.9 -m pip install 'torch==2.0.1' 'torchvision==0.15.2' torcheval
-
-# pyarrow
-RUN python3.9 -m pip install 'pyarrow<13.0.0'
-RUN python3.8 -m pip install 'pyarrow<13.0.0'
diff --git a/python/mypy.ini b/python/mypy.ini
index ef0ee36ef854..b8c3ca6b0313 100644
--- a/python/mypy.ini
+++ b/python/mypy.ini
@@ -166,6 +166,21 @@ ignore_missing_imports = True
 [mypy-grpc.*]
 ignore_missing_imports = True
 
+[mypy-grpc_status.*]
+ignore_missing_imports = True
+
+[mypy-google.*]
+ignore_missing_imports = True
+
+[mypy-IPython.*]
+ignore_missing_imports = True
+
+[mypy-tornado.*]
+ignore_missing_imports = True
+
+[mypy-xmlrunner.*]
+ignore_missing_imports = True
+
 ; pydantic is pulled in transitively (e.g. via mlflow). mypy has issues
 ; serializing pydantic v2's recursive JsonValue type, so skip following it.
 [mypy-pydantic.*]
diff --git a/python/pyspark/ml/connect/classification.py 
b/python/pyspark/ml/connect/classification.py
index f8b525db8edd..33a7d09e9b82 100644
--- a/python/pyspark/ml/connect/classification.py
+++ b/python/pyspark/ml/connect/classification.py
@@ -43,8 +43,12 @@ from pyspark.ml.connect.base import Predictor, 
PredictionModel
 from pyspark.ml.connect.io_utils import ParamsReadWrite, CoreModelReadWrite
 from pyspark.sql.functions import lit, count, countDistinct
 
-import torch
-import torch.nn as torch_nn
+try:
+    import torch
+    import torch.nn as torch_nn
+except ImportError:
+    torch = None  # type: ignore[assignment]
+    torch_nn = None  # type: ignore[assignment]
 
 
 class _LogisticRegressionParams(
diff --git a/python/pyspark/ml/tests/connect/test_connect_tuning.py 
b/python/pyspark/ml/tests/connect/test_connect_tuning.py
index 901367e44d20..7ca1812e3d28 100644
--- a/python/pyspark/ml/tests/connect/test_connect_tuning.py
+++ b/python/pyspark/ml/tests/connect/test_connect_tuning.py
@@ -18,9 +18,10 @@
 import os
 import unittest
 from pyspark.sql import SparkSession
-from pyspark.ml.tests.connect.test_legacy_mode_tuning import 
CrossValidatorTestsMixin
+from pyspark.ml.tests.connect.test_legacy_mode_tuning import 
CrossValidatorTestsMixin, have_torch
 
 
[email protected](not have_torch, "torch is required")
 @unittest.skipIf("SPARK_SKIP_CONNECT_COMPAT_TESTS" in os.environ, "Requires 
JVM access")
 class CrossValidatorTestsOnConnect(CrossValidatorTestsMixin, 
unittest.TestCase):
     def setUp(self) -> None:
diff --git a/python/pyspark/ml/tests/connect/test_legacy_mode_classification.py 
b/python/pyspark/ml/tests/connect/test_legacy_mode_classification.py
index 84d5829122af..5601d6bfffbf 100644
--- a/python/pyspark/ml/tests/connect/test_legacy_mode_classification.py
+++ b/python/pyspark/ml/tests/connect/test_legacy_mode_classification.py
@@ -218,6 +218,7 @@ class ClassificationTestsMixin:
             loaded_model.transform(eval_df1.toPandas())
 
 
[email protected](not have_torch, "torch is required")
 class ClassificationTests(ClassificationTestsMixin, unittest.TestCase):
     def setUp(self) -> None:
         self.spark = SparkSession.builder.master("local[2]").getOrCreate()
diff --git a/python/pyspark/ml/tests/connect/test_legacy_mode_pipeline.py 
b/python/pyspark/ml/tests/connect/test_legacy_mode_pipeline.py
index 5fd4f6f16cfa..bb47f9a7f0b2 100644
--- a/python/pyspark/ml/tests/connect/test_legacy_mode_pipeline.py
+++ b/python/pyspark/ml/tests/connect/test_legacy_mode_pipeline.py
@@ -43,6 +43,7 @@ class PipelineTestsMixin:
                 rtol=1e-1,
             )
 
+    @unittest.skipIf(not have_torch, "torch is required")
     def test_pipeline(self):
         train_dataset = self.spark.createDataFrame(
             [
@@ -164,6 +165,7 @@ class PipelineTestsMixin:
         assert lorv2.getOrDefault(lorv2.maxIter) == 200
 
 
[email protected](not have_torch, "torch is required")
 class PipelineTests(PipelineTestsMixin, unittest.TestCase):
     def setUp(self) -> None:
         self.spark = SparkSession.builder.master("local[2]").getOrCreate()
diff --git a/python/pyspark/ml/tests/connect/test_legacy_mode_tuning.py 
b/python/pyspark/ml/tests/connect/test_legacy_mode_tuning.py
index 0ade227540c7..302deb556212 100644
--- a/python/pyspark/ml/tests/connect/test_legacy_mode_tuning.py
+++ b/python/pyspark/ml/tests/connect/test_legacy_mode_tuning.py
@@ -272,6 +272,7 @@ class CrossValidatorTestsMixin:
         cv.fit(train_dataset)
 
 
[email protected](not have_torch, "torch is required")
 class CrossValidatorTests(CrossValidatorTestsMixin, unittest.TestCase):
     def setUp(self) -> None:
         self.spark = SparkSession.builder.master("local[2]").getOrCreate()
diff --git a/python/pyspark/ml/tests/connect/test_parity_torch_distributor.py 
b/python/pyspark/ml/tests/connect/test_parity_torch_distributor.py
index 238775ded2a2..a8b4b06c450f 100644
--- a/python/pyspark/ml/tests/connect/test_parity_torch_distributor.py
+++ b/python/pyspark/ml/tests/connect/test_parity_torch_distributor.py
@@ -81,7 +81,7 @@ class TorchDistributorLocalUnitTestsOnConnect(
         ]
 
 
[email protected]("SPARK_SKIP_CONNECT_COMPAT_TESTS" in os.environ, "Requires 
JVM access")
[email protected](not have_torch, "torch is required")
 class TorchDistributorLocalUnitTestsIIOnConnect(
     TorchDistributorLocalUnitTestsMixin, unittest.TestCase
 ):
diff --git a/python/pyspark/ml/torch/data.py b/python/pyspark/ml/torch/data.py
index 0a5597fbd241..cb7e7f1b68ac 100644
--- a/python/pyspark/ml/torch/data.py
+++ b/python/pyspark/ml/torch/data.py
@@ -15,7 +15,10 @@
 # limitations under the License.
 #
 
-import torch
+try:
+    import torch
+except ImportError:
+    torch = None  # type: ignore[assignment]
 import numpy as np
 from typing import Any, Callable, Iterator
 from pyspark.sql.types import StructType
diff --git a/python/pyspark/pandas/tests/computation/test_apply_func.py 
b/python/pyspark/pandas/tests/computation/test_apply_func.py
index 37cc4a4188f6..f169460d0ed5 100644
--- a/python/pyspark/pandas/tests/computation/test_apply_func.py
+++ b/python/pyspark/pandas/tests/computation/test_apply_func.py
@@ -253,8 +253,9 @@ class FrameApplyFunctionMixin:
         actual.columns = ["a", "b"]
         self.assert_eq(actual, pdf)
 
-        # For NumPy typing, NumPy version should be 1.21+ and Python version 
should be 3.8+
-        if sys.version_info >= (3, 8) and LooseVersion(np.__version__) >= 
LooseVersion("1.21"):
+        # For NumPy typing, NumPy version should be 1.21+ and Python version 
should be 3.9+
+        # (types.GenericAlias, used by ntp.NDArray, was added in Python 3.9)
+        if sys.version_info >= (3, 9) and LooseVersion(np.__version__) >= 
LooseVersion("1.21"):
             import numpy.typing as ntp
 
             psdf = ps.from_pandas(pdf)
diff --git a/python/pyspark/pandas/tests/plot/test_frame_plot_plotly.py 
b/python/pyspark/pandas/tests/plot/test_frame_plot_plotly.py
index 37469db2c8f5..56a70f925f97 100644
--- a/python/pyspark/pandas/tests/plot/test_frame_plot_plotly.py
+++ b/python/pyspark/pandas/tests/plot/test_frame_plot_plotly.py
@@ -15,6 +15,7 @@
 # limitations under the License.
 #
 
+import sys
 import unittest
 import pprint
 
@@ -193,6 +194,7 @@ class DataFramePlotPlotlyTestsMixin:
         self.assertEqual(plt.layout.title.text, "Title")
         self.assertFalse(hasattr(plt.layout, "foo"))
 
+    @unittest.skipIf(sys.version_info < (3, 9), "Plotly float precision 
differs on Python 3.8")
     def test_hist_plot(self):
         def check_hist_plot(psdf):
             bins = np.array([1.0, 5.9, 10.8, 15.7, 20.6, 25.5, 30.4, 35.3, 
40.2, 45.1, 50.0])
@@ -240,6 +242,7 @@ class DataFramePlotPlotlyTestsMixin:
         psdf1.columns = columns
         check_hist_plot(psdf1)
 
+    @unittest.skipIf(sys.version_info < (3, 9), "Plotly float precision 
differs on Python 3.8")
     def test_kde_plot(self):
         psdf = ps.DataFrame({"a": [1, 2, 3, 4, 5], "b": [1, 3, 5, 7, 9], "c": 
[2, 4, 6, 8, 10]})
 
diff --git a/python/pyspark/pandas/tests/plot/test_series_plot_plotly.py 
b/python/pyspark/pandas/tests/plot/test_series_plot_plotly.py
index 1aa175f9308a..49a141676a8b 100644
--- a/python/pyspark/pandas/tests/plot/test_series_plot_plotly.py
+++ b/python/pyspark/pandas/tests/plot/test_series_plot_plotly.py
@@ -15,6 +15,7 @@
 # limitations under the License.
 #
 
+import sys
 import unittest
 import pprint
 
@@ -139,6 +140,7 @@ class SeriesPlotPlotlyTestsMixin:
         #     psdf["a"].plot(kind="pie"), express.pie(pdf, 
values=pdf.columns[0], names=pdf.index),
         # )
 
+    @unittest.skipIf(sys.version_info < (3, 9), "Plotly float precision 
differs on Python 3.8")
     def test_hist_plot(self):
         def check_hist_plot(psser):
             bins = np.array([1.0, 5.9, 10.8, 15.7, 20.6, 25.5, 30.4, 35.3, 
40.2, 45.1, 50.0])
@@ -213,6 +215,7 @@ class SeriesPlotPlotlyTestsMixin:
             self.psdf1.a.plot.box(notched=True)
         self.psdf1.a.plot.box(hovertext="abc")  # other arguments should not 
throw an exception
 
+    @unittest.skipIf(sys.version_info < (3, 9), "Plotly float precision 
differs on Python 3.8")
     def test_kde_plot(self):
         psdf = ps.DataFrame({"a": [1, 2, 3, 4, 5]})
         pdf = pd.DataFrame(
diff --git a/python/pyspark/pandas/typedef/typehints.py 
b/python/pyspark/pandas/typedef/typehints.py
index 7a23ff6b5018..08874eadcaa1 100644
--- a/python/pyspark/pandas/typedef/typehints.py
+++ b/python/pyspark/pandas/typedef/typehints.py
@@ -794,8 +794,21 @@ def _new_type_holders(
         for param in params
     )
     if sys.version_info < (3, 11):
+        # types.GenericAlias (e.g. numpy.ndarray[Any, dtype[int]]) is iterable 
but is a
+        # valid type hint. Use getattr so this still imports cleanly on Python 
3.8 where
+        # types.GenericAlias doesn't exist.
+        import types as _types_mod
+
+        _builtin_generic_alias: type = getattr(_types_mod, "GenericAlias", 
type(None))
+        _typing_private_generic_alias: type = getattr(typing, "_GenericAlias", 
type(None))
         is_unnamed_params = all(
-            not isinstance(param, slice) and not isinstance(param, Iterable) 
for param in params
+            not isinstance(param, slice)
+            and (
+                not isinstance(param, Iterable)
+                or isinstance(param, _builtin_generic_alias)
+                or isinstance(param, _typing_private_generic_alias)
+            )
+            for param in params
         )
     else:
         # PEP 646 changes `GenericAlias` instances into iterable ones at 
Python 3.11.
diff --git a/python/pyspark/sql/connect/plan.py 
b/python/pyspark/sql/connect/plan.py
index 43af8bb427a5..b25b1be86495 100644
--- a/python/pyspark/sql/connect/plan.py
+++ b/python/pyspark/sql/connect/plan.py
@@ -1613,8 +1613,8 @@ class WriteOperationV2(LogicalPlan):
         self.table_name: Optional[str] = table_name
         self.provider: Optional[str] = None
         self.partitioning_columns: List["ColumnOrName"] = []
-        self.options: dict[str, Optional[str]] = {}
-        self.table_properties: dict[str, Optional[str]] = {}
+        self.options: Dict[str, Optional[str]] = {}
+        self.table_properties: Dict[str, Optional[str]] = {}
         self.mode: Optional[str] = None
         self.overwrite_condition: Optional["ColumnOrName"] = None
 
diff --git a/python/run-tests.py b/python/run-tests.py
index ca8ddb5ff863..6e4a1da18a38 100755
--- a/python/run-tests.py
+++ b/python/run-tests.py
@@ -207,7 +207,7 @@ def run_individual_python_test(target_dir, test_name, 
pyspark_python, keep_test_
 
 
 def get_default_python_executables():
-    python_execs = [x for x in ["python3.9", "pypy3"] if which(x)]
+    python_execs = [x for x in ["python3.9", "python3.8", "pypy3"] if which(x)]
 
     if "python3.9" not in python_execs:
         p = which("python3")


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

(spark) branch branch-3.5 updated: [SPARK-56763][BUILD] Branch 3.5 restore Python 3.8 & R in CI (Continuation of Sarutak's PR)

Reply via email to