This is an automated email from the ASF dual-hosted git repository.
asf-gitbox-commits pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-3.5 by this push:
new b67b83ff8773 [SPARK-56763][BUILD] Branch 3.5 restore Python 3.8 & R in
CI (Continuation of Sarutak's PR)
b67b83ff8773 is described below
commit b67b83ff87731c295864a901c87fa79356af1dce
Author: Holden Karau <[email protected]>
AuthorDate: Tue Jun 2 09:58:19 2026 -0700
[SPARK-56763][BUILD] Branch 3.5 restore Python 3.8 & R in CI (Continuation
of Sarutak's PR)
### What changes were proposed in this pull request?
This is a rebase of https://github.com/apache/spark/pull/55740/changes on
top of the PPA and Docker fix.
It re-enables the R doc build and Python 3.8 in CI.
For type testing to continue to work on Python 3.8, it changes how we fall
back on torch import failure, given the lack of ongoing 3.8 support in torch.
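A minimal sketch of the fallback pattern described above, assuming torch may or may not be installed. The `try`/`except ImportError` guard and the `skipIf(not have_torch, ...)` decorator mirror the diff; deriving `have_torch` from the guarded import, the `TorchDependentTests` class, and the `run_suite` helper are illustrative assumptions, not code from this commit.

```python
import io
import unittest

# Guarded import: on interpreters where torch is unavailable (for example
# Python 3.8), the name still exists at module level, so importing this
# module does not fail.
try:
    import torch
except ImportError:
    torch = None  # type: ignore[assignment]

# Assumption: the flag is derived here for illustration only. The diff imports
# `have_torch` from existing test modules rather than defining it this way.
have_torch = torch is not None


@unittest.skipIf(not have_torch, "torch is required")
class TorchDependentTests(unittest.TestCase):
    """Hypothetical test class that only makes sense when torch is present."""

    def test_tensor_sum(self) -> None:
        self.assertEqual(int(torch.tensor([1, 2, 3]).sum()), 6)


def run_suite() -> unittest.TestResult:
    """Run the hypothetical suite quietly and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TorchDependentTests)
    return unittest.TextTestRunner(stream=io.StringIO(), verbosity=0).run(suite)
```

With this shape, a missing torch turns into a skipped test rather than an import error at collection time, which appears to be the intent of the `skipIf` decorators added in the diff.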
### Why are the changes needed?
Our R version floats, and various things changed in R 4.4 that broke CI.
Similarly, many of our Python dependencies float, which broke MyPy type
checking.
Note: I plan to follow up with a separate PR to pin our R version (in this
branch) back to 4.3, but for now let's fix it (we can also pin to 4.4 if
people prefer, but I do want to pin the R version eventually).
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
- `Base image build` workflow passes on GitHub Actions.
- `docker build dev/infra` succeeds locally.
### Was this patch authored or co-authored using generative AI tooling?
Kiro CLI / Opus 4.6
Closes #55886 from
holdenk/SPARK-56763-sarutak-3.5-restore-additional-functionality-r2.
Lead-authored-by: Holden Karau <[email protected]>
Co-authored-by: Kousuke Saruta <[email protected]>
Co-authored-by: Holden Karau <[email protected]>
Signed-off-by: Holden Karau <[email protected]>
---
.github/workflows/build_and_test.yml | 3 ++-
R/pkg/R/utils.R | 5 +++++
R/run-tests.sh | 7 +++---
dev/infra/Dockerfile | 26 ++++++++--------------
python/mypy.ini | 15 +++++++++++++
python/pyspark/ml/connect/classification.py | 8 +++++--
.../ml/tests/connect/test_connect_tuning.py | 3 ++-
.../connect/test_legacy_mode_classification.py | 1 +
.../ml/tests/connect/test_legacy_mode_pipeline.py | 2 ++
.../ml/tests/connect/test_legacy_mode_tuning.py | 1 +
.../tests/connect/test_parity_torch_distributor.py | 2 +-
python/pyspark/ml/torch/data.py | 5 ++++-
.../pandas/tests/computation/test_apply_func.py | 5 +++--
.../pandas/tests/plot/test_frame_plot_plotly.py | 3 +++
.../pandas/tests/plot/test_series_plot_plotly.py | 3 +++
python/pyspark/pandas/typedef/typehints.py | 15 ++++++++++++-
python/pyspark/sql/connect/plan.py | 4 ++--
python/run-tests.py | 2 +-
18 files changed, 78 insertions(+), 32 deletions(-)
diff --git a/.github/workflows/build_and_test.yml
b/.github/workflows/build_and_test.yml
index 5213c5277b23..7fa5bdd72974 100644
--- a/.github/workflows/build_and_test.yml
+++ b/.github/workflows/build_and_test.yml
@@ -100,7 +100,7 @@ jobs:
\"build\": \"$build\",
\"pyspark\": \"$pyspark\",
\"pyspark-pandas\": \"$pandas\",
- \"sparkr\": \"false\",
+ \"sparkr\": \"$sparkr\",
\"tpcds-1g\": \"$tpcds\",
\"docker-integration-tests\": \"$docker\",
\"scala-213\": \"$build\",
@@ -712,6 +712,7 @@ jobs:
apt-get update -y
apt-get install -y ruby ruby-dev
Rscript -e "install.packages(c('remotes', 'testthat', 'knitr',
'rmarkdown', 'markdown', 'e1071', 'roxygen2', 'ggplot2', 'mvtnorm', 'statmod'),
repos='https://cloud.r-project.org/')"
+ Rscript -e "remotes::install_version('ragg', version='1.2.5',
repos='https://cloud.r-project.org')"
Rscript -e "remotes::install_version('pkgdown', version='2.0.1',
repos='https://cloud.r-project.org')"
Rscript -e "remotes::install_version('preferably', version='0.4',
repos='https://cloud.r-project.org')"
gem install bundler -v 2.4.22
diff --git a/R/pkg/R/utils.R b/R/pkg/R/utils.R
index 2fe8817fdb38..69d44cec8ab7 100644
--- a/R/pkg/R/utils.R
+++ b/R/pkg/R/utils.R
@@ -546,6 +546,11 @@ processClosure <- function(node, oldEnv, defVars,
checkedFuncs, newEnv) {
error = function(e) { FALSE })) {
obj <- get(nodeChar, envir = func.env, inherits = FALSE)
if (is.function(obj)) {
+ if (is.primitive(obj)) {
+ # Primitive functions have no closure to clean.
+ assign(nodeChar, obj, envir = newEnv)
+ break
+ }
# If the node is a function call.
funcList <- mget(nodeChar, envir = checkedFuncs, inherits = F,
ifnotfound = list(list(NULL)))[[1]]
diff --git a/R/run-tests.sh b/R/run-tests.sh
index 90a60eda0387..20442ca89117 100755
--- a/R/run-tests.sh
+++ b/R/run-tests.sh
@@ -58,10 +58,11 @@ if [[ $FAILED != 0 || $NUM_TEST_WARNING != 0 ]]; then
echo -en "\033[0m" # No color
exit -1
else
- # We have 2 NOTEs: for RoxygenNote and one in Jenkins only "No repository
set"
+ # We have 3 NOTEs: for RoxygenNote, one in Jenkins only "No repository
set",
+ # and "Lost braces" in Rd files due to R 4.4+ stricter checkRd
# For non-latest version branches, one WARNING for package version
- if [[ ($NUM_CRAN_WARNING != 0 || $NUM_CRAN_ERROR != 0 || $NUM_CRAN_NOTES
-gt 2) &&
- ($HAS_PACKAGE_VERSION_WARN != 1 || $NUM_CRAN_WARNING != 1 ||
$NUM_CRAN_ERROR != 0 || $NUM_CRAN_NOTES -gt 1) ]]; then
+ if [[ ($NUM_CRAN_WARNING != 0 || $NUM_CRAN_ERROR != 0 || $NUM_CRAN_NOTES
-gt 3) &&
+ ($HAS_PACKAGE_VERSION_WARN != 1 || $NUM_CRAN_WARNING != 1 ||
$NUM_CRAN_ERROR != 0 || $NUM_CRAN_NOTES -gt 2) ]]; then
cat $CRAN_CHECK_LOG_FILE
echo -en "\033[31m" # Red
echo "Had CRAN check errors; see logs."
diff --git a/dev/infra/Dockerfile b/dev/infra/Dockerfile
index 42637942fa09..0d6b052e1677 100644
--- a/dev/infra/Dockerfile
+++ b/dev/infra/Dockerfile
@@ -19,10 +19,9 @@
# See also in https://hub.docker.com/_/ubuntu
FROM ubuntu:jammy
+ENV FULL_REFRESH_DATE 20260514
SHELL ["/bin/bash", "-o", "pipefail", "-c"]
-ENV FULL_REFRESH_DATE 20260420
-
ENV DEBIAN_FRONTEND noninteractive
ENV DEBCONF_NONINTERACTIVE_SEEN true
@@ -32,15 +31,15 @@ ENV PATH "$PATH:/usr/local/bin"
RUN timeout 5 bash -c 'exec 3<>/dev/tcp/archive.ubuntu.com/80 && printf "HEAD
/ubuntu/ HTTP/1.1\r\nHost: archive.ubuntu.com\r\nConnection: close\r\n\r\n" >&3
&& IFS= read -r s <&3 && [[ "$s" =~ ^HTTP/.*[[:space:]](2|3)[0-9][0-9] ]]' ||
find /etc/apt -type f \( -name '*.list' -o -name '*.sources' \) -exec sed
-i.bak -e 's|archive\.ubuntu\.com|mirror.fcix.net|g' -e
's|security\.ubuntu\.com|mirror.fcix.net|g' {} +
RUN apt-get clean && apt-get update
-RUN PKGS="software-properties-common git libxml2-dev pkg-config curl wget
openjdk-8-jdk libpython3-dev python3-pip python3-setuptools build-essential
gfortran libopenblas-dev liblapack-dev gpg gpg-agent software-properties-common
gcc g++ make libc6-dev libffi-dev libcurl4-openssl-dev libssl-dev openssl
zlib1g-dev libbz2-dev libreadline-dev libsqlite3-dev liblzma-dev tk-dev
uuid-dev pandoc libuv1-dev libuv1"; $APT_INSTALL $PKGS || (apt-get update &&
$APT_INSTALL $PKGS)
-RUN update-alternatives --set java
/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java
+RUN PKGS="software-properties-common git libxml2-dev libxslt-dev pkg-config
curl wget openjdk-8-jdk libpython3-dev python3-pip python3-setuptools
build-essential gfortran libopenblas-dev liblapack-dev gpg gpg-agent
software-properties-common gcc g++ make libc6-dev libffi-dev
libcurl4-openssl-dev libssl-dev openssl zlib1g-dev libbz2-dev libreadline-dev
libsqlite3-dev liblzma-dev tk-dev uuid-dev pandoc libuv1-dev libuv1";
$APT_INSTALL $PKGS || (apt-get update && $APT_INSTALL $PKGS)
+RUN update-alternatives --set java /usr/lib/jvm/java-8-openjdk-$(dpkg
--print-architecture)/jre/bin/java
# We also want Python 3.8 since that's the oldest supported version for Spark
3.5
# Also ubuntu is under a DDoS so retry adding, and finally fallback to
python.org 3.8 release
RUN ( \
(add-apt-repository -y ppa:deadsnakes/ppa || add-apt-repository -y
ppa:deadsnakes/ppa) && \
(apt-get update || apt-get update) && \
- PKGS="python3.8 python3.9 python3.9-venv python3.8-venv"; ($APT_INSTALL
$PKGS || apt-get update && $APT_INSTALL $PKGS) \
+ PKGS="python3.8 python3.8-dev python3.9 python3.9-venv python3.8-venv";
($APT_INSTALL $PKGS || apt-get update && $APT_INSTALL $PKGS) \
) || \
(PYTHON_VERSION=3.8.20; \
curl -O
https://www.python.org/ftp/python/${PYTHON_VERSION}/Python-${PYTHON_VERSION}.tgz
&& \
@@ -83,12 +82,8 @@ RUN Rscript -e " \
"
# See more in SPARK-39959, roxygen2 < 7.2.1
-RUN Rscript -e "remotes::install_version('pkgload', version = '1.3.2', repos
= 'https://cloud.r-project.org'); \
- remotes::install_version('pkgbuild', version = '1.4.0', repos =
'https://cloud.r-project.org'); \
- remotes::install_version('desc', version = '1.4.2', repos =
'https://cloud.r-project.org'); \
- remotes::install_version('rlang', version = '1.1.1', repos =
'https://cloud.r-project.org'); \
- remotes::install_version('cli', version = '3.6.1', repos =
'https://cloud.r-project.org'); \
- remotes::install_version('purrr', version = '1.0.1', repos =
'https://cloud.r-project.org')"
+# Let roxygen2's deps float to current so they compile against R 4.6; pin only
roxygen2 itself.
+RUN Rscript -e "install.packages(c('pkgload', 'pkgbuild', 'desc', 'rlang',
'cli', 'purrr'), repos='https://cloud.r-project.org/')"
RUN Rscript -e "remotes::install_version('roxygen2', version='7.2.0',
repos='https://cloud.r-project.org')"
# Sanity check the R install
@@ -106,15 +101,12 @@ ENV R_LIBS_SITE
"/usr/local/lib/R/site-library:${R_LIBS_SITE}:/usr/lib/R/library
RUN python3.8 -m pip install setuptools virtualenv
RUN python3.9 -m pip install setuptools virtualenv
-RUN python3.8 -m pip install --only-binary=pandas numpy pandas 'scipy<1.9'
coverage 'matplotlib==3.7.2' 'mypy==0.982'
-RUN python3.9 -m pip install 'numpy==1.25.1' 'pyarrow==12.0.1' 'pandas<=2.0.3'
'scipy<=1.10' unittest-xml-reporting 'plotly>=4.8' 'mlflow>=2.3.1' coverage
'matplotlib==3.7.2' openpyxl 'memory-profiler==0.60.0' 'scikit-learn==1.1.*'
'blinker==1.4' 'mypy==0.982'
+RUN python3.9 -m pip install 'numpy==1.25.1' 'pyarrow==12.0.1' 'pandas<=2.0.3'
scipy unittest-xml-reporting 'plotly<6.0' 'mlflow>=2.3.1' coverage
'matplotlib==3.7.2' openpyxl 'memory-profiler==0.60.0' 'scikit-learn==1.1.*'
'Flask==1.1.2' 'Werkzeug==2.1.2'
+RUN python3.8 -m pip install 'numpy' 'pyarrow==12.0.1' 'pandas<=2.0.3'
'scipy<=1.10' unittest-xml-reporting 'plotly>=4.8' 'mlflow>=2.3.1' coverage
'matplotlib==3.7.2' openpyxl 'memory-profiler==0.60.0' 'scikit-learn==1.1.*'
'blinker==1.4' 'mypy==0.982' 'beniget==0.4.1' 'pyproject-metadata==0.8.1'
# Add Python deps for Spark Connect.
RUN python3.9 -m pip install 'grpcio>=1.48,<1.57' 'grpcio-status>=1.48,<1.57'
'protobuf==3.20.3' 'googleapis-common-protos==1.56.4'
+RUN python3.8 -m pip install 'grpcio>=1.48,<1.57' 'grpcio-status>=1.48,<1.57'
'protobuf==3.20.3' 'googleapis-common-protos==1.56.4'
# Add torch as a testing dependency for TorchDistributor
RUN python3.9 -m pip install 'torch==2.0.1' 'torchvision==0.15.2' torcheval
-
-# pyarrow
-RUN python3.9 -m pip install 'pyarrow<13.0.0'
-RUN python3.8 -m pip install 'pyarrow<13.0.0'
diff --git a/python/mypy.ini b/python/mypy.ini
index ef0ee36ef854..b8c3ca6b0313 100644
--- a/python/mypy.ini
+++ b/python/mypy.ini
@@ -166,6 +166,21 @@ ignore_missing_imports = True
[mypy-grpc.*]
ignore_missing_imports = True
+[mypy-grpc_status.*]
+ignore_missing_imports = True
+
+[mypy-google.*]
+ignore_missing_imports = True
+
+[mypy-IPython.*]
+ignore_missing_imports = True
+
+[mypy-tornado.*]
+ignore_missing_imports = True
+
+[mypy-xmlrunner.*]
+ignore_missing_imports = True
+
; pydantic is pulled in transitively (e.g. via mlflow). mypy has issues
; serializing pydantic v2's recursive JsonValue type, so skip following it.
[mypy-pydantic.*]
diff --git a/python/pyspark/ml/connect/classification.py
b/python/pyspark/ml/connect/classification.py
index f8b525db8edd..33a7d09e9b82 100644
--- a/python/pyspark/ml/connect/classification.py
+++ b/python/pyspark/ml/connect/classification.py
@@ -43,8 +43,12 @@ from pyspark.ml.connect.base import Predictor,
PredictionModel
from pyspark.ml.connect.io_utils import ParamsReadWrite, CoreModelReadWrite
from pyspark.sql.functions import lit, count, countDistinct
-import torch
-import torch.nn as torch_nn
+try:
+ import torch
+ import torch.nn as torch_nn
+except ImportError:
+ torch = None # type: ignore[assignment]
+ torch_nn = None # type: ignore[assignment]
class _LogisticRegressionParams(
diff --git a/python/pyspark/ml/tests/connect/test_connect_tuning.py
b/python/pyspark/ml/tests/connect/test_connect_tuning.py
index 901367e44d20..7ca1812e3d28 100644
--- a/python/pyspark/ml/tests/connect/test_connect_tuning.py
+++ b/python/pyspark/ml/tests/connect/test_connect_tuning.py
@@ -18,9 +18,10 @@
import os
import unittest
from pyspark.sql import SparkSession
-from pyspark.ml.tests.connect.test_legacy_mode_tuning import
CrossValidatorTestsMixin
+from pyspark.ml.tests.connect.test_legacy_mode_tuning import
CrossValidatorTestsMixin, have_torch
+@unittest.skipIf(not have_torch, "torch is required")
@unittest.skipIf("SPARK_SKIP_CONNECT_COMPAT_TESTS" in os.environ, "Requires
JVM access")
class CrossValidatorTestsOnConnect(CrossValidatorTestsMixin,
unittest.TestCase):
def setUp(self) -> None:
diff --git a/python/pyspark/ml/tests/connect/test_legacy_mode_classification.py
b/python/pyspark/ml/tests/connect/test_legacy_mode_classification.py
index 84d5829122af..5601d6bfffbf 100644
--- a/python/pyspark/ml/tests/connect/test_legacy_mode_classification.py
+++ b/python/pyspark/ml/tests/connect/test_legacy_mode_classification.py
@@ -218,6 +218,7 @@ class ClassificationTestsMixin:
loaded_model.transform(eval_df1.toPandas())
+@unittest.skipIf(not have_torch, "torch is required")
class ClassificationTests(ClassificationTestsMixin, unittest.TestCase):
def setUp(self) -> None:
self.spark = SparkSession.builder.master("local[2]").getOrCreate()
diff --git a/python/pyspark/ml/tests/connect/test_legacy_mode_pipeline.py
b/python/pyspark/ml/tests/connect/test_legacy_mode_pipeline.py
index 5fd4f6f16cfa..bb47f9a7f0b2 100644
--- a/python/pyspark/ml/tests/connect/test_legacy_mode_pipeline.py
+++ b/python/pyspark/ml/tests/connect/test_legacy_mode_pipeline.py
@@ -43,6 +43,7 @@ class PipelineTestsMixin:
rtol=1e-1,
)
+ @unittest.skipIf(not have_torch, "torch is required")
def test_pipeline(self):
train_dataset = self.spark.createDataFrame(
[
@@ -164,6 +165,7 @@ class PipelineTestsMixin:
assert lorv2.getOrDefault(lorv2.maxIter) == 200
+@unittest.skipIf(not have_torch, "torch is required")
class PipelineTests(PipelineTestsMixin, unittest.TestCase):
def setUp(self) -> None:
self.spark = SparkSession.builder.master("local[2]").getOrCreate()
diff --git a/python/pyspark/ml/tests/connect/test_legacy_mode_tuning.py
b/python/pyspark/ml/tests/connect/test_legacy_mode_tuning.py
index 0ade227540c7..302deb556212 100644
--- a/python/pyspark/ml/tests/connect/test_legacy_mode_tuning.py
+++ b/python/pyspark/ml/tests/connect/test_legacy_mode_tuning.py
@@ -272,6 +272,7 @@ class CrossValidatorTestsMixin:
cv.fit(train_dataset)
+@unittest.skipIf(not have_torch, "torch is required")
class CrossValidatorTests(CrossValidatorTestsMixin, unittest.TestCase):
def setUp(self) -> None:
self.spark = SparkSession.builder.master("local[2]").getOrCreate()
diff --git a/python/pyspark/ml/tests/connect/test_parity_torch_distributor.py
b/python/pyspark/ml/tests/connect/test_parity_torch_distributor.py
index 238775ded2a2..a8b4b06c450f 100644
--- a/python/pyspark/ml/tests/connect/test_parity_torch_distributor.py
+++ b/python/pyspark/ml/tests/connect/test_parity_torch_distributor.py
@@ -81,7 +81,7 @@ class TorchDistributorLocalUnitTestsOnConnect(
]
-@unittest.skipIf("SPARK_SKIP_CONNECT_COMPAT_TESTS" in os.environ, "Requires
JVM access")
+@unittest.skipIf(not have_torch, "torch is required")
class TorchDistributorLocalUnitTestsIIOnConnect(
TorchDistributorLocalUnitTestsMixin, unittest.TestCase
):
diff --git a/python/pyspark/ml/torch/data.py b/python/pyspark/ml/torch/data.py
index 0a5597fbd241..cb7e7f1b68ac 100644
--- a/python/pyspark/ml/torch/data.py
+++ b/python/pyspark/ml/torch/data.py
@@ -15,7 +15,10 @@
# limitations under the License.
#
-import torch
+try:
+ import torch
+except ImportError:
+ torch = None # type: ignore[assignment]
import numpy as np
from typing import Any, Callable, Iterator
from pyspark.sql.types import StructType
diff --git a/python/pyspark/pandas/tests/computation/test_apply_func.py
b/python/pyspark/pandas/tests/computation/test_apply_func.py
index 37cc4a4188f6..f169460d0ed5 100644
--- a/python/pyspark/pandas/tests/computation/test_apply_func.py
+++ b/python/pyspark/pandas/tests/computation/test_apply_func.py
@@ -253,8 +253,9 @@ class FrameApplyFunctionMixin:
actual.columns = ["a", "b"]
self.assert_eq(actual, pdf)
- # For NumPy typing, NumPy version should be 1.21+ and Python version
should be 3.8+
- if sys.version_info >= (3, 8) and LooseVersion(np.__version__) >=
LooseVersion("1.21"):
+ # For NumPy typing, NumPy version should be 1.21+ and Python version
should be 3.9+
+ # (types.GenericAlias, used by ntp.NDArray, was added in Python 3.9)
+ if sys.version_info >= (3, 9) and LooseVersion(np.__version__) >=
LooseVersion("1.21"):
import numpy.typing as ntp
psdf = ps.from_pandas(pdf)
diff --git a/python/pyspark/pandas/tests/plot/test_frame_plot_plotly.py
b/python/pyspark/pandas/tests/plot/test_frame_plot_plotly.py
index 37469db2c8f5..56a70f925f97 100644
--- a/python/pyspark/pandas/tests/plot/test_frame_plot_plotly.py
+++ b/python/pyspark/pandas/tests/plot/test_frame_plot_plotly.py
@@ -15,6 +15,7 @@
# limitations under the License.
#
+import sys
import unittest
import pprint
@@ -193,6 +194,7 @@ class DataFramePlotPlotlyTestsMixin:
self.assertEqual(plt.layout.title.text, "Title")
self.assertFalse(hasattr(plt.layout, "foo"))
+ @unittest.skipIf(sys.version_info < (3, 9), "Plotly float precision
differs on Python 3.8")
def test_hist_plot(self):
def check_hist_plot(psdf):
bins = np.array([1.0, 5.9, 10.8, 15.7, 20.6, 25.5, 30.4, 35.3,
40.2, 45.1, 50.0])
@@ -240,6 +242,7 @@ class DataFramePlotPlotlyTestsMixin:
psdf1.columns = columns
check_hist_plot(psdf1)
+ @unittest.skipIf(sys.version_info < (3, 9), "Plotly float precision
differs on Python 3.8")
def test_kde_plot(self):
psdf = ps.DataFrame({"a": [1, 2, 3, 4, 5], "b": [1, 3, 5, 7, 9], "c":
[2, 4, 6, 8, 10]})
diff --git a/python/pyspark/pandas/tests/plot/test_series_plot_plotly.py
b/python/pyspark/pandas/tests/plot/test_series_plot_plotly.py
index 1aa175f9308a..49a141676a8b 100644
--- a/python/pyspark/pandas/tests/plot/test_series_plot_plotly.py
+++ b/python/pyspark/pandas/tests/plot/test_series_plot_plotly.py
@@ -15,6 +15,7 @@
# limitations under the License.
#
+import sys
import unittest
import pprint
@@ -139,6 +140,7 @@ class SeriesPlotPlotlyTestsMixin:
# psdf["a"].plot(kind="pie"), express.pie(pdf,
values=pdf.columns[0], names=pdf.index),
# )
+ @unittest.skipIf(sys.version_info < (3, 9), "Plotly float precision
differs on Python 3.8")
def test_hist_plot(self):
def check_hist_plot(psser):
bins = np.array([1.0, 5.9, 10.8, 15.7, 20.6, 25.5, 30.4, 35.3,
40.2, 45.1, 50.0])
@@ -213,6 +215,7 @@ class SeriesPlotPlotlyTestsMixin:
self.psdf1.a.plot.box(notched=True)
self.psdf1.a.plot.box(hovertext="abc") # other arguments should not
throw an exception
+ @unittest.skipIf(sys.version_info < (3, 9), "Plotly float precision
differs on Python 3.8")
def test_kde_plot(self):
psdf = ps.DataFrame({"a": [1, 2, 3, 4, 5]})
pdf = pd.DataFrame(
diff --git a/python/pyspark/pandas/typedef/typehints.py
b/python/pyspark/pandas/typedef/typehints.py
index 7a23ff6b5018..08874eadcaa1 100644
--- a/python/pyspark/pandas/typedef/typehints.py
+++ b/python/pyspark/pandas/typedef/typehints.py
@@ -794,8 +794,21 @@ def _new_type_holders(
for param in params
)
if sys.version_info < (3, 11):
+ # types.GenericAlias (e.g. numpy.ndarray[Any, dtype[int]]) is iterable
but is a
+ # valid type hint. Use getattr so this still imports cleanly on Python
3.8 where
+ # types.GenericAlias doesn't exist.
+ import types as _types_mod
+
+ _builtin_generic_alias: type = getattr(_types_mod, "GenericAlias",
type(None))
+ _typing_private_generic_alias: type = getattr(typing, "_GenericAlias",
type(None))
is_unnamed_params = all(
- not isinstance(param, slice) and not isinstance(param, Iterable)
for param in params
+ not isinstance(param, slice)
+ and (
+ not isinstance(param, Iterable)
+ or isinstance(param, _builtin_generic_alias)
+ or isinstance(param, _typing_private_generic_alias)
+ )
+ for param in params
)
else:
# PEP 646 changes `GenericAlias` instances into iterable ones at
Python 3.11.
diff --git a/python/pyspark/sql/connect/plan.py
b/python/pyspark/sql/connect/plan.py
index 43af8bb427a5..b25b1be86495 100644
--- a/python/pyspark/sql/connect/plan.py
+++ b/python/pyspark/sql/connect/plan.py
@@ -1613,8 +1613,8 @@ class WriteOperationV2(LogicalPlan):
self.table_name: Optional[str] = table_name
self.provider: Optional[str] = None
self.partitioning_columns: List["ColumnOrName"] = []
- self.options: dict[str, Optional[str]] = {}
- self.table_properties: dict[str, Optional[str]] = {}
+ self.options: Dict[str, Optional[str]] = {}
+ self.table_properties: Dict[str, Optional[str]] = {}
self.mode: Optional[str] = None
self.overwrite_condition: Optional["ColumnOrName"] = None
diff --git a/python/run-tests.py b/python/run-tests.py
index ca8ddb5ff863..6e4a1da18a38 100755
--- a/python/run-tests.py
+++ b/python/run-tests.py
@@ -207,7 +207,7 @@ def run_individual_python_test(target_dir, test_name,
pyspark_python, keep_test_
def get_default_python_executables():
- python_execs = [x for x in ["python3.9", "pypy3"] if which(x)]
+ python_execs = [x for x in ["python3.9", "python3.8", "pypy3"] if which(x)]
if "python3.9" not in python_execs:
p = which("python3")