This is an automated email from the ASF dual-hosted git repository.
ruifengz pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new 94cfa3d0e431 [SPARK-55132][INFRA] Upgrade numpy version on lint image
94cfa3d0e431 is described below
commit 94cfa3d0e431522fb14d975b122a2907b885a163
Author: Tian Gao <[email protected]>
AuthorDate: Fri Jan 23 21:58:31 2026 +0800
[SPARK-55132][INFRA] Upgrade numpy version on lint image
### What changes were proposed in this pull request?
Upgrade the numpy version on the lint image and fix some minor lint failures.
### Why are the changes needed?
When we run `pip install -r ./dev/requirements.txt` locally, we normally get
the latest version of `numpy`. This creates a difference between our local dev
environment and CI. We should keep the two as close as possible so we can rely
on local mypy results.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
The mypy test passed locally.
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #53913 from gaogaotiantian/upgrade-lint-numpy.
Authored-by: Tian Gao <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>
---
dev/spark-test-image/lint/Dockerfile | 2 +-
python/pyspark/pandas/frame.py | 2 +-
python/pyspark/pandas/series.py | 6 +++---
3 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/dev/spark-test-image/lint/Dockerfile b/dev/spark-test-image/lint/Dockerfile
index c76eb82b32b5..ea0d9ed3eb10 100644
--- a/dev/spark-test-image/lint/Dockerfile
+++ b/dev/spark-test-image/lint/Dockerfile
@@ -91,7 +91,7 @@ RUN python3.11 -m pip install \
     'jinja2' \
     'matplotlib' \
     'mypy==1.8.0' \
-    'numpy==2.0.2' \
+    'numpy==2.4.1' \
    'numpydoc' \
    'pandas' \
    'pandas-stubs' \
diff --git a/python/pyspark/pandas/frame.py b/python/pyspark/pandas/frame.py
index 1d0c0fc638b1..1c66bbec37b7 100644
--- a/python/pyspark/pandas/frame.py
+++ b/python/pyspark/pandas/frame.py
@@ -11293,7 +11293,7 @@ defaultdict(<class 'list'>, {'col..., 'col...})]
        """
        # Rely on dtype rather than spark type because columns that consist of bools and
        # Nones should be excluded if bool_only is True
-        return [label for label in column_labels if is_bool_dtype(self._psser_for(label))]  # type: ignore[arg-type]
+        return [label for label in column_labels if is_bool_dtype(self._psser_for(label))]

    def _result_aggregated(
        self, column_labels: List[Label], scols: Sequence[PySparkColumn]
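For context, the `+` line in the hunk above keeps the same dtype-based filter while dropping a `# type: ignore` that the newer stubs no longer require. A minimal, hypothetical sketch of that filtering idea with plain pandas (the DataFrame and its column labels here are made up for illustration, not taken from Spark):

```python
import pandas as pd
from pandas.api.types import is_bool_dtype

# Hypothetical frame: only the bool-dtyped column should survive the filter.
df = pd.DataFrame({"flag": [True, False], "num": [1, 2], "ratio": [0.5, 1.5]})

# Same shape as the patched line: keep labels whose column has a bool dtype.
bool_labels = [label for label in df.columns if is_bool_dtype(df[label])]
print(bool_labels)  # ['flag']
```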
diff --git a/python/pyspark/pandas/series.py b/python/pyspark/pandas/series.py
index 6407749c14fc..9c9ff94f2e16 100644
--- a/python/pyspark/pandas/series.py
+++ b/python/pyspark/pandas/series.py
@@ -1205,10 +1205,10 @@ class Series(Frame, IndexOpsMixin, Generic[T]):
            else:
                current = current.when(self.spark.column == F.lit(to_replace), value)
-        if hasattr(arg, "__missing__"):
-            tmp_val = arg[np._NoValue]  # type: ignore[attr-defined]
+        if isinstance(arg, dict) and hasattr(arg, "__missing__"):
+            tmp_val = arg[np._NoValue]
            # Remove in case it's set in defaultdict.
-            del arg[np._NoValue]  # type: ignore[attr-defined]
+            del arg[np._NoValue]
            current = current.otherwise(F.lit(tmp_val))
        else:
            current = current.otherwise(F.lit(None).cast(self.spark.data_type))
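The `isinstance(arg, dict)` guard added above narrows the type for mypy under the newer numpy stubs; `defaultdict` is the usual mapping that defines `__missing__`. A standalone sketch of the same lookup-then-delete trick, with a plain `object()` sentinel standing in for `np._NoValue` (the helper name here is hypothetical):

```python
from collections import defaultdict

_SENTINEL = object()  # stand-in for np._NoValue in this sketch

def fallback_value(arg):
    """Mirror the patched branch: for defaultdict-like mappings, read the
    default via __missing__, then delete the entry __missing__ inserted."""
    if isinstance(arg, dict) and hasattr(arg, "__missing__"):
        tmp_val = arg[_SENTINEL]  # triggers __missing__ / default_factory
        del arg[_SENTINEL]        # remove the key defaultdict just stored
        return tmp_val
    return None

print(fallback_value(defaultdict(lambda: -1, {"a": 1})))  # -1
print(fallback_value({"a": 1}))  # None (plain dicts have no __missing__)
```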
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]