This is an automated email from the ASF dual-hosted git repository.
sarutak pushed a commit to branch branch-4.1
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-4.1 by this push:
new 6d135266de98 [SPARK-56765][INFRA] Fix mypy attr-defined errors with PyArrow 24+
6d135266de98 is described below
commit 6d135266de9893cedad64135c87683cd182d2d89
Author: Kousuke Saruta <[email protected]>
AuthorDate: Sat May 9 02:26:13 2026 +0900
[SPARK-56765][INFRA] Fix mypy attr-defined errors with PyArrow 24+
### What changes were proposed in this pull request?
This PR adds a `[mypy-pyarrow.compute]` section with `follow_imports = skip`
to `python/mypy.ini`, directly after the existing `[mypy-pyarrow.*]` section.
### Why are the changes needed?
PyArrow 24.0.0 (released 2026-04-21) added a `py.typed` marker and a
placeholder `__init__.pyi`, making it a PEP 561 typed package. However,
`pyarrow.compute` has no `.pyi` stub, and its functions (`floor_temporal`,
`assume_timezone`, `local_timestamp`, etc.) are dynamically generated at
runtime via `_make_global_functions()`. As a result, mypy 1.19.1 reports
`attr-defined` errors for these functions:
```
python/pyspark/sql/pandas/types.py:546: error: Module has no attribute "floor_temporal" [attr-defined]
python/pyspark/sql/pandas/types.py:553: error: Module has no attribute "assume_timezone" [attr-defined]
python/pyspark/sql/conversion.py:1409: error: Module has no attribute "local_timestamp" [attr-defined]
python/pyspark/sql/conversion.py:1439: error: Module has no attribute "local_timestamp" [attr-defined]
```
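The mechanism behind these errors can be illustrated with a toy module. This is only a sketch, not PyArrow's actual implementation: the module name `toy_compute` and the function bodies are hypothetical, and only the name `_make_global_functions` and the three function names are borrowed from the description above. It shows that names injected into a module's globals at runtime do not appear as definitions in the source that a static tool reads.

```python
import ast
import types

# Hypothetical stand-in for a module that builds its public functions at import time.
SOURCE = '''
def _make_global_functions():
    for name in ("floor_temporal", "assume_timezone", "local_timestamp"):
        def _fn(*args, _name=name, **kwargs):
            # Toy body: just echo which function was called and with what.
            return (_name, args, kwargs)
        _fn.__name__ = name
        globals()[name] = _fn

_make_global_functions()
'''

# Static view: function definitions that appear at the top level of the source.
tree = ast.parse(SOURCE)
static_names = {node.name for node in tree.body if isinstance(node, ast.FunctionDef)}

# Runtime view: public names that exist after the module body has executed.
module = types.ModuleType("toy_compute")
exec(SOURCE, module.__dict__)
runtime_names = {name for name in vars(module) if not name.startswith("_")}

print(sorted(static_names))
print(sorted(runtime_names))
```

Without a `.pyi` stub that declares these names, a type checker following the import has only the static view to work with, which is consistent with the `attr-defined` errors above.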
This issue has already affected the following CI runs:
https://github.com/apache/spark/actions/runs/25493241154/job/74815987155
(CI for branch-4.x)
https://github.com/apache/spark/actions/runs/25499265653/job/74833928460
(CI for branch-4.2)
This issue is tracked upstream in Arrow as
https://github.com/apache/arrow/issues/48970.
Since the CI Dockerfile specifies `pyarrow>=23.0.0`, PyArrow 24.0.0 will be
installed on the next image rebuild, breaking the mypy lint check.
Note: The master branch CI currently uses a cached Docker image
[that still has PyArrow 23.x installed](https://github.com/HyukjinKwon/spark/actions/runs/25487258663/job/74786370741#step:19:94)
(the image was last built on 2026-03-16, before PyArrow 24.0.0 was released on
2026-04-21). The same error will surface on master once `FULL_REFRESH_DATE` is
updated and the image is rebuilt.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Ran mypy locally with PyArrow 24.0.0 installed:
```bash
pip install pyarrow==24.0.0
mypy --python-executable python3 --namespace-packages \
  --config-file python/mypy.ini python/pyspark
```
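As a complementary sanity check (not part of the testing described above), the new section can be parsed with Python's standard `configparser`. This is a minimal sketch that assumes the file follows ordinary ini semantics; the inline fragment is a hypothetical excerpt mirroring the change, with the TODO comment abbreviated, rather than the full `python/mypy.ini`.

```python
import configparser

# Hypothetical excerpt mirroring the relevant part of python/mypy.ini after this change.
MYPY_INI_FRAGMENT = """
[mypy-pyarrow.*]
ignore_missing_imports = True

; TODO: remove once PyArrow ships complete type stubs for pyarrow.compute.
[mypy-pyarrow.compute]
follow_imports = skip
"""

parser = configparser.ConfigParser()
parser.read_string(MYPY_INI_FRAGMENT)

# Each per-module section keeps its own options; full-line ';' comments are ignored.
wildcard_opts = dict(parser["mypy-pyarrow.*"])
compute_opts = dict(parser["mypy-pyarrow.compute"])

print(wildcard_opts)
print(compute_opts)
```

The point of the check is that `follow_imports = skip` is scoped to `pyarrow.compute` only, so the wildcard section for the rest of `pyarrow` is left unchanged.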
### Was this patch authored or co-authored using generative AI tooling?
Kiro CLI / Opus 4.6
Closes #55744 from sarutak/fix-mypy-pyarrow24.
Authored-by: Kousuke Saruta <[email protected]>
Signed-off-by: Kousuke Saruta <[email protected]>
(cherry picked from commit e4c6bc75ca24f7afad6020b21e758e10c1d5eca7)
Signed-off-by: Kousuke Saruta <[email protected]>
---
python/mypy.ini | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/python/mypy.ini b/python/mypy.ini
index 4dd626cd5077..42d0278c69f7 100644
--- a/python/mypy.ini
+++ b/python/mypy.ini
@@ -158,6 +158,12 @@ ignore_missing_imports = True
[mypy-pyarrow.*]
ignore_missing_imports = True
+; TODO(ARROW-48970): Remove follow_imports once PyArrow ships complete type stubs
+; for pyarrow.compute. Currently its functions are dynamically generated and
+; invisible to mypy since PyArrow 24 added py.typed.
+[mypy-pyarrow.compute]
+follow_imports = skip
+
[mypy-psutil.*]
ignore_missing_imports = True