This is an automated email from the ASF dual-hosted git repository.

sarutak pushed a commit to branch branch-4.1
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-4.1 by this push:
     new 6d135266de98 [SPARK-56765][INFRA] Fix mypy attr-defined errors with PyArrow 24+
6d135266de98 is described below

commit 6d135266de9893cedad64135c87683cd182d2d89
Author: Kousuke Saruta <[email protected]>
AuthorDate: Sat May 9 02:26:13 2026 +0900

    [SPARK-56765][INFRA] Fix mypy attr-defined errors with PyArrow 24+
    
    ### What changes were proposed in this pull request?
    This PR adds a new `[mypy-pyarrow.compute]` section with `follow_imports = skip` to `python/mypy.ini`.
    
    ### Why are the changes needed?
    PyArrow 24.0.0 (released 2026-04-21) added a `py.typed` marker and a placeholder `__init__.pyi`, making it a PEP 561 typed package. Previously PyArrow shipped no type information, so `ignore_missing_imports = True` made mypy treat it as `Any`; now mypy analyzes the installed package itself. However, `pyarrow.compute` has no `.pyi` stub, and its functions (`floor_temporal`, `assume_timezone`, `local_timestamp`, etc.) are dynamically generated at runtime via `_make_global_functions()`. As a result, mypy 1.19.1 reports `attr-defined` errors for these functions:
    
    ```
    python/pyspark/sql/pandas/types.py:546: error: Module has no attribute "floor_temporal" [attr-defined]
    python/pyspark/sql/pandas/types.py:553: error: Module has no attribute "assume_timezone" [attr-defined]
    python/pyspark/sql/conversion.py:1409: error: Module has no attribute "local_timestamp" [attr-defined]
    python/pyspark/sql/conversion.py:1439: error: Module has no attribute "local_timestamp" [attr-defined]
    ```
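A minimal sketch of why a static checker cannot see these names: functions are attached to a module object in a loop when the module is loaded, so no `def floor_temporal` exists anywhere in the source. The `REGISTRY` list, `_wrap`, and `make_global_functions` below are hypothetical stand-ins; the real `_make_global_functions()` iterates over Arrow's compute function registry and writes into the module's `globals()`, but the effect on mypy is the same.

```python
import types
from typing import Any, Callable, List

# Hypothetical stand-in for Arrow's compute function registry.
REGISTRY: List[str] = ["floor_temporal", "assume_timezone", "local_timestamp"]


def _wrap(name: str) -> Callable[..., str]:
    # Build a wrapper at runtime; the name only exists after this runs.
    def wrapper(*args: Any, **kwargs: Any) -> str:
        return f"{name}({len(args)} args)"

    wrapper.__name__ = name
    return wrapper


def make_global_functions(module: types.ModuleType, names: List[str]) -> None:
    """Attach one wrapper per registry entry, mimicking import-time generation."""
    for name in names:
        setattr(module, name, _wrap(name))


compute = types.ModuleType("compute_sketch")
make_global_functions(compute, REGISTRY)

# At runtime the attribute exists and is callable...
print(compute.floor_temporal("ts"))  # prints: floor_temporal(1 args)
# ...but nothing in the source defines floor_temporal statically, which is
# what mypy's [attr-defined] error reports for the real pyarrow.compute.
```

Individual call sites could instead be silenced with `# type: ignore[attr-defined]`; configuring mypy once for the whole module avoids scattering such comments across PySpark.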
    
    This issue has already broken the lint check in the following CI runs:
    - https://github.com/apache/spark/actions/runs/25493241154/job/74815987155 (CI for branch-4.x)
    - https://github.com/apache/spark/actions/runs/25499265653/job/74833928460 (CI for branch-4.2)
    
    This issue is tracked upstream in Arrow as https://github.com/apache/arrow/issues/48970.
    
    Since the CI Dockerfile only specifies a lower bound (`pyarrow>=23.0.0`), PyArrow 24.0.0 will be installed on the next image rebuild, breaking the mypy lint check.
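Whether an environment is affected comes down to whether the installed PyArrow carries the PEP 561 `py.typed` marker. A minimal standard-library sketch for checking that; the helper name `ships_py_typed` is hypothetical and not part of PyArrow or mypy, and it only inspects regular or namespace packages (a single-file module yields `None`).

```python
import importlib.util
from pathlib import Path
from typing import Optional


def ships_py_typed(package: str) -> Optional[bool]:
    """Report whether an installed package declares itself typed (PEP 561).

    Returns True if a ``py.typed`` file sits in the package directory,
    False if the package is installed without one, and None if no
    package of that name can be found on sys.path.
    """
    # find_spec on a top-level name locates the package without importing it.
    spec = importlib.util.find_spec(package)
    if spec is None or spec.submodule_search_locations is None:
        return None
    return any(
        (Path(location) / "py.typed").is_file()
        for location in spec.submodule_search_locations
    )


if __name__ == "__main__":
    # Assuming the marker ships as described above, this should print True
    # with PyArrow 24.0.0 and False with 23.x.
    print("pyarrow ships py.typed:", ships_py_typed("pyarrow"))
```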
    
    Note: The master branch CI currently uses a cached Docker image [that still has PyArrow 23.x installed](https://github.com/HyukjinKwon/spark/actions/runs/25487258663/job/74786370741#step:19:94) (the image was last built on 2026-03-16, before PyArrow 24.0.0 was released on 2026-04-21). The same errors will surface on master once `FULL_REFRESH_DATE` is updated and the image is rebuilt.
    
    ### Does this PR introduce _any_ user-facing change?
    No.
    
    ### How was this patch tested?
    Ran mypy locally with PyArrow 24.0.0 installed:
    
    ```bash
    pip install pyarrow==24.0.0
    mypy --python-executable python3 --namespace-packages --config-file python/mypy.ini python/pyspark
    ```
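When comparing runs of the command above with PyArrow 23.x and 24.0.0, it can help to extract only the "Module has no attribute" hits. A small sketch; `find_missing_module_attrs` and its regular expression are hypothetical and assume mypy's default `path:line: error: message [code]` output (no `--show-column-numbers`), as in the errors quoted earlier.

```python
import re
from typing import List, NamedTuple

# Matches lines such as:
# python/pyspark/sql/conversion.py:1409: error: Module has no attribute "local_timestamp" [attr-defined]
_ATTR_DEFINED = re.compile(
    r'^(?P<path>[^:]+):(?P<line>\d+): error: '
    r'Module has no attribute "(?P<name>[^"]+)"\s+\[attr-defined\]$'
)


class MissingAttr(NamedTuple):
    path: str
    line: int
    name: str


def find_missing_module_attrs(mypy_output: str) -> List[MissingAttr]:
    """Collect 'Module has no attribute' errors from mypy's text output."""
    hits: List[MissingAttr] = []
    for raw in mypy_output.splitlines():
        match = _ATTR_DEFINED.match(raw.strip())
        if match:
            hits.append(MissingAttr(match["path"], int(match["line"]), match["name"]))
    return hits
```

The input can be captured with `subprocess.run([...], capture_output=True, text=True).stdout`; with the fix applied, the `pyarrow.compute` names should no longer appear in the result.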
    
    ### Was this patch authored or co-authored using generative AI tooling?
    Kiro CLI / Opus 4.6
    
    Closes #55744 from sarutak/fix-mypy-pyarrow24.
    
    Authored-by: Kousuke Saruta <[email protected]>
    Signed-off-by: Kousuke Saruta <[email protected]>
    (cherry picked from commit e4c6bc75ca24f7afad6020b21e758e10c1d5eca7)
    Signed-off-by: Kousuke Saruta <[email protected]>
---
 python/mypy.ini | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/python/mypy.ini b/python/mypy.ini
index 4dd626cd5077..42d0278c69f7 100644
--- a/python/mypy.ini
+++ b/python/mypy.ini
@@ -158,6 +158,12 @@ ignore_missing_imports = True
 [mypy-pyarrow.*]
 ignore_missing_imports = True
 
+; TODO(ARROW-48970): Remove follow_imports once PyArrow ships complete type stubs
+; for pyarrow.compute. Currently its functions are dynamically generated and
+; invisible to mypy since PyArrow 24 added py.typed.
+[mypy-pyarrow.compute]
+follow_imports = skip
+
 [mypy-psutil.*]
 ignore_missing_imports = True
 

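The new `[mypy-pyarrow.compute]` section stacks on top of the existing `[mypy-pyarrow.*]` section rather than replacing it: per mypy's documented precedence, a section naming a concrete module overrides wildcard sections, and non-conflicting options from all matching sections apply together. With `follow_imports = skip`, mypy does not analyze `pyarrow.compute` and replaces it with `Any`, so lookups such as `pc.floor_temporal` type-check. The sketch below is a simplified model of that resolution built on `configparser`, not mypy's implementation; `effective_options` and `_matches` are hypothetical helpers that handle only one concrete name or trailing `.*` pattern per section.

```python
import configparser
from typing import Dict, List, Tuple

# The relevant part of python/mypy.ini after this change (comments omitted).
MYPY_INI = """
[mypy-pyarrow.*]
ignore_missing_imports = True

[mypy-pyarrow.compute]
follow_imports = skip

[mypy-psutil.*]
ignore_missing_imports = True
"""


def _matches(pattern: str, module: str) -> Tuple[bool, int]:
    """Return (matched, specificity); concrete names rank above wildcards."""
    if pattern.endswith(".*"):
        prefix = pattern[:-2]
        # "foo.*" matches "foo" itself and any submodule such as "foo.bar".
        hit = module == prefix or module.startswith(prefix + ".")
        return hit, 0
    return module == pattern, 1


def effective_options(ini_text: str, module: str) -> Dict[str, str]:
    """Merge options from every matching [mypy-...] section (simplified model)."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    matched: List[Tuple[int, str]] = []
    for section in parser.sections():
        if not section.startswith("mypy-"):
            continue
        hit, specificity = _matches(section[len("mypy-"):], module)
        if hit:
            matched.append((specificity, section))
    options: Dict[str, str] = {}
    # Apply wildcard sections first so concrete sections can override them.
    for _, section in sorted(matched, key=lambda item: item[0]):
        options.update(parser[section])
    return options
```

In this model `pyarrow.compute` receives both `ignore_missing_imports` and `follow_imports = skip`, while other `pyarrow` submodules keep only `ignore_missing_imports = True` and are still analyzed normally.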
