(spark) branch master updated: [SPARK-53846][PYTHON][TESTS] Skip `test_profile_pandas_*` tests if pandas or pyarrow are unavailable

dongjoon Wed, 08 Oct 2025 18:38:27 -0700

This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git



The following commit(s) were added to refs/heads/master by this push:
     new 18f0463a97bd [SPARK-53846][PYTHON][TESTS] Skip `test_profile_pandas_*` tests if pandas or pyarrow are unavailable
18f0463a97bd is described below

commit 18f0463a97bde206595a0a0d3a8c2b6d37d38975
Author: Dongjoon Hyun <[email protected]>
AuthorDate: Wed Oct 8 18:37:26 2025 -0700

    [SPARK-53846][PYTHON][TESTS] Skip `test_profile_pandas_*` tests if pandas or pyarrow are unavailable
    
    ### What changes were proposed in this pull request?
    
    This PR aims to skip the `test_profile_pandas_udf` and `test_profile_pandas_function_api` tests if `pandas` or `pyarrow` is unavailable, like the other test cases, e.g., `test_memory_profiler_pandas_udf`.
    
    ```
    $ git grep test_profile_pandas
    python/pyspark/tests/test_memory_profiler.py:    def test_profile_pandas_udf(self):
    python/pyspark/tests/test_memory_profiler.py:    def test_profile_pandas_function_api(self):
    ```
    
    ### Why are the changes needed?
    
    We should check the test requirements explicitly. In other words, PySpark unit tests should pass without those packages, like the other existing unit test cases.
    
    https://github.com/apache/spark/blob/bf2457b6db77b911874a22e6d73f07793f44bef1/python/pyspark/tests/test_memory_profiler.py#L307-L311
    
    ### Does this PR introduce _any_ user-facing change?
    
    No. This is a test change.
    
    ### How was this patch tested?
    
    Pass the CIs and manually test without `pyarrow`.
    
    ```
    ...
    Tests passed in 159 seconds
    
    Skipped tests in pyspark.tests.test_memory_profiler with python3:
          test_memory_profiler_aggregate_in_pandas (pyspark.tests.test_memory_profiler.MemoryProfiler2Tests.test_memory_profiler_aggregate_in_pandas) ... skip (0.000s)
          test_memory_profiler_cogroup_apply_in_arrow (pyspark.tests.test_memory_profiler.MemoryProfiler2Tests.test_memory_profiler_cogroup_apply_in_arrow) ... skip (0.001s)
          test_memory_profiler_cogroup_apply_in_pandas (pyspark.tests.test_memory_profiler.MemoryProfiler2Tests.test_memory_profiler_cogroup_apply_in_pandas) ... skip (0.000s)
          test_memory_profiler_group_apply_in_arrow (pyspark.tests.test_memory_profiler.MemoryProfiler2Tests.test_memory_profiler_group_apply_in_arrow) ... skip (0.000s)
          test_memory_profiler_group_apply_in_pandas (pyspark.tests.test_memory_profiler.MemoryProfiler2Tests.test_memory_profiler_group_apply_in_pandas) ... skip (0.000s)
          test_memory_profiler_map_in_pandas_not_supported (pyspark.tests.test_memory_profiler.MemoryProfiler2Tests.test_memory_profiler_map_in_pandas_not_supported) ... skip (0.000s)
          test_memory_profiler_pandas_udf (pyspark.tests.test_memory_profiler.MemoryProfiler2Tests.test_memory_profiler_pandas_udf) ... skip (0.000s)
          test_memory_profiler_pandas_udf_iterator_not_supported (pyspark.tests.test_memory_profiler.MemoryProfiler2Tests.test_memory_profiler_pandas_udf_iterator_not_supported) ... skip (0.000s)
          test_memory_profiler_pandas_udf_window (pyspark.tests.test_memory_profiler.MemoryProfiler2Tests.test_memory_profiler_pandas_udf_window) ... skip (0.000s)
          test_memory_profiler_udf_with_arrow (pyspark.tests.test_memory_profiler.MemoryProfiler2Tests.test_memory_profiler_udf_with_arrow) ... skip (0.000s)
          test_profile_pandas_function_api (pyspark.tests.test_memory_profiler.MemoryProfilerTests.test_profile_pandas_function_api) ... skip (0.000s)
          test_profile_pandas_udf (pyspark.tests.test_memory_profiler.MemoryProfilerTests.test_profile_pandas_udf) ... skip (0.000s)
    ...
    ```
    
    ### Was this patch authored or co-authored using generative AI tooling?
    
    No.
    
    Closes #52549 from dongjoon-hyun/SPARK-53846.
    
    Authored-by: Dongjoon Hyun <[email protected]>
    Signed-off-by: Dongjoon Hyun <[email protected]>
---
 python/pyspark/tests/test_memory_profiler.py | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/python/pyspark/tests/test_memory_profiler.py b/python/pyspark/tests/test_memory_profiler.py
index ca75e4fa8976..144442b5a48f 100644
--- a/python/pyspark/tests/test_memory_profiler.py
+++ b/python/pyspark/tests/test_memory_profiler.py
@@ -112,6 +112,10 @@ class MemoryProfilerTests(PySparkTestCase):
             self.sc.dump_profiles(d)
             self.assertTrue(f"udf_{id}_memory.txt" in os.listdir(d))
 
+    @unittest.skipIf(
+        not have_pandas or not have_pyarrow,
+        cast(str, pandas_requirement_message or pyarrow_requirement_message),
+    )
     def test_profile_pandas_udf(self):
         udfs = [self.exec_pandas_udf_ser_to_ser, self.exec_pandas_udf_ser_to_scalar]
         udf_names = ["ser_to_ser", "ser_to_scalar"]
@@ -130,6 +134,10 @@ class MemoryProfilerTests(PySparkTestCase):
                 "Profiling UDFs with iterators input/output is not supported" in str(user_warns[0])
             )
 
+    @unittest.skipIf(
+        not have_pandas or not have_pyarrow,
+        cast(str, pandas_requirement_message or pyarrow_requirement_message),
+    )
     def test_profile_pandas_function_api(self):
         apis = [self.exec_grouped_map]
         f_names = ["grouped_map"]
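
For readers who want to try the skip pattern outside the Spark tree, the following is a minimal, self-contained sketch. It is not the PySpark implementation: the four module-level names below are hypothetical stand-ins for the flags and messages that the real test module imports from PySpark's test utilities, they are computed here with `importlib.util.find_spec`, and `str(...)` is used where the patch uses `cast(str, ...)`. `ProfileSketchTests` and `run_sketch` are likewise names invented for this illustration.

```python
import importlib.util
import io
import unittest

# Hypothetical stand-ins (assumption): the real test module gets these from
# PySpark's test utilities instead of computing them locally.
have_pandas = importlib.util.find_spec("pandas") is not None
have_pyarrow = importlib.util.find_spec("pyarrow") is not None
pandas_requirement_message = None if have_pandas else "pandas is not installed"
pyarrow_requirement_message = None if have_pyarrow else "pyarrow is not installed"


class ProfileSketchTests(unittest.TestCase):
    # Skip when either package is missing. `a or b` picks the first
    # non-empty message, so the skip reason names a missing package.
    @unittest.skipIf(
        not have_pandas or not have_pyarrow,
        str(pandas_requirement_message or pyarrow_requirement_message),
    )
    def test_needs_pandas_and_pyarrow(self):
        # Imports live inside the test so the module itself loads
        # even when the optional packages are absent.
        import pandas as pd
        import pyarrow as pa

        self.assertTrue(hasattr(pd, "DataFrame"))
        self.assertTrue(hasattr(pa, "Table"))


def run_sketch():
    """Run the sketch test case quietly and return the unittest result."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ProfileSketchTests)
    runner = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0)
    return runner.run(suite)
```

Calling `run_sketch()` in an environment without `pyarrow` should report the test as skipped rather than failed, which is the behavior the patch adds for `test_profile_pandas_udf` and `test_profile_pandas_function_api`.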


