This is an automated email from the ASF dual-hosted git repository.
ruifengz pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new b634d3cd7e45 [SPARK-52764][ML][CONNECT][TESTS] Restore `test_parity_classification` and `test_parity_regression`
b634d3cd7e45 is described below
commit b634d3cd7e4534ceeb6910feca1889471f24e2ea
Author: Ruifeng Zheng <[email protected]>
AuthorDate: Thu Dec 18 08:32:27 2025 +0800
[SPARK-52764][ML][CONNECT][TESTS] Restore `test_parity_classification` and `test_parity_regression`
### What changes were proposed in this pull request?
Restore `test_parity_classification` and `test_parity_regression`
The changes for model summary offloading in
https://github.com/apache/spark/commit/fd74b5ec6bd662ca51ea6bec06c4432503e566d4#diff-50ad4e7f2b34a05cb38c56cd47491239b09002d1f95e59b6a1d0b257e9ec5b8e
seem problematic and cause a deadlock on the Python side:
```py
test_multinomial_logistic_regression_with_bound (pyspark.ml.tests.connect.test_parity_classification.ClassificationParityTests.test_multinomial_logistic_regression_with_bound) ... Exception ignored in: <function JavaWrapper.__del__ at 0x108b5e020>
Traceback (most recent call last):
  File "/Users/ruifeng.zheng/spark/python/pyspark/ml/util.py", line 379, in wrapped
    self._remote_model_obj.release_ref()
  File "/Users/ruifeng.zheng/spark/python/pyspark/ml/util.py", line 162, in release_ref
    del_remote_cache(self.ref_id)
  File "/Users/ruifeng.zheng/spark/python/pyspark/ml/util.py", line 358, in del_remote_cache
    session.client._delete_ml_cache([ref_id])
  File "/Users/ruifeng.zheng/spark/python/pyspark/sql/connect/client/core.py", line 2133, in _delete_ml_cache
    (_, properties, _) = self.execute_command(command)
  File "/Users/ruifeng.zheng/spark/python/pyspark/sql/connect/client/core.py", line 1158, in execute_command
    data, _, metrics, observed_metrics, properties = self._execute_and_fetch(
  File "/Users/ruifeng.zheng/spark/python/pyspark/sql/connect/client/core.py", line 1660, in _execute_and_fetch
    for response in self._execute_and_fetch_as_iterator(
  File "/Users/ruifeng.zheng/spark/python/pyspark/sql/connect/client/core.py", line 1635, in _execute_and_fetch_as_iterator
    raise kb
  File "/Users/ruifeng.zheng/spark/python/pyspark/sql/connect/client/core.py", line 1617, in _execute_and_fetch_as_iterator
    generator = ExecutePlanResponseReattachableIterator(
  File "/Users/ruifeng.zheng/spark/python/pyspark/sql/connect/client/reattach.py", line 127, in __init__
    self._stub.ExecutePlan(self._initial_request, metadata=metadata)
  File "/Users/ruifeng.zheng/.dev/miniconda3/envs/spark_dev_313/lib/python3.13/site-packages/grpc/_channel.py", line 1396, in __call__
    call = self._managed_call(
  File "/Users/ruifeng.zheng/.dev/miniconda3/envs/spark_dev_313/lib/python3.13/site-packages/grpc/_channel.py", line 1784, in create
    with state.lock:
  File "/Users/ruifeng.zheng/spark/python/pyspark/core/context.py", line 409, in signal_handler
    raise KeyboardInterrupt()
KeyboardInterrupt:
```
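In other words, the hang comes from issuing a blocking, lock-acquiring RPC from inside a `__del__` finalizer; PySpark's `signal_handler` can raise `KeyboardInterrupt` while the gRPC channel lock is being acquired, aborting the call mid-acquisition. A minimal self-contained sketch of that pattern (all names here are illustrative stand-ins, not the actual client code):
```py
import threading

# Illustrative stand-in for gRPC's internal channel lock (state.lock above).
_channel_lock = threading.Lock()


def _delete_remote_cache(ref_ids):
    # Stand-in for the ExecutePlan RPC issued by _delete_ml_cache: it has to
    # take the channel lock before it can send anything.
    with _channel_lock:
        pass  # the blocking network call would happen here


class RemoteModelRef:
    """Hypothetical wrapper whose finalizer releases a server-side cache entry."""

    def __init__(self, ref_id):
        self.ref_id = ref_id

    def __del__(self):
        # The root of the hang: a finalizer runs at arbitrary points,
        # including while a signal handler is raising KeyboardInterrupt,
        # so this RPC can be aborted while acquiring the channel lock.
        _delete_remote_cache([self.ref_id])
```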
I plan to add a dedicated test for model summary offloading in a separate PR; this PR only restores the basic coverage.
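As a rough sketch of what such a dedicated test might look like (the `_delete_ml_cache`, `evict_only`, and `model._java_obj._ref_id` names are taken from the code removed below; the dataset, estimator, and method name are illustrative):
```py
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.linalg import Vectors
from pyspark.sql import is_remote


# Hypothetical test method; it would live in a mixin such as
# ClassificationTestsMixin and run under a Spark Connect test case.
def test_summary_offloading(self):
    df = self.spark.createDataFrame(
        [(1.0, Vectors.dense(1.0, 2.0)), (0.0, Vectors.dense(2.0, 1.0))],
        ["label", "features"],
    )
    model = LogisticRegression(maxIter=1).fit(df)
    self.assertGreaterEqual(model.summary.accuracy, 0.0)

    if is_remote():
        # Evict the cached model on the server, then confirm the summary
        # can still be re-materialized after offloading.
        self.spark.client._delete_ml_cache([model._java_obj._ref_id], evict_only=True)
        self.assertGreaterEqual(model.summary.accuracy, 0.0)
```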
### Why are the changes needed?
To restore the basic test coverage.
### Does this PR introduce _any_ user-facing change?
No, this is test-only.
### How was this patch tested?
Manually ran the tests locally; the hanging issue did not occur in 10 successive runs.
### Was this patch authored or co-authored using generative AI tooling?
no
Closes #53504 from zhengruifeng/restore_cla.
Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>
---
.../ml/tests/connect/test_parity_classification.py | 2 --
.../ml/tests/connect/test_parity_regression.py | 2 --
python/pyspark/ml/tests/test_classification.py | 22 ----------------------
python/pyspark/ml/tests/test_regression.py | 7 -------
4 files changed, 33 deletions(-)
diff --git a/python/pyspark/ml/tests/connect/test_parity_classification.py b/python/pyspark/ml/tests/connect/test_parity_classification.py
index 7805546dba70..3c7e8ff71a2d 100644
--- a/python/pyspark/ml/tests/connect/test_parity_classification.py
+++ b/python/pyspark/ml/tests/connect/test_parity_classification.py
@@ -21,8 +21,6 @@ from pyspark.ml.tests.test_classification import ClassificationTestsMixin
 from pyspark.testing.connectutils import ReusedConnectTestCase


-# TODO(SPARK-52764): Re-enable this test after fixing the flakiness.
[email protected]("Disabled due to flakiness, should be enabled after fixing the issue")
 class ClassificationParityTests(ClassificationTestsMixin, ReusedConnectTestCase):
     pass

diff --git a/python/pyspark/ml/tests/connect/test_parity_regression.py b/python/pyspark/ml/tests/connect/test_parity_regression.py
index 407280827076..7c2743a938fa 100644
--- a/python/pyspark/ml/tests/connect/test_parity_regression.py
+++ b/python/pyspark/ml/tests/connect/test_parity_regression.py
@@ -21,8 +21,6 @@ from pyspark.ml.tests.test_regression import RegressionTestsMixin
 from pyspark.testing.connectutils import ReusedConnectTestCase


-# TODO(SPARK-52764): Re-enable this test after fixing the flakiness.
[email protected]("Disabled due to flakiness, should be enabled after fixing the issue")
 class RegressionParityTests(RegressionTestsMixin, ReusedConnectTestCase):
     pass

diff --git a/python/pyspark/ml/tests/test_classification.py b/python/pyspark/ml/tests/test_classification.py
index 21bce70e8735..9d27c5b51a8a 100644
--- a/python/pyspark/ml/tests/test_classification.py
+++ b/python/pyspark/ml/tests/test_classification.py
@@ -55,7 +55,6 @@ from pyspark.ml.classification import (
     MultilayerPerceptronClassificationTrainingSummary,
 )
 from pyspark.ml.regression import DecisionTreeRegressionModel
-from pyspark.sql import is_remote
 from pyspark.testing.sqlutils import ReusedSQLTestCase

@@ -276,9 +275,6 @@ class ClassificationTestsMixin:
             self.assertAlmostEqual(s.weightedFMeasure(1.0), 1.0, 2)

         check_summary()
-        if is_remote():
-            self.spark.client._delete_ml_cache([model._java_obj._ref_id], evict_only=True)
-            check_summary()

         s = model.summary
         # test evaluation (with training dataset) produces a summary with same values
@@ -329,9 +325,6 @@ class ClassificationTestsMixin:
             self.assertAlmostEqual(s.weightedFMeasure(1.0), 0.65, 2)

         check_summary()
-        if is_remote():
-            self.spark.client._delete_ml_cache([model._java_obj._ref_id], evict_only=True)
-            check_summary()

         s = model.summary
         # test evaluation (with training dataset) produces a summary with same values
@@ -455,9 +448,6 @@ class ClassificationTestsMixin:
             self.assertEqual(summary.predictions.columns, expected_cols)

         check_summary()
-        if is_remote():
-            self.spark.client._delete_ml_cache([model._java_obj._ref_id], evict_only=True)
-            check_summary()

         summary2 = model.evaluate(df)
         self.assertIsInstance(summary2, LinearSVCSummary)
@@ -560,9 +550,6 @@ class ClassificationTestsMixin:
             self.assertEqual(summary.predictions.columns, expected_cols)

         check_summary()
-        if is_remote():
-            self.spark.client._delete_ml_cache([model._java_obj._ref_id], evict_only=True)
-            check_summary()

         summary2 = model.evaluate(df)
         self.assertIsInstance(summary2, FMClassificationSummary)
@@ -814,9 +801,6 @@ class ClassificationTestsMixin:
             self.assertEqual(summary.predictions.columns, expected_cols)

         check_summary()
-        if is_remote():
-            self.spark.client._delete_ml_cache([model._java_obj._ref_id], evict_only=True)
-            check_summary()

         summary2 = model.evaluate(df)
         self.assertTrue(isinstance(summary2, BinaryRandomForestClassificationSummary))
@@ -905,9 +889,6 @@ class ClassificationTestsMixin:
             self.assertEqual(summary.predictions.columns, expected_cols)

         check_summary()
-        if is_remote():
-            self.spark.client._delete_ml_cache([model._java_obj._ref_id], evict_only=True)
-            check_summary()

         summary2 = model.evaluate(df)
         self.assertTrue(isinstance(summary2, RandomForestClassificationSummary))
@@ -1006,9 +987,6 @@ class ClassificationTestsMixin:
             self.assertEqual(summary.predictions.columns, expected_cols)

         check_summary()
-        if is_remote():
-            self.spark.client._delete_ml_cache([model._java_obj._ref_id], evict_only=True)
-            check_summary()

         summary2 = model.evaluate(df)
         self.assertIsInstance(summary2, MultilayerPerceptronClassificationSummary)
diff --git a/python/pyspark/ml/tests/test_regression.py b/python/pyspark/ml/tests/test_regression.py
index 52688fdd63cf..a45a72f5f171 100644
--- a/python/pyspark/ml/tests/test_regression.py
+++ b/python/pyspark/ml/tests/test_regression.py
@@ -43,7 +43,6 @@ from pyspark.ml.regression import (
     GBTRegressor,
     GBTRegressionModel,
 )
-from pyspark.sql import is_remote
 from pyspark.testing.sqlutils import ReusedSQLTestCase

@@ -243,9 +242,6 @@ class RegressionTestsMixin:
             self.assertTrue(np.allclose(summary.r2adj, -0.6718362282878414, atol=1e-4))

         check_summary()
-        if is_remote():
-            self.spark.client._delete_ml_cache([model._java_obj._ref_id], evict_only=True)
-            check_summary()

         summary2 = model.evaluate(df)
         self.assertTrue(isinstance(summary2, LinearRegressionSummary))
@@ -359,9 +355,6 @@ class RegressionTestsMixin:
             self.assertEqual(summary.residuals().count(), 4)

         check_summary()
-        if is_remote():
-            self.spark.client._delete_ml_cache([model._java_obj._ref_id], evict_only=True)
-            check_summary()

         summary = model.summary
         summary2 = model.evaluate(df)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]