This is an automated email from the ASF dual-hosted git repository.
dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new 13545a4e5078 [SPARK-54369][CONNECT][TESTS] Fix `PythonPipelineSuite` flakiness via `Set` instead of `Seq`
13545a4e5078 is described below
commit 13545a4e507827c519faebaf0c64ad774eec22d3
Author: Dongjoon Hyun <[email protected]>
AuthorDate: Sun Nov 16 07:56:03 2025 -0800
[SPARK-54369][CONNECT][TESTS] Fix `PythonPipelineSuite` flakiness via `Set` instead of `Seq`
### What changes were proposed in this pull request?
This PR aims to fix `PythonPipelineSuite` flakiness by comparing `Set`s instead of order-sensitive `Seq`s in multiple assertions.
### Why are the changes needed?
Currently, `PythonPipelineSuite` is flaky, as shown in the CI failure below. We should fix this flakiness.
- https://github.com/apache/spark/actions/runs/19396864076/job/55498096472
```
[info] - referencing internal datasets *** FAILED *** (821 milliseconds)
[info]   List(`spark_catalog`.`default`.`src`, `spark_catalog`.`default`.`c`, `spark_catalog`.`default`.`a`) did not equal List(`spark_catalog`.`default`.`src`, `spark_catalog`.`default`.`a`, `spark_catalog`.`default`.`c`) (PythonPipelineSuite.scala:366)
```
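The root cause can be illustrated with a minimal standalone sketch (the values below are illustrative, not the actual suite code): `partition` may yield flow identifiers in a nondeterministic order, so an order-sensitive `Seq` comparison intermittently fails, while a `Set` comparison does not.

```scala
// Minimal sketch of the flakiness and the fix, with hypothetical identifiers.
object SeqVsSetFlakiness {
  def main(args: Array[String]): Unit = {
    val expected = Seq("src", "a", "c") // expected identifiers (illustrative)
    val actual   = Seq("src", "c", "a") // same elements, order varies run to run

    // Order-sensitive Seq comparison: fails whenever the ordering differs.
    assert(actual != expected)

    // Order-insensitive Set comparison, as adopted by this patch:
    // passes whenever the same elements are present, in any order.
    assert(actual.toSet == expected.toSet)
    println("Set comparison is order-insensitive")
  }
}
```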
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Pass the CIs.
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #53080 from dongjoon-hyun/SPARK-XXX.
Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
---
.../apache/spark/sql/connect/pipelines/PythonPipelineSuite.scala | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/sql/connect/server/src/test/scala/org/apache/spark/sql/connect/pipelines/PythonPipelineSuite.scala b/sql/connect/server/src/test/scala/org/apache/spark/sql/connect/pipelines/PythonPipelineSuite.scala
index 1850241f0702..45d8c7b18b84 100644
--- a/sql/connect/server/src/test/scala/org/apache/spark/sql/connect/pipelines/PythonPipelineSuite.scala
+++ b/sql/connect/server/src/test/scala/org/apache/spark/sql/connect/pipelines/PythonPipelineSuite.scala
@@ -364,11 +364,13 @@ class PythonPipelineSuite
val (streamingFlows, batchFlows) =
graph.resolvedFlows.partition(_.df.isStreaming)
assert(
- batchFlows.map(_.identifier) == Seq(
+ batchFlows.map(_.identifier).toSet == Set(
graphIdentifier("src"),
graphIdentifier("a"),
graphIdentifier("c")))
-    assert(streamingFlows.map(_.identifier) == Seq(graphIdentifier("b"), graphIdentifier("d")))
+ assert(
+ streamingFlows.map(_.identifier).toSet ==
+ Set(graphIdentifier("b"), graphIdentifier("d")))
}
test("referencing external datasets") {
@@ -722,7 +724,8 @@ class PythonPipelineSuite
assert(
graph
.flowsTo(graphIdentifier("a"))
-        .map(_.identifier) == Seq(graphIdentifier("a"), graphIdentifier("something")))
+ .map(_.identifier)
+ .toSet == Set(graphIdentifier("a"), graphIdentifier("something")))
}
  test("groupby and rollup works with internal datasets, referencing with (col, str)") {
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]