This is an automated email from the ASF dual-hosted git repository.
dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new 13545a4e5078 [SPARK-54369][CONNECT][TESTS] Fix `PythonPipelineSuite` flakiness via `Set` instead of `Seq`
13545a4e5078 is described below
commit 13545a4e507827c519faebaf0c64ad774eec22d3
Author: Dongjoon Hyun <[email protected]>
AuthorDate: Sun Nov 16 07:56:03 2025 -0800
[SPARK-54369][CONNECT][TESTS] Fix `PythonPipelineSuite` flakiness via `Set` instead of `Seq`
### What changes were proposed in this pull request?
This PR aims to fix `PythonPipelineSuite` flakiness by comparing `Set`s instead of order-sensitive `Seq`s in multiple assertions.
### Why are the changes needed?
Currently, `PythonPipelineSuite` is flaky, as shown in the CI failure below. We should fix this flakiness.
- https://github.com/apache/spark/actions/runs/19396864076/job/55498096472
```
[info] - referencing internal datasets *** FAILED *** (821 milliseconds)
[info]   List(`spark_catalog`.`default`.`src`, `spark_catalog`.`default`.`c`, `spark_catalog`.`default`.`a`) did not equal List(`spark_catalog`.`default`.`src`, `spark_catalog`.`default`.`a`, `spark_catalog`.`default`.`c`) (PythonPipelineSuite.scala:366)
```
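The root cause can be illustrated with a minimal standalone sketch (the values below are illustrative, not the actual suite code): `partition` may yield flow identifiers in a nondeterministic order, so an order-sensitive `Seq` comparison intermittently fails, while a `Set` comparison does not.

```scala
// Minimal sketch of the flakiness and the fix, with hypothetical identifiers.
object SeqVsSetFlakiness {
  def main(args: Array[String]): Unit = {
    val expected = Seq("src", "a", "c") // expected identifiers (illustrative)
    val actual   = Seq("src", "c", "a") // same elements, order varies run to run

    // Order-sensitive Seq comparison: fails whenever the ordering differs.
    assert(actual != expected)

    // Order-insensitive Set comparison, as adopted by this patch:
    // passes whenever the same elements are present, in any order.
    assert(actual.toSet == expected.toSet)
    println("Set comparison is order-insensitive")
  }
}
```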
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Pass the CIs.
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #53080 from dongjoon-hyun/SPARK-XXX.
Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
---
.../apache/spark/sql/connect/pipelines/PythonPipelineSuite.scala | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/sql/connect/server/src/test/scala/org/apache/spark/sql/connect/pipelines/PythonPipelineSuite.scala b/sql/connect/server/src/test/scala/org/apache/spark/sql/connect/pipelines/PythonPipelineSuite.scala
index 1850241f0702..45d8c7b18b84 100644
--- a/sql/connect/server/src/test/scala/org/apache/spark/sql/connect/pipelines/PythonPipelineSuite.scala
+++ b/sql/connect/server/src/test/scala/org/apache/spark/sql/connect/pipelines/PythonPipelineSuite.scala
@@ -364,11 +364,13 @@ class PythonPipelineSuite
val (streamingFlows, batchFlows) =
graph.resolvedFlows.partition(_.df.isStreaming)
assert(
- batchFlows.map(_.identifier) == Seq(
+ batchFlows.map(_.identifier).toSet == Set(
graphIdentifier("src"),
graphIdentifier("a"),
graphIdentifier("c")))
-    assert(streamingFlows.map(_.identifier) == Seq(graphIdentifier("b"), graphIdentifier("d")))
+ assert(
+ streamingFlows.map(_.identifier).toSet ==
+ Set(graphIdentifier("b"), graphIdentifier("d")))
}
test("referencing external datasets") {
@@ -722,7 +724,8 @@ class PythonPipelineSuite
assert(
graph
.flowsTo(graphIdentifier("a"))
-        .map(_.identifier) == Seq(graphIdentifier("a"), graphIdentifier("something")))
+ .map(_.identifier)
+ .toSet == Set(graphIdentifier("a"), graphIdentifier("something")))
}
  test("groupby and rollup works with internal datasets, referencing with (col, str)") {
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]