This is an automated email from the ASF dual-hosted git repository.
gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new 4574ec360b1 [SPARK-43349][PS][TEST] Fix flaky test for `DataFrame` creation
4574ec360b1 is described below
commit 4574ec360b15af476e7e5bac3b0fc62756ff01c3
Author: itholic <[email protected]>
AuthorDate: Wed May 3 19:41:05 2023 +0900
[SPARK-43349][PS][TEST] Fix flaky test for `DataFrame` creation
### What changes were proposed in this pull request?
This PR proposes to fix the `DataFrame` creation test, which fails intermittently in some environments as shown below:
```
DataFrame.index values are different (100.0 %)
[left]: Index(['Databricks', 'Hello', 'Universe'], dtype='object')
[right]: Index(['Hello', 'Universe', 'Databricks'], dtype='object')
Left:
x
Databricks 2004.0
Hello 2002.0
Universe NaN
x float64
dtype: object
Right:
x
Hello 2002.0
Universe NaN
Databricks 2004.0
x float64
dtype: object
```
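The `Universe NaN` row and the `float64` dtype on both sides of the failure come from how pandas treats an existing `DataFrame` passed as `data` together with an explicit `index`: the rows are reindexed to the new labels rather than relabeled. A minimal pandas-only sketch, assuming a hypothetical `pdf` (the test's real `pdf` is not shown in this patch):

```python
import pandas as pd

# Hypothetical source frame (not the test's actual `pdf`);
# "Universe" is deliberately absent from its index.
pdf = pd.DataFrame({"x": [2002, 2004]}, index=["Hello", "Databricks"])

# With an existing DataFrame as `data`, an explicit `index` reindexes the rows:
# labels found in `pdf` keep their values, missing labels become NaN.
result = pd.DataFrame(data=pdf, index=pd.Index(["Hello", "Universe", "Databricks"]))

# The NaN forces the integer column to be upcast to float64.
print(result.dtypes)
```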
### Why are the changes needed?
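To fix a flaky test. In some environments the rows of the pandas-on-Spark `DataFrame` come back in a different order from the pandas one, so comparing the two without sorting fails intermittently.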
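Sorting both sides by index before comparing removes the dependence on row order. A minimal pandas-only sketch, using `pandas.testing.assert_frame_equal` as a stand-in for the test suite's `assert_eq` and values copied from the failure output; the helper `frames_equal` is hypothetical:

```python
import pandas as pd
import pandas.testing as pdt

# Same rows, different order: mirrors the "Left" and "Right" frames in the failure.
left = pd.DataFrame({"x": [2004.0, 2002.0, None]}, index=["Databricks", "Hello", "Universe"])
right = pd.DataFrame({"x": [2002.0, None, 2004.0]}, index=["Hello", "Universe", "Databricks"])


def frames_equal(a, b):
    """Hypothetical helper: True if assert_frame_equal passes, False otherwise."""
    try:
        pdt.assert_frame_equal(a, b)
        return True
    except AssertionError:
        return False


# Order-sensitive comparison: fails because the index values differ position by position.
order_sensitive = frames_equal(left, right)

# Sorting both sides by index first makes the check independent of row order.
order_insensitive = frames_equal(left.sort_index(), right.sort_index())

print(order_sensitive, order_insensitive)
```

For plain pandas frames, `assert_frame_equal` also accepts `check_like=True`, which ignores the order of index and columns; explicit `sort_index()` calls keep the intent visible in the test itself.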
### Does this PR introduce _any_ user-facing change?
No, test-only
### How was this patch tested?
Manually tested.
Closes #41025 from itholic/fix_pandas_test.
Authored-by: itholic <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
---
python/pyspark/pandas/tests/test_dataframe.py | 16 ++++++++++++----
1 file changed, 12 insertions(+), 4 deletions(-)
diff --git a/python/pyspark/pandas/tests/test_dataframe.py b/python/pyspark/pandas/tests/test_dataframe.py
index f06e5e125ed..eb77f0710ed 100644
--- a/python/pyspark/pandas/tests/test_dataframe.py
+++ b/python/pyspark/pandas/tests/test_dataframe.py
@@ -238,14 +238,22 @@ class DataFrameTestsMixin:
with ps.option_context("compute.ops_on_diff_frames", True):
# test with ps.DataFrame and pd.Index
self.assert_eq(
- ps.DataFrame(data=psdf, index=pd.Index(["Hello", "Universe", "Databricks"])),
- pd.DataFrame(data=pdf, index=pd.Index(["Hello", "Universe", "Databricks"])),
+ ps.DataFrame(
+ data=psdf, index=pd.Index(["Hello", "Universe", "Databricks"])
+ ).sort_index(),
+ pd.DataFrame(
+ data=pdf, index=pd.Index(["Hello", "Universe", "Databricks"])
+ ).sort_index(),
)
# test with ps.DataFrame and ps.Index
self.assert_eq(
- ps.DataFrame(data=psdf, index=ps.Index(["Hello", "Universe", "Databricks"])),
- pd.DataFrame(data=pdf, index=pd.Index(["Hello", "Universe", "Databricks"])),
+ ps.DataFrame(
+ data=psdf, index=ps.Index(["Hello", "Universe", "Databricks"])
+ ).sort_index(),
+ pd.DataFrame(
+ data=pdf, index=pd.Index(["Hello", "Universe", "Databricks"])
+ ).sort_index(),
)
# test DatetimeIndex