This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new 4574ec360b1 [SPARK-43349][PS][TEST] Fix flaky test for `DataFrame` creation
4574ec360b1 is described below

commit 4574ec360b15af476e7e5bac3b0fc62756ff01c3
Author: itholic <[email protected]>
AuthorDate: Wed May 3 19:41:05 2023 +0900

    [SPARK-43349][PS][TEST] Fix flaky test for `DataFrame` creation
    
    ### What changes were proposed in this pull request?
    
    This PR proposes to fix the `DataFrame` creation test, which fails intermittently in some environments with an index-order mismatch, as shown below:
    ```
    DataFrame.index values are different (100.0 %)
    [left]:  Index(['Databricks', 'Hello', 'Universe'], dtype='object')
    [right]: Index(['Hello', 'Universe', 'Databricks'], dtype='object')
    
    Left:
                     x
    Databricks  2004.0
    Hello       2002.0
    Universe       NaN
    x    float64
    dtype: object
    
    Right:
                     x
    Hello       2002.0
    Universe       NaN
    Databricks  2004.0
    x    float64
    dtype: object
    ```
    
    ### Why are the changes needed?
    
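    To fix the flaky test. The failure above shows the same rows arriving in a different order, so the test now sorts both sides by index before comparing them.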
    
    ### Does this PR introduce _any_ user-facing change?
    
    No, this is a test-only change.
    
    ### How was this patch tested?
    
    Manually tested.
    
    Closes #41025 from itholic/fix_pandas_test.
    
    Authored-by: itholic <[email protected]>
    Signed-off-by: Hyukjin Kwon <[email protected]>
---
 python/pyspark/pandas/tests/test_dataframe.py | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/python/pyspark/pandas/tests/test_dataframe.py b/python/pyspark/pandas/tests/test_dataframe.py
index f06e5e125ed..eb77f0710ed 100644
--- a/python/pyspark/pandas/tests/test_dataframe.py
+++ b/python/pyspark/pandas/tests/test_dataframe.py
@@ -238,14 +238,22 @@ class DataFrameTestsMixin:
         with ps.option_context("compute.ops_on_diff_frames", True):
             # test with ps.DataFrame and pd.Index
             self.assert_eq(
-                ps.DataFrame(data=psdf, index=pd.Index(["Hello", "Universe", "Databricks"])),
-                pd.DataFrame(data=pdf, index=pd.Index(["Hello", "Universe", "Databricks"])),
+                ps.DataFrame(
+                    data=psdf, index=pd.Index(["Hello", "Universe", "Databricks"])
+                ).sort_index(),
+                pd.DataFrame(
+                    data=pdf, index=pd.Index(["Hello", "Universe", "Databricks"])
+                ).sort_index(),
             )
 
             # test with ps.DataFrame and ps.Index
             self.assert_eq(
-                ps.DataFrame(data=psdf, index=ps.Index(["Hello", "Universe", "Databricks"])),
-                pd.DataFrame(data=pdf, index=pd.Index(["Hello", "Universe", "Databricks"])),
+                ps.DataFrame(
+                    data=psdf, index=ps.Index(["Hello", "Universe", "Databricks"])
+                ).sort_index(),
+                pd.DataFrame(
+                    data=pdf, index=pd.Index(["Hello", "Universe", "Databricks"])
+                ).sort_index(),
             )
 
         # test DatetimeIndex
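
The effect of the added `.sort_index()` calls can be illustrated with plain pandas. The sketch below is not part of the commit. It assumes the row values shown in the failure message, and it simulates the reordered result with a hypothetical `shuffled` frame rather than running pandas-on-Spark.

```python
import pandas as pd

# Row labels and values taken from the failure output in the commit message.
labels = ["Hello", "Universe", "Databricks"]
pdf = pd.DataFrame({"x": [2002.0, None, 2004.0]}, index=pd.Index(labels))

# Hypothetical stand-in for a result whose rows come back in another order.
shuffled = pdf.loc[["Databricks", "Hello", "Universe"]]

# An order-sensitive comparison sees two different indexes.
assert not shuffled.index.equals(pdf.index)

# Sorting both sides by index makes the comparison independent of row order.
pd.testing.assert_frame_equal(shuffled.sort_index(), pdf.sort_index())
```

Sorting both sides means the test checks the contents of the frame rather than the order in which rows happen to be returned.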

