Repository: spark
Updated Branches:
  refs/heads/branch-2.4 a9a8d3a4b -> 99ae693b3


[SPARK-25471][PYTHON][TEST] Fix pyspark-sql test error when using Python 3.6 
and Pandas 0.23

## What changes were proposed in this pull request?

Fix a test that constructs a Pandas DataFrame by specifying the column order 
explicitly. Previously this test assumed the columns would be sorted 
alphabetically. However, when using Python 3.6 with Pandas 0.23 or higher, 
the original column order is maintained, so the columns get mixed up and the 
test fails with an error.
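
A minimal sketch of the behavior described above, not taken from the patch: 
the variable names `data`, `implicit` and `explicit` are illustrative, and a 
fixed date stands in for `pd.Timestamp.now().date()`. Without `columns=`, the 
resulting order depends on the Python and Pandas versions in use; with 
`columns=`, it is pinned.

```python
from datetime import date, datetime

import pandas as pd

# Same shape of input as the test: "ts" is written before "d" in the dict.
data = {"ts": [datetime(2017, 10, 31, 1, 1, 1)], "d": [date(2017, 10, 31)]}

# Column order here is version dependent (sorted keys vs. original order).
implicit = pd.DataFrame(data)

# Passing columns= fixes the order regardless of version.
explicit = pd.DataFrame(data, columns=["d", "ts"])

print(list(implicit.columns))
print(list(explicit.columns))
```

Any code that pairs columns by position can then rely on `explicit` having 
`d` first and `ts` second.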

Manually tested with `python/run-tests` using Python 3.6.6 and Pandas 0.23.4.

Closes #22477 from BryanCutler/pyspark-tests-py36-pd23-SPARK-25471.

Authored-by: Bryan Cutler <[email protected]>
Signed-off-by: hyukjinkwon <[email protected]>
(cherry picked from commit 90e3955f384ca07bdf24faa6cdb60ded944cf0d8)
Signed-off-by: hyukjinkwon <[email protected]>


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/99ae693b
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/99ae693b
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/99ae693b

Branch: refs/heads/branch-2.4
Commit: 99ae693b3722db6e01825b8cf2c3f2ef74a65ddb
Parents: a9a8d3a
Author: Bryan Cutler <[email protected]>
Authored: Thu Sep 20 09:29:29 2018 +0800
Committer: hyukjinkwon <[email protected]>
Committed: Thu Sep 20 09:29:49 2018 +0800

----------------------------------------------------------------------
 python/pyspark/sql/tests.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/99ae693b/python/pyspark/sql/tests.py
----------------------------------------------------------------------
diff --git a/python/pyspark/sql/tests.py b/python/pyspark/sql/tests.py
index 08d7cfa..603f994 100644
--- a/python/pyspark/sql/tests.py
+++ b/python/pyspark/sql/tests.py
@@ -3266,7 +3266,7 @@ class SQLTests(ReusedSQLTestCase):
         import pandas as pd
         from datetime import datetime
         pdf = pd.DataFrame({"ts": [datetime(2017, 10, 31, 1, 1, 1)],
-                            "d": [pd.Timestamp.now().date()]})
+                            "d": [pd.Timestamp.now().date()]}, columns=["d", "ts"])
         # test types are inferred correctly without specifying schema
         df = self.spark.createDataFrame(pdf)
         self.assertTrue(isinstance(df.schema['ts'].dataType, TimestampType))


