Repository: spark
Updated Branches:
  refs/heads/master 8fb1d1c7f -> 3d1e67f90

[SPARK-15342] [SQL] [PYSPARK] PySpark test for non ascii column name does not actually test with unicode column name

## What changes were proposed in this pull request?

The PySpark SQL test `test_column_name_with_non_ascii` is meant to test a non-ASCII column name, but under Python 2 it doesn't actually do so. There, a plain `"数量"` literal in a UTF-8 source file is a byte string (`str`), not a `unicode` object. We need to construct the unicode column name explicitly with `unicode(..., "utf-8")` under Python 2.

## How was this patch tested?

Existing tests.

Author: Liang-Chi Hsieh <[email protected]>

Closes #13134 from viirya/correct-non-ascii-colname-pytest.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/3d1e67f9
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/3d1e67f9
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/3d1e67f9

Branch: refs/heads/master
Commit: 3d1e67f903ab3512fcad82b94b1825578f8117c9
Parents: 8fb1d1c
Author: Liang-Chi Hsieh <[email protected]>
Authored: Wed May 18 11:18:33 2016 -0700
Committer: Davies Liu <[email protected]>
Committed: Wed May 18 11:18:33 2016 -0700

----------------------------------------------------------------------
 python/pyspark/sql/tests.py | 11 +++++++++--
 python/pyspark/sql/types.py |  3 ++-
 2 files changed, 11 insertions(+), 3 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/spark/blob/3d1e67f9/python/pyspark/sql/tests.py
----------------------------------------------------------------------
diff --git a/python/pyspark/sql/tests.py b/python/pyspark/sql/tests.py
index e86f442..1790432 100644
--- a/python/pyspark/sql/tests.py
+++ b/python/pyspark/sql/tests.py
@@ -1044,8 +1044,15 @@ class SQLTests(ReusedPySparkTestCase):
         self.assertRaises(TypeError, lambda: df[{}])
 
     def test_column_name_with_non_ascii(self):
-        df = self.spark.createDataFrame([(1,)], ["数量"])
-        self.assertEqual(StructType([StructField("数量", LongType(), True)]), df.schema)
+        if sys.version >= '3':
+            columnName = "数量"
+            self.assertTrue(isinstance(columnName, str))
+        else:
+            columnName = unicode("数量", "utf-8")
+            self.assertTrue(isinstance(columnName, unicode))
+        schema = StructType([StructField(columnName, LongType(), True)])
+        df = self.spark.createDataFrame([(1,)], schema)
+        self.assertEqual(schema, df.schema)
         self.assertEqual("DataFrame[数量: bigint]", str(df))
         self.assertEqual([("数量", 'bigint')], df.dtypes)
         self.assertEqual(1, df.select("数量").first()[0])

http://git-wip-us.apache.org/repos/asf/spark/blob/3d1e67f9/python/pyspark/sql/types.py
----------------------------------------------------------------------
diff --git a/python/pyspark/sql/types.py b/python/pyspark/sql/types.py
index 30ab130..7d8d023 100644
--- a/python/pyspark/sql/types.py
+++ b/python/pyspark/sql/types.py
@@ -27,7 +27,7 @@ from array import array
 
 if sys.version >= "3":
     long = int
-    unicode = str
+    basestring = unicode = str
 
 from py4j.protocol import register_input_converter
 from py4j.java_gateway import JavaClass
@@ -401,6 +401,7 @@ class StructField(DataType):
         False
         """
         assert isinstance(dataType, DataType), "dataType should be DataType"
+        assert isinstance(name, basestring), "field name should be string"
         if not isinstance(name, str):
             name = name.encode('utf-8')
         self.name = name
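The fix depends on Python 2 treating the test's column name as bytes rather than text. The Python 3 sketch below shows that distinction without a Python 2 interpreter. It is an illustration only and is not part of the patch; the variable names are made up. It compares the column name as text with its raw UTF-8 bytes, which is what a plain `"数量"` literal holds in a UTF-8-encoded Python 2 source file. Decoding those bytes explicitly does the same job as `unicode("数量", "utf-8")` in the Python 2 branch of the test.

```python
# Python 3 illustration of the text-vs-bytes distinction behind the fix.
column_name = "数量"                     # text (str) on Python 3
raw_bytes = column_name.encode("utf-8")  # what a plain "数量" literal holds in a UTF-8 Python 2 file

# Explicit decoding: the Python 3 counterpart of unicode("数量", "utf-8").
decoded = raw_bytes.decode("utf-8")

print(type(column_name).__name__, len(column_name))  # str 2
print(type(raw_bytes).__name__, len(raw_bytes))      # bytes 6
print(decoded == column_name)                        # True
```

Each of the two characters takes three bytes in UTF-8. That is why the byte string is three times as long as the text.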
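The `types.py` hunk adds an `assert isinstance(name, basestring)` check to `StructField.__init__`, and on Python 3 `basestring` is aliased to `str`. The sketch below reproduces those checks outside Spark as a rough illustration. It assumes Python 3 and uses a hypothetical helper, `check_field_name`, which is not a PySpark function.

```python
import sys

# Mirror the patched types.py: on Python 3, basestring is simply str.
if sys.version_info[0] >= 3:
    basestring = str


def check_field_name(name):
    """Hypothetical helper reproducing the name checks in the patched
    StructField.__init__ (not part of PySpark's API)."""
    assert isinstance(name, basestring), "field name should be string"
    if not isinstance(name, str):
        # Only reachable on Python 2, where a unicode name is encoded to UTF-8 bytes.
        name = name.encode("utf-8")
    return name


print(check_field_name("数量") == "数量")  # True
try:
    check_field_name("数量".encode("utf-8"))  # bytes, not str
except AssertionError as exc:
    print("rejected:", exc)  # rejected: field name should be string
```

With this check, a `bytes` name on Python 3 fails immediately with a readable message. The `encode('utf-8')` branch only matters on Python 2. Python skips `assert` statements when run with `-O`, so this check guards against mistakes during development rather than guaranteeing input validation.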
