Szymon Matejczyk created SPARK-13802:
----------------------------------------
Summary: Fields order in Row is not consistent with Schema.toInternal method
Key: SPARK-13802
URL: https://issues.apache.org/jira/browse/SPARK-13802
Project: Spark
Issue Type: Bug
Components: PySpark
Affects Versions: 1.6.0
Reporter: Szymon Matejczyk
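When a Row is constructed from kwargs, the fields in the underlying tuple are
sorted by name. When the schema reads the row (for example in
StructType.toInternal), it does not use this sorted order. It takes the values
positionally, in the order the fields were declared in the schema, so values
end up under the wrong field names.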
{code:python}
from pyspark.sql import Row
from pyspark.sql.types import *
schema = StructType([
    StructField("id", StringType()),
    StructField("first_name", StringType())])
row = Row(id="39", first_name="Szymon")
schema.toInternal(row)
Out[5]: ('Szymon', '39')
{code}
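The mismatch can be reproduced without Spark. The following is a simplified model of the behaviour described in this report, not PySpark's actual code: `row_from_kwargs` is a hypothetical stand-in for `Row(**kwargs)` as it behaves in 1.6.0, and `schema_names` stands in for the field order declared in the StructType.

```python
# Simplified model of the reported behaviour; not PySpark's implementation.

def row_from_kwargs(**kwargs):
    """Hypothetical stand-in for Row(**kwargs): names are sorted alphabetically."""
    names = sorted(kwargs)
    return names, tuple(kwargs[n] for n in names)

# Field order as declared in the schema from the report.
schema_names = ["id", "first_name"]

row_names, row_values = row_from_kwargs(id="39", first_name="Szymon")

# Pairing values with schema fields by position ignores the row's own
# (sorted) field names, which is where the values get swapped.
by_position = dict(zip(schema_names, row_values))

# Pairing by the row's own names gives the intended mapping.
by_name = dict(zip(row_names, row_values))

print(row_values)
print(by_position)
print(by_name)
```

Under this model the positional pairing assigns "Szymon" to `id`, which matches the tuple returned by `schema.toInternal(row)` above.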
{code:python}
df = sqlContext.createDataFrame([row], schema)
df.show(1)
+------+----------+
|    id|first_name|
+------+----------+
|Szymon|        39|
+------+----------+
{code}
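Until the ordering is fixed, one possible workaround is to reorder the values by field name before handing them to Spark. This is only a sketch under stated assumptions: `values_in_schema_order` is a hypothetical helper, not part of PySpark. With a real Row, the dict could presumably be obtained from `row.asDict()` and the names from `[f.name for f in schema.fields]`.

```python
def values_in_schema_order(row_dict, schema_names):
    """Hypothetical helper: return the row's values as a tuple ordered by
    schema_names, failing loudly if a schema field is missing."""
    missing = [n for n in schema_names if n not in row_dict]
    if missing:
        raise KeyError("row is missing fields: %s" % ", ".join(missing))
    return tuple(row_dict[n] for n in schema_names)

# Plain-Python stand-ins for schema.fields and row.asDict() from the report.
schema_names = ["id", "first_name"]
row_dict = {"id": "39", "first_name": "Szymon"}

ordered = values_in_schema_order(row_dict, schema_names)
print(ordered)
```

A tuple built this way already follows the schema's declared order, so the positional conversion should no longer swap the columns.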