Irakli Machabeli created SPARK-12467:
----------------------------------------
Summary: Get rid of sorting in Row's constructor in pyspark
Key: SPARK-12467
URL: https://issues.apache.org/jira/browse/SPARK-12467
Project: Spark
Issue Type: Bug
Components: PySpark, SQL
Affects Versions: 1.5.2
Reporter: Irakli Machabeli
Priority: Minor
The current implementation of Row's __new__ sorts columns by name.
First of all, there is no obvious reason to sort. Second, if one converts a
DataFrame to an RDD and then back to a DataFrame, the order of the columns
changes. While this is not a bug, it nevertheless makes looking at the data
really inconvenient.
    def __new__(self, *args, **kwargs):
        if args and kwargs:
            raise ValueError("Can not use both args "
                             "and kwargs to create Row")
        if args:
            # create row class or objects
            return tuple.__new__(self, args)
        elif kwargs:
            # create row objects
            names = sorted(kwargs.keys())  # just get rid of sorting here!!!
            row = tuple.__new__(self, [kwargs[n] for n in names])
            row.__fields__ = names
            return row
        else:
            raise ValueError("No args or kwargs")
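
To illustrate the reordering without a Spark installation, here is a minimal sketch using two hypothetical stand-in classes, SortedRow and OrderedRow. Neither is part of PySpark; they only mimic the kwargs branch quoted above. The sketch assumes Python 3.7 or later, where keyword arguments keep the order in which they were passed; on older interpreters kwargs has no guaranteed order, so the OrderedRow variant would not behave predictably there.

```python
class SortedRow(tuple):
    """Hypothetical stand-in: mimics the kwargs branch quoted above."""

    def __new__(cls, **kwargs):
        names = sorted(kwargs.keys())  # columns reordered alphabetically
        row = tuple.__new__(cls, [kwargs[n] for n in names])
        row.__fields__ = names
        return row


class OrderedRow(tuple):
    """Hypothetical variant: same logic with the sort removed."""

    def __new__(cls, **kwargs):
        # Assumes kwargs preserves call order (Python 3.7+).
        names = list(kwargs.keys())
        row = tuple.__new__(cls, [kwargs[n] for n in names])
        row.__fields__ = names
        return row


sorted_row = SortedRow(name="Alice", age=30)
ordered_row = OrderedRow(name="Alice", age=30)

print(sorted_row.__fields__, tuple(sorted_row))
print(ordered_row.__fields__, tuple(ordered_row))
```

With the sort in place, the fields come back as age, name even though name was passed first, which is the column reordering described above.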