Irakli Machabeli created SPARK-12467:
----------------------------------------

             Summary: Get rid of sorting in Row's constructor in pyspark
                 Key: SPARK-12467
                 URL: https://issues.apache.org/jira/browse/SPARK-12467
             Project: Spark
          Issue Type: Bug
          Components: PySpark, SQL
    Affects Versions: 1.5.2
            Reporter: Irakli Machabeli
            Priority: Minor


The current implementation of Row's __new__ sorts the columns by name.
First of all, there is no obvious reason to sort. Second, if one converts a
DataFrame to an RDD and then back to a DataFrame, the order of the columns
changes. While this is not a bug, it nevertheless makes looking at the data
really inconvenient.
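
The sketch below is a minimal, self-contained illustration of the reordering described above. `SortedRow` is a hypothetical stand-in that copies only the `sorted(kwargs.keys())` logic from Row's constructor. It is not pyspark's real class, so the effect can be seen without a Spark installation.

```python
# Hypothetical stand-in for pyspark's Row: it mimics only the
# sorted-kwargs behaviour of Row.__new__, not the real class.
class SortedRow(tuple):
    def __new__(cls, **kwargs):
        names = sorted(kwargs)  # field names are reordered alphabetically
        row = tuple.__new__(cls, [kwargs[n] for n in names])
        row.__fields__ = names
        return row


# Columns in a meaningful, non-alphabetical order.
columns = ["name", "age", "city"]
values = ["Alice", 30, "Tbilisi"]

# Re-creating the row from keyword arguments loses that order.
row = SortedRow(**dict(zip(columns, values)))
print(row.__fields__)  # ['age', 'city', 'name']
print(tuple(row))      # (30, 'Tbilisi', 'Alice')
```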



    def __new__(self, *args, **kwargs):
        if args and kwargs:
            raise ValueError("Can not use both args "
                             "and kwargs to create Row")
        if args:
            # create row class or objects
            return tuple.__new__(self, args)

        elif kwargs:
            # create row objects
            names = sorted(kwargs.keys())  # proposed change: drop the sorted() call here
            row = tuple.__new__(self, [kwargs[n] for n in names])
            row.__fields__ = names
            return row

        else:
            raise ValueError("No args or kwargs")
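
The following is a sketch of the proposed change, assuming Python 3.6 or later, where keyword arguments are guaranteed to keep their call order (PEP 468). `OrderedRow` is a hypothetical name for a copy of the constructor above with the `sorted()` call removed. It is not an existing pyspark class.

```python
class OrderedRow(tuple):
    """Hypothetical Row variant that keeps keyword arguments in call order."""

    def __new__(cls, *args, **kwargs):
        if args and kwargs:
            raise ValueError("Can not use both args and kwargs to create Row")
        if args:
            # positional values: create row class or objects, as before
            return tuple.__new__(cls, args)
        if kwargs:
            # no sorting: names follow the order the caller passed them
            names = list(kwargs)
            row = tuple.__new__(cls, [kwargs[n] for n in names])
            row.__fields__ = names
            return row
        raise ValueError("No args or kwargs")


row = OrderedRow(name="Alice", age=30, city="Tbilisi")
print(row.__fields__)  # ['name', 'age', 'city']
print(tuple(row))      # ('Alice', 30, 'Tbilisi')
```

On interpreters older than Python 3.6 (including Python 2), `**kwargs` is an ordinary unordered dict. There, this variant would not reliably preserve the caller's column order.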




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
