[ https://issues.apache.org/jira/browse/SPARK-15918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15328851#comment-15328851 ]
Hyukjin Kwon commented on SPARK-15918: -------------------------------------- Actually, I met this case before and was thinking it might be an issue. However, I realised it seems actually not after executing the equelvant quries in other several DBMS and checking some documentations such as https://msdn.microsoft.com/en-us/library/ms180026.aspx and http://www.w3schools.com/sql/sql_union.asp that say {quote} the columns in each SELECT statement must be in the same order {quote} I haven't read about the official SQL standard though, I am also pretty sure that this is not an issue. > unionAll returns wrong result when two dataframes has schema in different > order > ------------------------------------------------------------------------------- > > Key: SPARK-15918 > URL: https://issues.apache.org/jira/browse/SPARK-15918 > Project: Spark > Issue Type: Bug > Components: SQL > Affects Versions: 1.6.1 > Environment: CentOS > Reporter: Prabhu Joseph > > On applying unionAll operation between A and B dataframes, they both has same > schema but in different order and hence the result has column value mapping > changed. > Repro: > {code} > A.show() > +---+--------+-------+------+------+-----+----+-------+------+-------+-------+-----+ > |tag|year_day|tm_hour|tm_min|tm_sec|dtype|time|tm_mday|tm_mon|tm_yday|tm_year|value| > +---+--------+-------+------+------+-----+----+-------+------+-------+-------+-----+ > +---+--------+-------+------+------+-----+----+-------+------+-------+-------+-----+ > B.show() > +-----+-------------------+----------+-------+-------+------+------+------+-------+-------+------+--------+ > |dtype| tag| > time|tm_hour|tm_mday|tm_min|tm_mon|tm_sec|tm_yday|tm_year| value|year_day| > +-----+-------------------+----------+-------+-------+------+------+------+-------+-------+------+--------+ > | F|C_FNHXUT701Z.CNSTLO|1443790800| 13| 2| 0| 10| 0| > 275| 2015|1.2345| 2015275| > | F|C_FNHXUDP713.CNSTHI|1443790800| 13| 2| 0| 10| 0| > 275| 2015|1.2345| 2015275| > | F| C_FNHXUT718.CNSTHI|1443790800| 13| 2| 0| 10| 0| > 275| 2015|1.2345| 2015275| > | F|C_FNHXUT703Z.CNSTLO|1443790800| 13| 2| 0| 10| 0| > 275| 2015|1.2345| 2015275| > | F|C_FNHXUR716A.CNSTLO|1443790800| 13| 2| 0| 10| 0| > 275| 2015|1.2345| 2015275| > | F|C_FNHXUT803Z.CNSTHI|1443790800| 13| 2| 0| 10| 0| > 275| 2015|1.2345| 2015275| > | F| C_FNHXUT728.CNSTHI|1443790800| 13| 2| 0| 10| 0| > 275| 2015|1.2345| 2015275| > | F| C_FNHXUR806.CNSTHI|1443790800| 13| 2| 0| 10| 0| > 275| 2015|1.2345| 2015275| > +-----+-------------------+----------+-------+-------+------+------+------+-------+-------+------+--------+ > A = A.unionAll(B) > A.show() > +---+-------------------+----------+------+------+-----+----+-------+------+-------+-------+---------+ > |tag| year_day| > tm_hour|tm_min|tm_sec|dtype|time|tm_mday|tm_mon|tm_yday|tm_year| value| > +---+-------------------+----------+------+------+-----+----+-------+------+-------+-------+---------+ > | F|C_FNHXUT701Z.CNSTLO|1443790800| 13| 2| 0| 10| 0| 275| > 2015| 1.2345|2015275.0| > | F|C_FNHXUDP713.CNSTHI|1443790800| 13| 2| 0| 10| 0| 275| > 2015| 1.2345|2015275.0| > | F| C_FNHXUT718.CNSTHI|1443790800| 13| 2| 0| 10| 0| 275| > 2015| 1.2345|2015275.0| > | F|C_FNHXUT703Z.CNSTLO|1443790800| 13| 2| 0| 10| 0| 275| > 2015| 1.2345|2015275.0| > | F|C_FNHXUR716A.CNSTLO|1443790800| 13| 2| 0| 10| 0| 275| > 2015| 1.2345|2015275.0| > | F|C_FNHXUT803Z.CNSTHI|1443790800| 13| 2| 0| 10| 0| 275| > 2015| 1.2345|2015275.0| > | F| C_FNHXUT728.CNSTHI|1443790800| 13| 2| 0| 10| 0| 275| > 2015| 1.2345|2015275.0| > | F| C_FNHXUR806.CNSTHI|1443790800| 13| 2| 0| 10| 0| 275| > 2015| 1.2345|2015275.0| > +---+-------------------+----------+------+------+-----+----+-------+------+-------+-------+---------+ > {code} > On changing the schema of A according to B and doing unionAll works fine > {code} > C = > A.select("dtype","tag","time","tm_hour","tm_mday","tm_min",”tm_mon”,"tm_sec","tm_yday","tm_year","value","year_day") > A = C.unionAll(B) > A.show() > +-----+-------------------+----------+-------+-------+------+------+------+-------+-------+------+--------+ > |dtype| tag| > time|tm_hour|tm_mday|tm_min|tm_mon|tm_sec|tm_yday|tm_year| value|year_day| > +-----+-------------------+----------+-------+-------+------+------+------+-------+-------+------+--------+ > | F|C_FNHXUT701Z.CNSTLO|1443790800| 13| 2| 0| 10| 0| > 275| 2015|1.2345| 2015275| > | F|C_FNHXUDP713.CNSTHI|1443790800| 13| 2| 0| 10| 0| > 275| 2015|1.2345| 2015275| > | F| C_FNHXUT718.CNSTHI|1443790800| 13| 2| 0| 10| 0| > 275| 2015|1.2345| 2015275| > | F|C_FNHXUT703Z.CNSTLO|1443790800| 13| 2| 0| 10| 0| > 275| 2015|1.2345| 2015275| > | F|C_FNHXUR716A.CNSTLO|1443790800| 13| 2| 0| 10| 0| > 275| 2015|1.2345| 2015275| > | F|C_FNHXUT803Z.CNSTHI|1443790800| 13| 2| 0| 10| 0| > 275| 2015|1.2345| 2015275| > | F| C_FNHXUT728.CNSTHI|1443790800| 13| 2| 0| 10| 0| > 275| 2015|1.2345| 2015275| > | F| C_FNHXUR806.CNSTHI|1443790800| 13| 2| 0| 10| 0| > 275| 2015|1.2345| 2015275| > +-----+-------------------+----------+-------+-------+------+------+------+-------+-------+------+--------+ > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org