[
https://issues.apache.org/jira/browse/SPARK-15918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15386225#comment-15386225
]
Wade Salazar commented on SPARK-15918:
--------------------------------------
Is the stance on this issue that since other SQL interpreters require both
SELECT statements to be in order Spark will follow? Though this seems
relatively innocuous it leads to considerable amount of troubleshooting when
this behavior is not expected. Can we request that the requirement to have
each SELECT statement eliminated as an improvement in SparkSQL
> unionAll returns wrong result when two dataframes has schema in different
> order
> -------------------------------------------------------------------------------
>
> Key: SPARK-15918
> URL: https://issues.apache.org/jira/browse/SPARK-15918
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.6.1
> Environment: CentOS
> Reporter: Prabhu Joseph
>
> On applying unionAll operation between A and B dataframes, they both has same
> schema but in different order and hence the result has column value mapping
> changed.
> Repro:
> {code}
> A.show()
> +---+--------+-------+------+------+-----+----+-------+------+-------+-------+-----+
> |tag|year_day|tm_hour|tm_min|tm_sec|dtype|time|tm_mday|tm_mon|tm_yday|tm_year|value|
> +---+--------+-------+------+------+-----+----+-------+------+-------+-------+-----+
> +---+--------+-------+------+------+-----+----+-------+------+-------+-------+-----+
> B.show()
> +-----+-------------------+----------+-------+-------+------+------+------+-------+-------+------+--------+
> |dtype| tag|
> time|tm_hour|tm_mday|tm_min|tm_mon|tm_sec|tm_yday|tm_year| value|year_day|
> +-----+-------------------+----------+-------+-------+------+------+------+-------+-------+------+--------+
> | F|C_FNHXUT701Z.CNSTLO|1443790800| 13| 2| 0| 10| 0|
> 275| 2015|1.2345| 2015275|
> | F|C_FNHXUDP713.CNSTHI|1443790800| 13| 2| 0| 10| 0|
> 275| 2015|1.2345| 2015275|
> | F| C_FNHXUT718.CNSTHI|1443790800| 13| 2| 0| 10| 0|
> 275| 2015|1.2345| 2015275|
> | F|C_FNHXUT703Z.CNSTLO|1443790800| 13| 2| 0| 10| 0|
> 275| 2015|1.2345| 2015275|
> | F|C_FNHXUR716A.CNSTLO|1443790800| 13| 2| 0| 10| 0|
> 275| 2015|1.2345| 2015275|
> | F|C_FNHXUT803Z.CNSTHI|1443790800| 13| 2| 0| 10| 0|
> 275| 2015|1.2345| 2015275|
> | F| C_FNHXUT728.CNSTHI|1443790800| 13| 2| 0| 10| 0|
> 275| 2015|1.2345| 2015275|
> | F| C_FNHXUR806.CNSTHI|1443790800| 13| 2| 0| 10| 0|
> 275| 2015|1.2345| 2015275|
> +-----+-------------------+----------+-------+-------+------+------+------+-------+-------+------+--------+
> A = A.unionAll(B)
> A.show()
> +---+-------------------+----------+------+------+-----+----+-------+------+-------+-------+---------+
> |tag| year_day|
> tm_hour|tm_min|tm_sec|dtype|time|tm_mday|tm_mon|tm_yday|tm_year| value|
> +---+-------------------+----------+------+------+-----+----+-------+------+-------+-------+---------+
> | F|C_FNHXUT701Z.CNSTLO|1443790800| 13| 2| 0| 10| 0| 275|
> 2015| 1.2345|2015275.0|
> | F|C_FNHXUDP713.CNSTHI|1443790800| 13| 2| 0| 10| 0| 275|
> 2015| 1.2345|2015275.0|
> | F| C_FNHXUT718.CNSTHI|1443790800| 13| 2| 0| 10| 0| 275|
> 2015| 1.2345|2015275.0|
> | F|C_FNHXUT703Z.CNSTLO|1443790800| 13| 2| 0| 10| 0| 275|
> 2015| 1.2345|2015275.0|
> | F|C_FNHXUR716A.CNSTLO|1443790800| 13| 2| 0| 10| 0| 275|
> 2015| 1.2345|2015275.0|
> | F|C_FNHXUT803Z.CNSTHI|1443790800| 13| 2| 0| 10| 0| 275|
> 2015| 1.2345|2015275.0|
> | F| C_FNHXUT728.CNSTHI|1443790800| 13| 2| 0| 10| 0| 275|
> 2015| 1.2345|2015275.0|
> | F| C_FNHXUR806.CNSTHI|1443790800| 13| 2| 0| 10| 0| 275|
> 2015| 1.2345|2015275.0|
> +---+-------------------+----------+------+------+-----+----+-------+------+-------+-------+---------+
> {code}
> On changing the schema of A according to B and doing unionAll works fine
> {code}
> C =
> A.select("dtype","tag","time","tm_hour","tm_mday","tm_min",”tm_mon”,"tm_sec","tm_yday","tm_year","value","year_day")
> A = C.unionAll(B)
> A.show()
> +-----+-------------------+----------+-------+-------+------+------+------+-------+-------+------+--------+
> |dtype| tag|
> time|tm_hour|tm_mday|tm_min|tm_mon|tm_sec|tm_yday|tm_year| value|year_day|
> +-----+-------------------+----------+-------+-------+------+------+------+-------+-------+------+--------+
> | F|C_FNHXUT701Z.CNSTLO|1443790800| 13| 2| 0| 10| 0|
> 275| 2015|1.2345| 2015275|
> | F|C_FNHXUDP713.CNSTHI|1443790800| 13| 2| 0| 10| 0|
> 275| 2015|1.2345| 2015275|
> | F| C_FNHXUT718.CNSTHI|1443790800| 13| 2| 0| 10| 0|
> 275| 2015|1.2345| 2015275|
> | F|C_FNHXUT703Z.CNSTLO|1443790800| 13| 2| 0| 10| 0|
> 275| 2015|1.2345| 2015275|
> | F|C_FNHXUR716A.CNSTLO|1443790800| 13| 2| 0| 10| 0|
> 275| 2015|1.2345| 2015275|
> | F|C_FNHXUT803Z.CNSTHI|1443790800| 13| 2| 0| 10| 0|
> 275| 2015|1.2345| 2015275|
> | F| C_FNHXUT728.CNSTHI|1443790800| 13| 2| 0| 10| 0|
> 275| 2015|1.2345| 2015275|
> | F| C_FNHXUR806.CNSTHI|1443790800| 13| 2| 0| 10| 0|
> 275| 2015|1.2345| 2015275|
> +-----+-------------------+----------+-------+-------+------+------+------+-------+-------+------+--------+
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]