[
https://issues.apache.org/jira/browse/SPARK-15696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Reynold Xin resolved SPARK-15696.
---------------------------------
Resolution: Fixed
Assignee: Dongjoon Hyun
Fix Version/s: 2.0.0
> Improve `crosstab` to have a consistent column order
> -----------------------------------------------------
>
> Key: SPARK-15696
> URL: https://issues.apache.org/jira/browse/SPARK-15696
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Reporter: Dongjoon Hyun
> Assignee: Dongjoon Hyun
> Fix For: 2.0.0
>
>
> Currently, `crosstab` returns its columns in **random order** because they are
> obtained with a plain `distinct`. The documentation of `crosstab`, however,
> shows the result in sorted order, which differs from the implementation.
> {code}
> scala> spark.createDataFrame(Seq((1, 1), (1, 2), (2, 1), (2, 1), (2, 3), (3,
> 2), (3, 3))).toDF("key", "value").stat.crosstab("key", "value").show()
> +---------+---+---+---+
> |key_value|  3|  2|  1|
> +---------+---+---+---+
> |        2|  1|  0|  2|
> |        1|  0|  1|  1|
> |        3|  1|  1|  0|
> +---------+---+---+---+
> scala> spark.createDataFrame(Seq((1, "a"), (1, "b"), (2, "a"), (2, "a"), (2,
> "c"), (3, "b"), (3, "c"))).toDF("key", "value").stat.crosstab("key",
> "value").show()
> +---------+---+---+---+
> |key_value|  c|  a|  b|
> +---------+---+---+---+
> |        2|  1|  2|  0|
> |        1|  0|  1|  1|
> |        3|  1|  0|  1|
> +---------+---+---+---+
> {code}
> This issue explicitly constructs the columns in sorted order to improve the
> user experience. With this change, the implementation also produces the same
> result as the documentation.
> {code}
> scala> spark.createDataFrame(Seq((1, 1), (1, 2), (2, 1), (2, 1), (2, 3), (3,
> 2), (3, 3))).toDF("key", "value").stat.crosstab("key", "value").show()
> +---------+---+---+---+
> |key_value|  1|  2|  3|
> +---------+---+---+---+
> |        2|  2|  0|  1|
> |        1|  1|  1|  0|
> |        3|  0|  1|  1|
> +---------+---+---+---+
> scala> spark.createDataFrame(Seq((1, "a"), (1, "b"), (2, "a"), (2, "a"), (2,
> "c"), (3, "b"), (3, "c"))).toDF("key", "value").stat.crosstab("key",
> "value").show()
> +---------+---+---+---+
> |key_value|  a|  b|  c|
> +---------+---+---+---+
> |        2|  2|  0|  1|
> |        1|  1|  1|  0|
> |        3|  0|  1|  1|
> +---------+---+---+---+
> {code}
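For readers who want to see why sorting the distinct values makes the column order deterministic, here is a minimal sketch in plain Scala. It is not Spark's implementation: `CrosstabSketch` and its `crosstab` helper are hypothetical names used only for illustration, and the data is held in an in-memory `Seq` rather than a DataFrame.

```scala
// Minimal sketch, NOT Spark's implementation: a contingency table over
// in-memory pairs whose columns are emitted in sorted order.
// `CrosstabSketch` and `crosstab` are hypothetical names.
object CrosstabSketch {
  // Returns the sorted distinct column labels and, for each row key,
  // the counts aligned with those labels.
  def crosstab[K, V: Ordering](pairs: Seq[(K, V)]): (Seq[V], Map[K, Seq[Int]]) = {
    // Sorting the distinct values fixes the column order deterministically,
    // instead of relying on whatever order `distinct` happens to yield.
    val columns = pairs.map(_._2).distinct.sorted
    // Count the occurrences of each (key, value) pair.
    val counts = pairs.groupBy(identity).map { case (kv, occurrences) => kv -> occurrences.size }
    // Build one row per key, filling in 0 for pairs that never occur.
    val rows = pairs.map(_._1).distinct.map { k =>
      k -> columns.map(v => counts.getOrElse((k, v), 0))
    }.toMap
    (columns, rows)
  }

  def main(args: Array[String]): Unit = {
    val data = Seq((1, "a"), (1, "b"), (2, "a"), (2, "a"), (2, "c"), (3, "b"), (3, "c"))
    val (columns, rows) = crosstab(data)
    println(("key_value" +: columns).mkString(" | "))
    rows.toSeq.sortBy(_._1).foreach { case (k, cs) => println((k +: cs).mkString(" | ")) }
  }
}
```

With the string-valued sample data from the description, the columns come out as a, b, c regardless of the order in which the values first appear, and the counts for key 2 are 2, 0, 1, which agrees with the table shown above.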