Jiri Syrovy created SPARK-13516:
-----------------------------------
Summary: Dataframe structure inconsistency after projection.
Key: SPARK-13516
URL: https://issues.apache.org/jira/browse/SPARK-13516
Project: Spark
Issue Type: Bug
Components: Java API, SQL
Affects Versions: 1.6.0
Environment: Local mode, java version 1.8.0_45
Reporter: Jiri Syrovy
It seems that the sequence Aggregation + Adding a static column + Union +
Projection leaves the DataFrame structurally inconsistent.
The problem appears in the following case:
- Start with some DataFrame called df.
1) Aggregate multiple columns of df and store the result as result_agg_1.
2) Run another aggregation of multiple columns, but with one fewer grouping
column, and store the result as result_agg_2.
3) Align the result of the second aggregation by adding the missing grouping
column with the empty value lit("").
4) Union result_agg_1 and result_agg_2.
5) Project every aggregated column from "sum(count_column)" to
"count_column".
The result is a structurally inconsistent DataFrame in which all the data
coming from result_agg_1 is shifted.
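
To make the expected (unshifted) outcome concrete, here is a minimal sketch
that models the five steps on plain Python rows. It is not Spark code and does
not reproduce the bug. Rows are keyed by column name here, so nothing can
shift. The column names group_a, group_b and other_column and all input
values are hypothetical. Only count_column and the "sum(...)" naming come from
the description above.

```python
from collections import defaultdict

# Hypothetical input standing in for df; only count_column is named in the report.
rows = [
    {"group_a": "x", "group_b": "p", "count_column": 1, "other_column": 10},
    {"group_a": "x", "group_b": "q", "count_column": 2, "other_column": 20},
    {"group_a": "y", "group_b": "p", "count_column": 3, "other_column": 30},
    {"group_a": "x", "group_b": "p", "count_column": 4, "other_column": 40},
]
VALUE_COLUMNS = ["count_column", "other_column"]


def aggregate(rows, group_columns):
    """Sum each value column per group; results are named like "sum(count_column)"."""
    totals = defaultdict(lambda: defaultdict(int))
    for row in rows:
        key = tuple(row[c] for c in group_columns)
        for c in VALUE_COLUMNS:
            totals[key]["sum(%s)" % c] += row[c]
    result = []
    for key in sorted(totals):
        record = dict(zip(group_columns, key))
        record.update(totals[key])
        result.append(record)
    return result


# 1) Aggregate over both grouping columns.
result_agg_1 = aggregate(rows, ["group_a", "group_b"])
# 2) Aggregate again with one fewer grouping column.
result_agg_2 = aggregate(rows, ["group_a"])
# 3) Add the missing grouping column with an empty value (the lit("") step).
aligned_agg_2 = [dict(r, group_b="") for r in result_agg_2]
# 4) Union the two results.
unioned = result_agg_1 + aligned_agg_2
# 5) Project "sum(<column>)" back to "<column>" for every aggregated column.
projected = [
    {
        "group_a": r["group_a"],
        "group_b": r["group_b"],
        **{c: r["sum(%s)" % c] for c in VALUE_COLUMNS},
    }
    for r in unioned
]

for r in projected:
    print(r)
```

In a consistent result every row carries the same four columns in the same
order, and the rows that came from result_agg_1 keep their grouping values in
the grouping columns and their sums in the value columns.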
Stripped-down example code and an example result can be seen here:
https://gist.github.com/xjrk58/e0c7171287ee9bdc8df8
https://gist.github.com/xjrk58/7a297a42ebb94f300d96
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)