Jiri Syrovy created SPARK-13516:
-----------------------------------

             Summary: Dataframe structure inconsistency after projection.
                 Key: SPARK-13516
                 URL: https://issues.apache.org/jira/browse/SPARK-13516
             Project: Spark
          Issue Type: Bug
          Components: Java API, SQL
    Affects Versions: 1.6.0
         Environment: Local mode, java version 1.8.0_45
            Reporter: Jiri Syrovy


It seems that the sequence Aggregation + Adding a static column + Union + 
Projection causes DataFrame inconsistency. 

The problem appears in the following case:

- Let's have some DataFrame called df.

1) Aggregate multiple columns of the DataFrame df and store the result as 
result_agg_1.
2) Do another aggregation of multiple columns, but with one fewer grouping 
column, and store the result as result_agg_2.
3) Align the result of the second aggregation by adding the missing grouping 
column with the empty value lit("").
4) Union result_agg_1 and result_agg_2.
5) Project "sum(count_column)" to "count_column" for all aggregated columns.

The result is a structurally inconsistent DataFrame in which all the data 
coming from result_agg_1 is shifted.
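
For comparison, the five steps can be modeled in plain Python to show what a 
structurally consistent result would look like. This is only a sketch and does 
not use Spark or reproduce the bug: the input rows, the column names key_a and 
key_b, and the helper aggregate() are hypothetical, and a single aggregated 
column stands in for the multiple columns mentioned above. Rows are dicts, so 
the union here matches columns by name.

```python
from collections import defaultdict

# Hypothetical input standing in for df (names and values are illustrative).
rows = [
    {"key_a": "x", "key_b": "p", "count_column": 1},
    {"key_a": "x", "key_b": "q", "count_column": 2},
    {"key_a": "y", "key_b": "p", "count_column": 3},
]


def aggregate(rows, group_cols, value_col):
    """Group by group_cols and sum value_col into a 'sum(<value_col>)' column."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[c] for c in group_cols)] += row[value_col]
    out_col = "sum({})".format(value_col)
    result = []
    for key, total in totals.items():
        out = dict(zip(group_cols, key))
        out[out_col] = total
        result.append(out)
    return result


# 1) Aggregate with both grouping columns.
result_agg_1 = aggregate(rows, ["key_a", "key_b"], "count_column")

# 2) Aggregate again with one fewer grouping column.
result_agg_2 = aggregate(rows, ["key_a"], "count_column")

# 3) Align the second result by adding the missing grouping column as "".
aligned_agg_2 = [dict(row, key_b="") for row in result_agg_2]

# 4) Union the two results (by column name in this model).
union = result_agg_1 + aligned_agg_2

# 5) Project "sum(count_column)" to "count_column".
projected = [
    {
        "key_a": row["key_a"],
        "key_b": row["key_b"],
        "count_column": row["sum(count_column)"],
    }
    for row in union
]

for row in projected:
    print(row)
```

In this model every output row carries the same three columns, and the rows 
that originate from result_agg_1 keep their values under the columns they 
started in. That is the property the Spark result described above does not 
have.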

Stripped-down example code and an example result can be seen here:

https://gist.github.com/xjrk58/e0c7171287ee9bdc8df8
https://gist.github.com/xjrk58/7a297a42ebb94f300d96


