[
https://issues.apache.org/jira/browse/SPARK-21858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Wenchen Fan closed SPARK-21858.
-------------------------------
Resolution: Not A Problem
> Make Spark grouping_id() compatible with Hive grouping__id
> ----------------------------------------------------------
>
> Key: SPARK-21858
> URL: https://issues.apache.org/jira/browse/SPARK-21858
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.2.0
> Reporter: Yann Byron
>
> If you migrate ETL jobs from Hive to Spark and replace Hive's `grouping__id`
> with Spark's `grouping_id()`, you will find that the two evaluate
> differently.
> Here is an example.
> {code:java}
> select A, B, grouping__id/grouping_id() from t group by A, B grouping
> sets((), (A), (B), (A,B))
> {code}
> Running it on Hive and Spark separately gives the results below. An
> attribute that is part of the current grouping set is marked (/); one that
> is not is marked (x).
> ||A B||Binary Expression in Spark||Spark||Hive||Binary Expression in Hive||B A||
> |(x) (x)|11|3|0|00|(x) (x)|
> |(x) (/)|10|2|2|10|(/) (x)|
> |(/) (x)|01|1|1|01|(x) (/)|
> |(/) (/)|00|0|3|11|(/) (/)|
> As shown above, Spark sets the bit to 0 for (/) and to 1 for (x), and Hive
> does the opposite.
> Moreover, Hive reverses the order of the `group by` attributes before
> computing the value, whereas Spark evaluates them in the order given.
> I suggest modifying the behavior of `grouping_id()` to make it compatible
> with Hive's `grouping__id`.
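
For readers who want to check the table by hand, here is a minimal Python
sketch of the two bit conventions. It models only the four rows reported
in the table above and is not taken from the Spark or Hive source. The
function names (spark_grouping_id, hive_grouping_id_as_reported,
hive_to_spark) are hypothetical and exist only for this illustration.

```python
def spark_grouping_id(group_by, grouping_set):
    """Spark convention from the table: the first GROUP BY column is the
    most significant bit; a bit is 1 when the column is NOT in the set."""
    gid = 0
    for col in group_by:
        gid = (gid << 1) | (0 if col in grouping_set else 1)
    return gid


def hive_grouping_id_as_reported(group_by, grouping_set):
    """Hive convention as reported in the table (an assumption based on
    those rows only): columns are reversed first; a bit is 1 when the
    column IS in the set."""
    gid = 0
    for col in reversed(group_by):
        gid = (gid << 1) | (1 if col in grouping_set else 0)
    return gid


def hive_to_spark(hive_id, num_cols):
    """Convert a reported-Hive value to the Spark value: reverse the
    lowest num_cols bits, then invert them."""
    reversed_bits = 0
    for i in range(num_cols):
        reversed_bits = (reversed_bits << 1) | ((hive_id >> i) & 1)
    mask = (1 << num_cols) - 1
    return ~reversed_bits & mask


COLS = ["A", "B"]
SETS = [(), ("A",), ("B",), ("A", "B")]

for s in SETS:
    spark = spark_grouping_id(COLS, s)
    hive = hive_grouping_id_as_reported(COLS, s)
    print(s, "spark =", spark, "hive =", hive,
          "converted =", hive_to_spark(hive, len(COLS)))
```

An ETL that depends on the reported Hive values could apply a conversion
like hive_to_spark in reverse rather than rely on the two functions
agreeing.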