[
https://issues.apache.org/jira/browse/SPARK-13832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15209220#comment-15209220
]
Xin Wu commented on SPARK-13832:
--------------------------------
[[email protected]] I think that when using grouping_id(), you need to pass in all
the columns that are in the group by clause. In this case, that is
grouping_id(i_category, i_class). The result is like concatenating the results of
grouping(<each column>) into a bit vector (a string of ones and zeros), such as
grouping(i_category)+grouping(i_class).
So {code}grouping_id(i_category)+grouping_id(i_class){code} is not correct.
After I changed the query to use {code}grouping_id(i_category, i_class){code}, it
returns results for the text data files.
I am now trying the Parquet files.
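
To make the bit-vector description concrete, here is a minimal Python sketch of the arithmetic only. It does not call Spark. It assumes the usual SQL convention that grouping(col) is 1 when the column is aggregated away in a rollup row and 0 otherwise, and that the first argument becomes the most significant bit. The helper name grouping_id and the level labels are purely illustrative.

```python
def grouping_id(*grouping_bits):
    """Pack per-column grouping() bits into one integer.

    Assumption: the first column supplies the most significant bit,
    so grouping_id(a, b) == 2 * grouping(a) + grouping(b).
    """
    result = 0
    for bit in grouping_bits:
        if bit not in (0, 1):
            raise ValueError("each grouping() value must be 0 or 1")
        result = (result << 1) | bit
    return result


# Rows produced by GROUP BY i_category, i_class WITH ROLLUP,
# written as (grouping(i_category), grouping(i_class)).
rollup_levels = {
    "detail row (i_category, i_class)": (0, 0),
    "subtotal per i_category": (0, 1),
    "grand total": (1, 1),
}

for label, bits in rollup_levels.items():
    print(f"{label}: grouping_id = {grouping_id(*bits)}")
```

Under these assumptions the three rollup levels map to 0, 1 and 3, so each level of the hierarchy has its own value.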
> TPC-DS Query 36 fails with Parser error
> ---------------------------------------
>
> Key: SPARK-13832
> URL: https://issues.apache.org/jira/browse/SPARK-13832
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.6.1
> Environment: Red Hat Enterprise Linux Server release 7.1 (Maipo)
> Linux bigaperf116.svl.ibm.com 3.10.0-229.el7.x86_64 #1 SMP Thu Jan 29
> 18:37:38 EST 2015 x86_64 x86_64 x86_64 GNU/Linux
> Reporter: Roy Cecil
>
> TPC-DS query 36 fails with the following error:
> Analyzer error: 16/02/28 21:22:51 INFO parse.ParseDriver: Parse Completed
> Exception in thread "main" org.apache.spark.sql.AnalysisException: expression
> 'i_category' is neither present in the group by, nor is it an aggregate
> function. Add to group by or wrap in first() (or first_value) if you don't
> care which value you get.;
> at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:38)
> at org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:44)
> Query Text pasted here for quick reference.
> select
> sum(ss_net_profit)/sum(ss_ext_sales_price) as gross_margin
> ,i_category
> ,i_class
> ,grouping__id as lochierarchy
> ,rank() over (
> partition by grouping__id,
> case when grouping__id = 0 then i_category end
> order by sum(ss_net_profit)/sum(ss_ext_sales_price) asc) as
> rank_within_parent
> from
> store_sales
> ,date_dim d1
> ,item
> ,store
> where
> d1.d_year = 2001
> and d1.d_date_sk = ss_sold_date_sk
> and i_item_sk = ss_item_sk
> and s_store_sk = ss_store_sk
> and s_state in ('TN','TN','TN','TN',
> 'TN','TN','TN','TN')
> group by i_category,i_class WITH ROLLUP
> order by
> lochierarchy desc
> ,case when lochierarchy = 0 then i_category end
> ,rank_within_parent
> limit 100;