[ https://issues.apache.org/jira/browse/SPARK-13832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15209220#comment-15209220 ]

Xin Wu commented on SPARK-13832:
--------------------------------

[~jfc...@us.ibm.com] I think that when using grouping_id(), you need to pass in 
all of the columns that appear in the GROUP BY clause. In this case, that would 
be grouping_id(i_category, i_class). The result is equivalent to concatenating 
the results of grouping(<each column>) into a bit vector (a string of ones and 
zeros), i.e. the grouping(i_category) bit followed by the grouping(i_class) bit.

So {code}grouping_id(i_category)+grouping_id(i_class){code} is not correct. 
After I changed it to {code}grouping_id(i_category, i_class){code}, the query 
returns results for the text data files. 
I am trying the parquet files now. 
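The bit-vector behavior described above can be sketched as follows (an illustrative Python model, not Spark's implementation): each GROUP BY column contributes one grouping() bit, and grouping_id is the integer formed by those bits in order.

```python
# Illustrative sketch: grouping_id over N GROUP BY columns is the integer
# whose binary digits are the grouping() bits of each column, in order.
def grouping_bits_to_id(bits):
    """bits[i] is 1 if column i is rolled up (aggregated away) at this level."""
    gid = 0
    for b in bits:
        gid = (gid << 1) | b
    return gid

# For GROUP BY i_category, i_class WITH ROLLUP, the three levels are:
print(grouping_bits_to_id([0, 0]))  # detail rows                  -> 0
print(grouping_bits_to_id([0, 1]))  # i_class rolled up            -> 1
print(grouping_bits_to_id([1, 1]))  # grand total (both rolled up) -> 3
```

This is why summing per-column grouping_id() calls gives the wrong answer: the bits must be concatenated positionally, not added.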



> TPC-DS Query 36 fails with Parser error
> ---------------------------------------
>
>                 Key: SPARK-13832
>                 URL: https://issues.apache.org/jira/browse/SPARK-13832
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.6.1
>         Environment: Red Hat Enterprise Linux Server release 7.1 (Maipo)
> Linux bigaperf116.svl.ibm.com 3.10.0-229.el7.x86_64 #1 SMP Thu Jan 29 
> 18:37:38 EST 2015 x86_64 x86_64 x86_64 GNU/Linux
>            Reporter: Roy Cecil
>
> TPC-DS query 36 fails with the following error
> Analyzer error: 16/02/28 21:22:51 INFO parse.ParseDriver: Parse Completed
> Exception in thread "main" org.apache.spark.sql.AnalysisException: expression 
> 'i_category' is neither present in the group by, nor is it an aggregate 
> function. Add to group by or wrap in first() (or first_value) if you don't 
> care which value you get.;
>         at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:38)
>         at 
> org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:44)
> Query Text pasted here for quick reference.
>   select
>     sum(ss_net_profit)/sum(ss_ext_sales_price) as gross_margin
>    ,i_category
>    ,i_class
>    ,grouping__id as lochierarchy
>    ,rank() over (
>         partition by grouping__id,
>         case when grouping__id = 0 then i_category end
>         order by sum(ss_net_profit)/sum(ss_ext_sales_price) asc) as 
> rank_within_parent
>  from
>     store_sales
>    ,date_dim       d1
>    ,item
>    ,store
>  where
>     d1.d_year = 2001
>  and d1.d_date_sk = ss_sold_date_sk
>  and i_item_sk  = ss_item_sk
>  and s_store_sk  = ss_store_sk
>  and s_state in ('TN','TN','TN','TN',
>                  'TN','TN','TN','TN')
>  group by i_category,i_class WITH ROLLUP
>  order by
>    lochierarchy desc
>   ,case when lochierarchy = 0 then i_category end
>   ,rank_within_parent
>     limit 100;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
