[
https://issues.apache.org/jira/browse/HIVE-15160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15680437#comment-15680437
]
Vineet Garg commented on HIVE-15160:
------------------------------------
TPC-DS query 35 has the same issue
{noformat}
select
ca_state,
cd_gender,
cd_marital_status,
count(*) cnt1,
avg(cd_dep_count),
max(cd_dep_count),
sum(cd_dep_count),
cd_dep_employed_count,
count(*) cnt2,
avg(cd_dep_employed_count),
max(cd_dep_employed_count),
sum(cd_dep_employed_count),
cd_dep_college_count,
count(*) cnt3,
avg(cd_dep_college_count),
max(cd_dep_college_count),
sum(cd_dep_college_count)
from
customer c,customer_address ca,customer_demographics
where
c.c_current_addr_sk = ca.ca_address_sk and
cd_demo_sk = c.c_current_cdemo_sk and
exists (select *
from store_sales,date_dim
where c.c_customer_sk = ss_customer_sk and
ss_sold_date_sk = d_date_sk and
d_year = 1999 and
d_qoy < 4) and
(exists (select *
from web_sales,date_dim
where c.c_customer_sk = ws_bill_customer_sk and
ws_sold_date_sk = d_date_sk and
d_year = 1999 and
d_qoy < 4) or
exists (select *
from catalog_sales,date_dim
where c.c_customer_sk = cs_ship_customer_sk and
cs_sold_date_sk = d_date_sk and
d_year = 1999 and
d_qoy < 4))
group by ca_state,
cd_gender,
cd_marital_status,
cd_dep_count,
cd_dep_employed_count,
cd_dep_college_count
order by ca_state,
cd_gender,
cd_marital_status,
cd_dep_count,
cd_dep_employed_count,
cd_dep_college_count
limit 100;
{noformat}
cd_dep_count, cd_dep_employed_count are not selected but are part of order by.
Not sure if this would a simple fix. [~pxiong] Have you looked into this ? Any
idea what would it take to fix this? Quick look at error stack suggests that it
is giving up during operator tree generation while trying to resolve column
names (probably for oder by )
> Can't order by an unselected column
> -----------------------------------
>
> Key: HIVE-15160
> URL: https://issues.apache.org/jira/browse/HIVE-15160
> Project: Hive
> Issue Type: Bug
> Reporter: Pengcheng Xiong
> Assignee: Pengcheng Xiong
>
> If a grouping key hasn't been selected, Hive complains. For comparison,
> Postgres does not.
> Example. Notice i_item_id is not selected:
> {code}
> select i_item_desc
> ,i_category
> ,i_class
> ,i_current_price
> ,sum(cs_ext_sales_price) as itemrevenue
> ,sum(cs_ext_sales_price)*100/sum(sum(cs_ext_sales_price)) over
> (partition by i_class) as revenueratio
> from catalog_sales
> ,item
> ,date_dim
> where cs_item_sk = i_item_sk
> and i_category in ('Jewelry', 'Sports', 'Books')
> and cs_sold_date_sk = d_date_sk
> and d_date between cast('2001-01-12' as date)
> and (cast('2001-01-12' as date) + 30 days)
> group by i_item_id
> ,i_item_desc
> ,i_category
> ,i_class
> ,i_current_price
> order by i_category
> ,i_class
> ,i_item_id
> ,i_item_desc
> ,revenueratio
> limit 100;
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)