[ 
https://issues.apache.org/jira/browse/SPARK-13831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15197944#comment-15197944
 ] 

kevin yu commented on SPARK-13831:
----------------------------------

The same query will fail at spark sql 2.0 . And the failure can simply to 
select c_customer_sk from customer where exists (select cr_refunded_customer_sk 
from catalog_returns)

or 

select c_customer_sk from customer where exists (select cr_refunded_customer_sk 
from catalog_returns where cr_refunded_customer_sk = customer.c_customer_sk)

in Hive, it can pass the syntax. 
[~davies] can you confirm that spark sql is not supporting subquery with exist 
yet? 

> TPC-DS Query 35 fails with the following compile error
> ------------------------------------------------------
>
>                 Key: SPARK-13831
>                 URL: https://issues.apache.org/jira/browse/SPARK-13831
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>            Reporter: Roy Cecil
>
> TPC-DS Query 35 fails with the following compile error.
> Scala.NotImplementedError: 
> scala.NotImplementedError: No parse rules for ASTNode type: 864, text: 
> TOK_SUBQUERY_EXPR :
> TOK_SUBQUERY_EXPR 1, 439,797, 1370
>   TOK_SUBQUERY_OP 1, 439,439, 1370
>     exists 1, 439,439, 1370
>   TOK_QUERY 1, 441,797, 1508
> Pasting Query 35 for easy reference.
> select
>   ca_state,
>   cd_gender,
>   cd_marital_status,
>   cd_dep_count,
>   count(*) cnt1,
>   min(cd_dep_count) cd_dep_count1,
>   max(cd_dep_count) cd_dep_count2,
>   avg(cd_dep_count) cd_dep_count3,
>   cd_dep_employed_count,
>   count(*) cnt2,
>   min(cd_dep_employed_count) cd_dep_employed_count1,
>   max(cd_dep_employed_count) cd_dep_employed_count2,
>   avg(cd_dep_employed_count) cd_dep_employed_count3,
>   cd_dep_college_count,
>   count(*) cnt3,
>   min(cd_dep_college_count) cd_dep_college_count1,
>   max(cd_dep_college_count) cd_dep_college_count2,
>   avg(cd_dep_college_count) cd_dep_college_count3
>  from
>   customer c
>   JOIN customer_address ca ON c.c_current_addr_sk = ca.ca_address_sk
>   JOIN customer_demographics ON cd_demo_sk = c.c_current_cdemo_sk
>   LEFT SEMI JOIN
>   (select ss_customer_sk
>           from store_sales
>                JOIN date_dim ON ss_sold_date_sk = d_date_sk
>           where
>                 d_year = 2002 and
>                 d_qoy < 4) ss_wh1
>   ON c.c_customer_sk = ss_wh1.ss_customer_sk
>  where
>    exists (
>             select tmp.customer_sk from (
>             select ws_bill_customer_sk  as customer_sk
>             from web_sales,date_dim
>             where
>                   ws_sold_date_sk = d_date_sk and
>                   d_year = 2002 and
>                   d_qoy < 4
>            UNION ALL
>             select cs_ship_customer_sk  as customer_sk
>             from catalog_sales,date_dim
>             where
>                   cs_sold_date_sk = d_date_sk and
>                   d_year = 2002 and
>                   d_qoy < 4
>           ) tmp where c.c_customer_sk = tmp.customer_sk
>     )
>  group by ca_state,
>           cd_gender,
>           cd_marital_status,
>           cd_dep_count,
>           cd_dep_employed_count,
>           cd_dep_college_count
>  order by ca_state,
>           cd_gender,
>           cd_marital_status,
>           cd_dep_count,
>           cd_dep_employed_count,
>           cd_dep_college_count
>  limit 100;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to