Terry Siu created SPARK-4226:
--------------------------------

             Summary: SparkSQL - Add support for subqueries in predicates
                 Key: SPARK-4226
                 URL: https://issues.apache.org/jira/browse/SPARK-4226
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 1.2.0
         Environment: Spark 1.2 snapshot
            Reporter: Terry Siu


I have a test table defined in Hive as follows:

CREATE TABLE sparkbug (
  customerid INT,
  event STRING
) STORED AS PARQUET;

and insert some sample data with ids 1, 2, 3.

In a Spark shell, I create a HiveContext and execute the following HQL to 
test subquery predicates:

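import org.apache.spark.sql.hive.HiveContext

// sc is the SparkContext that the Spark shell provides
val hc = new HiveContext(sc)
hc.hql("select customerid from sparkbug where customerid in (select customerid from sparkbug where customerid in (2,3))")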

I get the following error:

java.lang.RuntimeException: Unsupported language features in query: select 
customerid from sparkbug where customerid in (select customerid from sparkbug 
where customerid in (2,3))
TOK_QUERY
  TOK_FROM
    TOK_TABREF
      TOK_TABNAME
        sparkbug
  TOK_INSERT
    TOK_DESTINATION
      TOK_DIR
        TOK_TMP_FILE
    TOK_SELECT
      TOK_SELEXPR
        TOK_TABLE_OR_COL
          customerid
    TOK_WHERE
      TOK_SUBQUERY_EXPR
        TOK_SUBQUERY_OP
          in
        TOK_QUERY
          TOK_FROM
            TOK_TABREF
              TOK_TABNAME
                sparkbug
          TOK_INSERT
            TOK_DESTINATION
              TOK_DIR
                TOK_TMP_FILE
            TOK_SELECT
              TOK_SELEXPR
                TOK_TABLE_OR_COL
                  customerid
            TOK_WHERE
              TOK_FUNCTION
                in
                TOK_TABLE_OR_COL
                  customerid
                2
                3
        TOK_TABLE_OR_COL
          customerid

scala.NotImplementedError: No parse rules for ASTNode type: 817, text: 
TOK_SUBQUERY_EXPR :
TOK_SUBQUERY_EXPR
  TOK_SUBQUERY_OP
    in
  TOK_QUERY
    TOK_FROM
      TOK_TABREF
        TOK_TABNAME
          sparkbug
    TOK_INSERT
      TOK_DESTINATION
        TOK_DIR
          TOK_TMP_FILE
      TOK_SELECT
        TOK_SELEXPR
          TOK_TABLE_OR_COL
            customerid
      TOK_WHERE
        TOK_FUNCTION
          in
          TOK_TABLE_OR_COL
            customerid
          2
          3
  TOK_TABLE_OR_COL
    customerid
" +
         
org.apache.spark.sql.hive.HiveQl$.nodeToExpr(HiveQl.scala:1098)
        
        at scala.sys.package$.error(package.scala:27)
        at org.apache.spark.sql.hive.HiveQl$.createPlan(HiveQl.scala:252)
        at org.apache.spark.sql.hive.ExtendedHiveQlParser$$anonfun$hiveQl$1.apply(ExtendedHiveQlParser.scala:50)
        at org.apache.spark.sql.hive.ExtendedHiveQlParser$$anonfun$hiveQl$1.apply(ExtendedHiveQlParser.scala:49)
        at scala.util.parsing.combinator.Parsers$Success.map(Parsers.scala:136)

This thread

http://apache-spark-user-list.1001560.n3.nabble.com/Subquery-in-having-clause-Spark-1-1-0-td17401.html

also brings up the lack of subquery support in SparkSQL. It would be nice to 
have subquery predicate support in a near-future release (1.3, maybe?).


