[ 
https://issues.apache.org/jira/browse/SPARK-24260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-24260.
----------------------------------
    Resolution: Incomplete

> Support for multi-statement SQL in SparkSession.sql API
> -------------------------------------------------------
>
>                 Key: SPARK-24260
>                 URL: https://issues.apache.org/jira/browse/SPARK-24260
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Ravindra Nath Kakarla
>            Priority: Minor
>              Labels: bulk-closed
>
> The sparkSession.sql API executes only one SQL statement per call. A string 
> containing several semicolon-separated statements cannot be executed in a 
> single call. For example,
> {code:java}
> SparkSession sparkSession = SparkSession.builder()
>     .appName("MultiStatementSQL")
>     .master("local")
>     .config("", "")
>     .getOrCreate();
> // Four statements in one string: this is the call that fails.
> sparkSession.sql("DROP TABLE IF EXISTS count_employees; "
>     + "CACHE TABLE employees; "
>     + "CREATE TEMPORARY VIEW count_employees AS "
>     + "SELECT count(*) as cnt FROM employees; "
>     + "SELECT * FROM count_employees");
> {code}
> The code above fails with the error:
> {code:java}
> org.apache.spark.sql.catalyst.parser.ParseException: mismatched input ';' 
> expecting <EOF>{code}
> The workaround is to call the .sql API once per statement, in the required 
> order.
> {code:java}
> sparkSession.sql("DROP TABLE IF EXISTS count_employees");
> sparkSession.sql("CACHE TABLE employees");
> // No ';' inside the SQL text, since ';' is what triggers the parser error above.
> sparkSession.sql("CREATE TEMPORARY VIEW count_employees AS "
>     + "SELECT count(*) as cnt FROM employees");
> sparkSession.sql("SELECT * FROM count_employees");
> {code}
> If these SQL statements come from a string or a file, users have to implement 
> their own splitting logic to execute them. For example,
> {code:scala}
> val sqlFromFile = """DROP TABLE IF EXISTS count_employees;
>  |CACHE TABLE employees;
>  |CREATE TEMPORARY VIEW count_employees AS SELECT count(*) as cnt FROM employees;
>  |SELECT * FROM count_employees""".stripMargin{code}
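> {code:scala}
> // Naive split on every ';', then run each non-empty piece in order.
> sqlFromFile.split(";").map(_.trim).filter(_.nonEmpty)
>   .foreach(statement => sparkSession.sql(statement))
> {code}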
> This naive splitter fails on many edge cases (such as a ";" inside a string 
> literal). Even if users reuse the grammar used by Spark to implement their 
> own parsing, it can drift out of sync with the way Spark parses statements.
> Can support for multiple SQL statements be built into the SparkSession.sql 
> API itself?
>  
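As an illustration of the edge cases mentioned in the description, here is a minimal sketch of a splitter that ignores semicolons inside quoted literals and comments. SqlStatementSplitter is a hypothetical helper, not part of Spark, and it does not use Spark's grammar. It assumes single, double, and backtick quoting, backslash escapes inside string literals, and "--" and "/* */" comments, so it can still disagree with Spark's parser in other cases.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not a Spark API): splits a SQL script on top-level semicolons. */
public final class SqlStatementSplitter {

    private SqlStatementSplitter() {}

    public static List<String> split(String script) {
        List<String> statements = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        char quote = 0;               // active quote character, or 0 when outside a literal
        boolean lineComment = false;  // inside "-- ..." up to the end of the line
        boolean blockComment = false; // inside "/* ... */"

        for (int i = 0; i < script.length(); i++) {
            char c = script.charAt(i);
            char next = i + 1 < script.length() ? script.charAt(i + 1) : 0;

            if (lineComment) {
                if (c == '\n') lineComment = false;
                current.append(c);
            } else if (blockComment) {
                current.append(c);
                if (c == '*' && next == '/') {
                    current.append(next);
                    i++;
                    blockComment = false;
                }
            } else if (quote != 0) {
                current.append(c);
                if (c == '\\' && quote != '`' && next != 0) {
                    current.append(next); // keep the escaped character as-is
                    i++;
                } else if (c == quote) {
                    quote = 0;            // closing quote ends the literal
                }
            } else if (c == '\'' || c == '"' || c == '`') {
                quote = c;
                current.append(c);
            } else if (c == '-' && next == '-') {
                lineComment = true;
                current.append(c);
            } else if (c == '/' && next == '*') {
                blockComment = true;
                current.append(c).append(next);
                i++;
            } else if (c == ';') {
                // Top-level semicolon: the current statement is complete.
                addIfNotBlank(statements, current);
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        addIfNotBlank(statements, current); // last statement may lack a ';'
        return statements;
    }

    private static void addIfNotBlank(List<String> out, StringBuilder sb) {
        String s = sb.toString().trim();
        if (!s.isEmpty()) out.add(s);
    }
}
```

With a splitter along these lines, the workaround becomes SqlStatementSplitter.split(script).forEach(sparkSession::sql). Comments are kept in the statement text, so a script that ends with a comment after the last semicolon yields a comment-only final entry that the caller would need to skip.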



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
