Kent Yao created SPARK-33419:
--------------------------------

             Summary: Unexpected behavior when using SET commands before a query in SparkSession.sql
                 Key: SPARK-33419
                 URL: https://issues.apache.org/jira/browse/SPARK-33419
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.4.7, 3.0.2, 3.1.0
            Reporter: Kent Yao


SparkSession.sql converts a string value to a DataFrame. The string should 
contain a single SQL statement, optionally followed by one or more semicolons, 
e.g.


{code:sql}
scala> spark.sql(" select 2").show
+---+
|  2|
+---+
|  2|
+---+


scala> spark.sql(" select 2;").show
+---+
|  2|
+---+
|  2|
+---+

scala> spark.sql(" select 2;;;;").show
+---+
|  2|
+---+
|  2|
+---+
{code}


If you pass two or more statements, parsing fails, e.g.


{code:java}
scala> spark.sql(" select 2; select 1;").show
org.apache.spark.sql.catalyst.parser.ParseException:
extraneous input 'select' expecting {<EOF>, ';'}(line 1, pos 11)

== SQL ==
 select 2; select 1;
-----------^^^

  at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:263)
  at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:130)
  at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:51)
  at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:81)
  at org.apache.spark.sql.SparkSession.$anonfun$sql$2(SparkSession.scala:610)
  at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
  at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:610)
  at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:769)
  at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:607)
  ... 47 elided

{code}

In a very common scenario, users want to change some settings before they 
execute a query. They may pass a string such as `set spark.sql.abc=2; select 1;` 
to this API, which creates a confusing gap between the actual effect and the 
user's expectation.

The user likely expects the query to run with spark.sql.abc=2, but Spark 
actually treats everything after the equals sign, `2; select 1;`, as the value 
of the property 'spark.sql.abc', and only the SET result is returned, e.g.

{code:java}
scala> spark.sql("set spark.sql.abc=2; select 1;").show
+-------------+------------+
|          key|       value|
+-------------+------------+
|spark.sql.abc|2; select 1;|
+-------------+------------+
{code}
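
Until the behavior is settled, a caller can split the string into single statements and pass each one to SparkSession.sql separately. The following is only a minimal sketch of that workaround, not anything Spark provides: `SplitStatements` and `split` are hypothetical names, and the splitter is deliberately naive (it respects single quotes, double quotes, and backticks, but does not handle SQL comments or escaped quotes).

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical helper, not part of Spark.
object SplitStatements {

  /** Splits `sql` on semicolons that appear outside quoted regions.
    * Empty statements (e.g. from trailing ";;;") are dropped.
    */
  def split(sql: String): List[String] = {
    val statements = ArrayBuffer.empty[String]
    val current = new StringBuilder
    var quote: Option[Char] = None // the quote character we are inside, if any

    for (c <- sql) {
      quote match {
        case Some(q) =>
          // Inside a quoted region: copy everything, close on the matching quote.
          current.append(c)
          if (c == q) quote = None
        case None =>
          if (c == ';') {
            // Statement boundary: flush what has been collected so far.
            statements += current.toString
            current.clear()
          } else {
            if (c == '\'' || c == '"' || c == '`') quote = Some(c)
            current.append(c)
          }
      }
    }
    statements += current.toString
    statements.map(_.trim).filter(_.nonEmpty).toList
  }
}

// Usage in spark-shell (assumes an active `spark` session):
//   SplitStatements.split("set spark.sql.abc=2; select 1;")
//     .map(s => spark.sql(s))
//     .last.show()
```

With this approach the SET command and the query reach the parser as two separate strings, so the property value is `2` rather than `2; select 1;`. The trade-off is that a property value that legitimately contains a semicolon would have to be quoted or set some other way.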



