Kent Yao created SPARK-33419:
--------------------------------
Summary: Unexpected behavior when using SET commands before a
query in SparkSession.sql
Key: SPARK-33419
URL: https://issues.apache.org/jira/browse/SPARK-33419
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 2.4.7, 3.0.2, 3.1.0
Reporter: Kent Yao
SparkSession.sql converts a string value to a DataFrame. The string must
contain exactly one SQL statement, optionally followed by one or more semicolons,
e.g.
{code:sql}
scala> spark.sql(" select 2").show
+---+
|  2|
+---+
|  2|
+---+
scala> spark.sql(" select 2;").show
+---+
|  2|
+---+
|  2|
+---+
scala> spark.sql(" select 2;;;;").show
+---+
|  2|
+---+
|  2|
+---+
{code}
If you pass two or more statements, parsing fails, e.g.
{code:java}
scala> spark.sql(" select 2; select 1;").show
org.apache.spark.sql.catalyst.parser.ParseException:
extraneous input 'select' expecting {<EOF>, ';'}(line 1, pos 11)
== SQL ==
select 2; select 1;
-----------^^^
at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:263)
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:130)
at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:51)
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:81)
at org.apache.spark.sql.SparkSession.$anonfun$sql$2(SparkSession.scala:610)
at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:610)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:769)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:607)
... 47 elided
{code}
A very common user scenario is changing some settings before executing a
query. Users may pass a string such as `set spark.sql.abc=2; select 1;`
to this API, which creates a confusing gap between the actual effect and the
user's expectations.
The user probably wants the query to be executed with spark.sql.abc=2, but
Spark actually treats the whole remainder, `2; select 1;`, as the value of the
property 'spark.sql.abc',
e.g.
{code:java}
scala> spark.sql("set spark.sql.abc=2; select 1;").show
+-------------+------------+
|          key|       value|
+-------------+------------+
|spark.sql.abc|2; select 1;|
+-------------+------------+
{code}
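Until the API itself handles this case, one possible caller-side workaround is to split the string into individual statements and pass each one to SparkSession.sql separately. The sketch below is only an illustration under stated assumptions: `SqlScript` and `splitStatements` are hypothetical names, not part of Spark, and the splitter only understands quoted strings and backtick identifiers. It does not handle SQL comments or escaped quotes.

```scala
// Hypothetical helper, not part of Spark: splits a script into statements on
// semicolons that appear outside of quotes. SQL comments and escaped quotes
// are NOT handled; this is a sketch, not a full SQL lexer.
object SqlScript {
  def splitStatements(script: String): Seq[String] = {
    val statements = scala.collection.mutable.ArrayBuffer.empty[String]
    val current = new StringBuilder
    var quote: Option[Char] = None // the quote character we are inside, if any

    for (c <- script) {
      quote match {
        case Some(q) =>
          // Inside a quoted region: keep everything, close on the matching quote.
          current.append(c)
          if (c == q) quote = None
        case None if c == ';' =>
          // Statement boundary: flush what has been collected so far.
          statements += current.toString.trim
          current.clear()
        case None =>
          if (c == '\'' || c == '"' || c == '`') quote = Some(c)
          current.append(c)
      }
    }
    statements += current.toString.trim
    // Drop empty fragments produced by repeated or trailing semicolons.
    statements.filter(_.nonEmpty).toSeq
  }
}
```

With a helper like this, the example above could be run as `SqlScript.splitStatements("set spark.sql.abc=2; select 1;").map(s => spark.sql(s)).last.show`, so that the SET command and the query reach the parser as two separate statements.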