Yoshi Matsuzaki created SPARK-32014:
---------------------------------------

             Summary: Support calling stored procedure on JDBC data source
                 Key: SPARK-32014
                 URL: https://issues.apache.org/jira/browse/SPARK-32014
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.0.0
            Reporter: Yoshi Matsuzaki


Currently, every query issued through the JDBC data source is wrapped in an 
outer SELECT, as described below:

[https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html]
{quote}
A query that will be used to read data into Spark. The specified query will be 
parenthesized and used as a subquery in the FROM clause. Spark will also assign 
an alias to the subquery clause. As an example, spark will issue a query of the 
following form to the JDBC Source.

SELECT <columns> FROM (<user_specified_query>) spark_gen_alias
{quote}

Because of this behavior, we cannot call a stored procedure in major databases: 
stored procedure call syntax is usually not allowed inside a subquery, since 
the return value of a call is optional.

For example, the Scala code below, which runs a query against Snowflake as a 
JDBC data source, raises a syntax error: the query "call proc()" is rewritten 
to "select * from (call proc()) where 1 = 0", which is invalid because CALL 
cannot appear in the middle of a query.

{code:scala}
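import org.apache.spark.sql.DataFrame

// `options` holds the Snowflake connection settings (not shown here).
val df: DataFrame = spark.read
  .format("snowflake")
  .options(options)
  .option("query", "call proc()") // gets wrapped in an outer SELECT, so it fails
  .load()

// display() is a notebook helper; df.show() is the plain Spark equivalent.
display(df)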
{code}
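
To make the rewrite concrete, here is a minimal sketch of the wrapping described in the documentation quoted above. It is an illustration only, not Spark's or the Snowflake connector's actual code: the object name WrapSketch and the helper wrapAsSubquery are hypothetical, and the exact text a given connector sends may differ (the error above shows a "where 1 = 0" variant).

```scala
// Illustrative only: mimics the documented wrapping, not Spark's actual source.
object WrapSketch {
  // Hypothetical helper: wraps a user query the way the JDBC docs describe.
  def wrapAsSubquery(userQuery: String, columns: String = "*"): String =
    s"SELECT $columns FROM ($userQuery) spark_gen_alias"

  def main(args: Array[String]): Unit = {
    // An ordinary SELECT is still a valid statement once wrapped.
    println(wrapAsSubquery("select id from t"))
    // A CALL statement ends up inside a FROM clause, which databases reject.
    println(wrapAsSubquery("call proc()"))
  }
}
```

The second statement shows why the failure is independent of the procedure itself: whatever follows CALL, the statement is no longer a top-level call once it sits inside the parentheses.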

I tested this with Snowflake, but it should happen with any major database 
system.

I understand that the JDBC data source exists to read and write data through 
DataFrames, so the interfaces it implements only read and write. Sometimes, 
however, we just need to execute some queries before or after reading/writing, 
for example to preprocess the data with a stored procedure.
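
For comparison, a statement can be run outside the data source API with plain JDBC on the driver, before or after the DataFrame read or write. The following is only a rough sketch under assumptions: the object ProcCallSketch and both helpers are hypothetical names, the JDBC URL and credentials are placeholders the caller must supply, and the matching JDBC driver is assumed to be on the classpath. It uses only java.sql calls and is not part of any Spark interface.

```scala
import java.sql.{Connection, DriverManager}

object ProcCallSketch {
  // Hypothetical helper: runs one statement on an already-open connection.
  // Returns true if the first result is a ResultSet, false otherwise.
  def runStatement(conn: Connection, sql: String): Boolean = {
    val stmt = conn.createStatement()
    try stmt.execute(sql)
    finally stmt.close()
  }

  // Hypothetical helper: opens a connection, runs the statement, then closes it.
  // Throws java.sql.SQLException if no driver accepts the URL or the call fails.
  def callProcedure(url: String, user: String, password: String, sql: String): Boolean = {
    val conn = DriverManager.getConnection(url, user, password)
    try runStatement(conn, sql)
    finally conn.close()
  }
}
```

A call such as ProcCallSketch.callProcedure(url, user, password, "call proc()") would then run before spark.read. This duplicates connection settings that the data source already has, which is part of why a built-in interface would be preferable.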

I would appreciate it if you could consider implementing some interface or way 
to allow us to call a stored procedure.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
