Xiao Li created SPARK-24423:
-------------------------------
Summary: Add a new option `query` for JDBC sources
Key: SPARK-24423
URL: https://issues.apache.org/jira/browse/SPARK-24423
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 2.3.0
Reporter: Xiao Li
Currently, our JDBC connector provides the option `dbtable` for users to
specify the JDBC source table to load:
    val jdbcDf = spark.read
      .format("jdbc")
      .option("dbtable", "dbName.tableName")
      .options(jdbcCredentials) // jdbcCredentials: Map[String, String]
      .load()
Normally, users do not fetch the whole JDBC table because of the poor
performance/throughput of JDBC. Instead, they usually fetch only a small subset
of a table. Advanced users can do this by passing a subquery, wrapped in
parentheses and given an alias, as the `dbtable` value:
    val query = """ (select * from tableName limit 10) as tmp """
    val jdbcDf = spark.read
      .format("jdbc")
      .option("dbtable", query)
      .options(jdbcCredentials) // jdbcCredentials: Map[String, String]
      .load()
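A simplified sketch of why the subquery needs parentheses and an alias, assuming the reader places the `dbtable` value verbatim after `FROM` in the statement it sends over JDBC. The object and function names are hypothetical, and column pruning and filter pushdown are ignored:

```scala
// Hypothetical illustration, not Spark's code: the `dbtable` value is
// assumed to be placed verbatim after FROM in the generated statement.
object DbtableSplice {
  // Simplified outer statement: no column pruning or pushed-down filters.
  def outerQuery(dbtable: String, columns: Seq[String] = Seq("*")): String =
    s"SELECT ${columns.mkString(", ")} FROM $dbtable"

  def main(args: Array[String]): Unit = {
    // A plain table name yields an ordinary query.
    println(outerQuery("dbName.tableName"))
    // A parenthesized, aliased subquery becomes a derived table.
    println(outerQuery("(select * from tableName limit 10) as tmp"))
    // Without parentheses and an alias the result is not valid SQL.
    println(outerQuery("select * from tableName limit 10"))
  }
}
```

The last call produces `SELECT * FROM select * from tableName limit 10`, which a database would reject; this is the boilerplate users must currently write by hand.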
However, this is not straightforward for end users. We should simply allow users
to specify the query through a new option, `query`, and handle the complexity
(such as wrapping it as an aliased subquery) for them:
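One way the "handle the complexity" step could work is to rewrite the `query` value into the parenthesized, aliased form that `dbtable` currently requires. This is a minimal sketch under that assumption; `QueryWrapper`, `asSubquery`, and the `GEN_SUBQ_` alias prefix are hypothetical names, not Spark's API:

```scala
import java.util.concurrent.atomic.AtomicLong

// Hypothetical sketch: turn a user-supplied `query` into a derived table
// that can be used wherever a `dbtable` value is expected.
object QueryWrapper {
  // Source of unique alias suffixes; AtomicLong keeps this thread-safe.
  private val aliasId = new AtomicLong(0L)

  def asSubquery(query: String): String = {
    val body = query.trim
    require(body.nonEmpty, "Option `query` must not be empty")
    // The AS keyword is omitted because some databases (e.g. Oracle)
    // reject it in table aliases.
    s"($body) GEN_SUBQ_${aliasId.getAndIncrement()}"
  }

  def main(args: Array[String]): Unit =
    println(asSubquery("select * from tableName limit 10"))
}
```

Generating the alias instead of asking the user for one avoids name clashes when several such sources appear in the same session.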
    val query = """select * from tableName limit 10"""
    val jdbcDf = spark.read
      .format("jdbc")
      .option("query", query)
      .options(jdbcCredentials) // jdbcCredentials: Map[String, String]
      .load()
Users are not allowed to specify `query` and `dbtable` at the same time.
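A minimal sketch of that check, assuming option keys are matched case-insensitively and that exactly one of the two options must be present (the issue only states that both may not be set at once). `JdbcSourceCheck`, `resolve`, `Table`, and `Query` are hypothetical names:

```scala
import java.util.Locale

// Hypothetical validation of the `dbtable`/`query` options; not Spark's API.
object JdbcSourceCheck {
  sealed trait Source
  final case class Table(name: String) extends Source
  final case class Query(sql: String) extends Source

  def resolve(options: Map[String, String]): Either[String, Source] = {
    // Assumption: option keys are case-insensitive.
    val opts = options.map { case (k, v) => k.toLowerCase(Locale.ROOT) -> v }
    (opts.get("dbtable"), opts.get("query")) match {
      case (Some(_), Some(_)) =>
        Left("Options `dbtable` and `query` cannot be specified at the same time.")
      case (Some(table), None) => Right(Table(table))
      case (None, Some(sql))   => Right(Query(sql))
      case (None, None) =>
        Left("Either `dbtable` or `query` must be specified.")
    }
  }

  def main(args: Array[String]): Unit = {
    println(resolve(Map("dbtable" -> "dbName.tableName")))
    println(resolve(Map("dbtable" -> "t", "query" -> "select 1")))
  }
}
```

Returning an `Either` rather than throwing keeps the check easy to test; a reader implementation would turn the `Left` into an error at load time.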
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)