Ivan created SPARK-36163:
----------------------------

             Summary: Propagate correct JDBC properties in JDBC connector 
provider and add "connectionProvider" option
                 Key: SPARK-36163
                 URL: https://issues.apache.org/jira/browse/SPARK-36163
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.1.2, 3.1.1, 3.1.0
            Reporter: Ivan


There are a couple of issues with JDBC connection providers. The first is a bug 
introduced by 
[https://github.com/apache/spark/commit/c3ce9701b458511255072c72b9b245036fa98653]:
all properties, including JDBC data source keys, are passed to the JDBC driver, 
which results in errors like {{java.sql.SQLException: Unrecognized 
connection property 'url'}}.

Connection properties are supposed to include only vendor-specific properties. 
The {{url}} setting is a JDBC data source option and should be excluded.

The fix is to replace {{jdbcOptions.asProperties.asScala.foreach}} with 
{{jdbcOptions.asConnectionProperties.asScala.foreach}}, which leaves out the 
data source options and is therefore {{java.sql.Driver}} friendly.
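
A minimal sketch of the kind of filtering this implies. The object name, the 
helper {{connectionProperties}} and the abbreviated key set {{dataSourceKeys}} 
are hypothetical and only for illustration; the real logic lives in 
{{JDBCOptions}} and may differ in detail:

```scala
import java.util.{Locale, Properties}

object ConnectionPropsSketch {
  // Hypothetical, abbreviated set of JDBC data source option names.
  // These configure the Spark data source and are not meant for the driver.
  val dataSourceKeys: Set[String] = Set("url", "dbtable", "driver", "fetchsize")

  // Copies only the entries that are NOT data source options, so the
  // resulting Properties can be handed to java.sql.Driver.connect.
  def connectionProperties(options: Map[String, String]): Properties = {
    val props = new Properties()
    options.foreach { case (key, value) =>
      if (!dataSourceKeys.contains(key.toLowerCase(Locale.ROOT))) {
        props.setProperty(key, value)
      }
    }
    props
  }
}
```

With this shape, a map containing {{url}}, {{dbtable}} and {{user}} would yield 
properties that contain only {{user}}.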

 

I also investigated the problem with multiple providers, and I think there are 
a couple of oversights in the {{ConnectionProvider}} implementation. Two things 
are missing:
 * Any {{JdbcConnectionProvider}} should take precedence over 
{{BasicConnectionProvider}}. {{BasicConnectionProvider}} should be selected 
only if no other provider is found that can handle the JDBC URL.

 * There is currently no way to select a specific provider, similar to how you 
can select a JDBC driver. One use case is having connection providers for two 
databases that handle the same URL but have slightly different semantics, where 
you want to select one provider in some cases and the other in the rest.

 ** I think the first point could be discarded when the second one is addressed.
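
The two points above can be sketched together as a small selection routine. 
This is an illustration under stated assumptions, not the actual 
{{ConnectionProvider}} code: {{Provider}}, {{basic}} and {{select}} are 
hypothetical stand-ins, and a provider is reduced to a name plus a predicate 
on the URL.

```scala
object ProviderSelectionSketch {
  // Simplified, hypothetical stand-in for a JDBC connection provider.
  final case class Provider(name: String, canHandle: String => Boolean)

  // Stand-in for the basic provider, which accepts any URL.
  val basic: Provider = Provider("basic", _ => true)

  def select(
      providers: Seq[Provider],
      url: String,
      requested: Option[String]): Either[String, Provider] =
    requested match {
      // Second point: an explicitly requested provider wins outright.
      case Some(name) =>
        providers
          .find(p => p.name == name && p.canHandle(url))
          .toRight(s"No provider named '$name' can handle $url")
      // First point: specific providers take precedence over the basic one.
      case None =>
        val specific = providers.filter(p => p.name != basic.name && p.canHandle(url))
        specific match {
          case Seq()     => Right(basic) // nothing specific matched, fall back
          case Seq(only) => Right(only)
          case many =>
            Left(s"Ambiguous providers for $url: ${many.map(_.name).mkString(", ")}")
        }
    }
}
```

In this sketch, two providers matching the same URL produce an error unless a 
name is requested explicitly, which is the ambiguity described above.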

You can technically use {{spark.sql.sources.disabledJdbcConnProviderList}} to 
exclude the providers you don’t want, but I am not sure why it was designed 
that way; it is much simpler to let users specify the provider they want.

This ticket addresses that by adding a {{connectionProvider}} option to the 
JDBC data source, which lets users select a particular provider when such 
ambiguity arises.
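
A rough sketch of how the option might be supplied from user code. The helper 
{{build}} is hypothetical, and the URL, table name and the provider name 
{{db2}} are assumed example values:

```scala
object JdbcReadOptionsSketch {
  // Builds the option map for a JDBC read. When `provider` is set, the
  // proposed "connectionProvider" option is added; otherwise the choice
  // is left to the default provider inference.
  def build(url: String, table: String, provider: Option[String]): Map[String, String] = {
    val base = Map("url" -> url, "dbtable" -> table)
    provider.fold(base)(name => base + ("connectionProvider" -> name))
  }
}

// Assumed example values, for illustration only.
val opts = JdbcReadOptionsSketch.build("jdbc:db2://host:50000/db", "schema.table", Some("db2"))
```

The resulting map could then be passed to the reader along the lines of 
{{spark.read.format("jdbc").options(opts).load()}}.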



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

