[ 
https://issues.apache.org/jira/browse/SPARK-31363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-31363:
------------------------------------

    Assignee: Apache Spark

> Improve DataSourceRegister interface
> ------------------------------------
>
>                 Key: SPARK-31363
>                 URL: https://issues.apache.org/jira/browse/SPARK-31363
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.4.5, 3.0.0
>            Reporter: Andrew Malone Melo
>            Assignee: Apache Spark
>            Priority: Minor
>
> As the DSv2 API evolves, breaking changes are occasionally made to the 
> API. It's possible to split a plugin into a "common" part and multiple 
> version-specific parts, which works well for shipping a single artifact to 
> users. The one part that currently can't be worked around is the 
> DataSourceRegister trait. This is an issue because users cargo-cult 
> configuration values, and choosing the wrong plugin version yields a 
> particularly baroque error message that bubbles up through ServiceLoader.
> Currently, the class implementing DataSourceRegister must also be the class 
> implementing the "top-level" DataSourceV2 interface (and mixins), and these 
> various interfaces occasionally change as the API evolves. As a practical 
> matter, this means there's no opportunity to decide at runtime which class 
> to pass along to Spark. Attempting to add multiple DataSourceV2 
> implementations to META-INF/services causes an exception when the 
> ServiceLoader tries to load a DataSourceRegister that implements a 
> "different" DataSourceV2.
> I would like to propose a new DataSourceRegister interface which adds a level 
> of indirection between what ServiceLoader loads and the DataSourceV2 
> implementation. E.g. (strawman)
> {{interface DataSourceRegisterV2 {}}
> {{  public String shortName();}}
> {{  public Class<? extends DataSourceV2> getImplementation();}}
> {{ }}}
> Then org.apache.spark.sql.execution.datasources.DataSource.lookupDataSource 
> would have its search algorithm extended to look for DataSourceRegisterV2 
> objects, and if one is located for the given shortName, return the class 
> object from getImplementation(). At that point, the plugin could decide, 
> based on the current runtime environment, which class to present to Spark. 
> There would be no changes required for plugins that don't implement this API.
> If this is an acceptable idea, I can put together a PR for further comment.
> Thanks
>  Andrew
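
The proposed indirection can be sketched in plain Java. This is a strawman only: DataSourceRegisterV2 is the interface proposed in the ticket and does not exist in Spark; the DataSourceV2 marker interface, the "myformat" short name, and the spark.version system property are stand-ins for illustration.

```java
// Strawman sketch of the proposed indirection. Nothing here is real Spark
// API: DataSourceV2 below is a stand-in for Spark's interface, and
// DataSourceRegisterV2 is the interface proposed in this ticket.
interface DataSourceV2 {}

interface DataSourceRegisterV2 {
    String shortName();
    Class<? extends DataSourceV2> getImplementation();
}

// Version-specific implementations the plugin ships in a single artifact
class MyFormatSpark24 implements DataSourceV2 {}
class MyFormatSpark30 implements DataSourceV2 {}

// The one class listed in META-INF/services; it chooses the implementation
// at runtime instead of at ServiceLoader time
class MyFormatRegister implements DataSourceRegisterV2 {
    public String shortName() { return "myformat"; }

    public Class<? extends DataSourceV2> getImplementation() {
        // A real plugin would inspect the running Spark version; a system
        // property stands in for that check here
        String v = System.getProperty("spark.version", "3.0.0");
        return v.startsWith("2.") ? MyFormatSpark24.class
                                  : MyFormatSpark30.class;
    }
}

public class Sketch {
    public static void main(String[] args) {
        DataSourceRegisterV2 reg = new MyFormatRegister();
        System.out.println(reg.shortName() + " -> "
                + reg.getImplementation().getSimpleName());
    }
}
```

With this shape, lookupDataSource would resolve "myformat" to the register, then hand the class returned by getImplementation() to the rest of the resolution machinery.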



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
