[
https://issues.apache.org/jira/browse/SPARK-31363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Apache Spark reassigned SPARK-31363:
------------------------------------
Assignee: Apache Spark
> Improve DataSourceRegister interface
> ------------------------------------
>
> Key: SPARK-31363
> URL: https://issues.apache.org/jira/browse/SPARK-31363
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.4.5, 3.0.0
> Reporter: Andrew Malone Melo
> Assignee: Apache Spark
> Priority: Minor
>
> As the DSv2 API evolves, breaking changes are occasionally made to the
> API. It's possible to split a plugin into a "common" part and multiple
> version-specific parts, and this works well for shipping a single artifact
> to users. The one part that can't currently be worked around is the
> DataSourceRegister trait. This is an issue because users cargo-cult
> configuration values, and choosing the wrong plugin version produces a
> particularly baroque error message that bubbles up through ServiceLoader.
> Currently, the class implementing DataSourceRegister must also be the class
> implementing the "toplevel" DataSourceV2 interface (and mixins), and these
> various interfaces occasionally change as the API evolves. As a practical
> matter, this means that there's no opportunity to decide at runtime which
> class to pass along to Spark. Attempting to add multiple DataSourceV2
> implementations to services/META-INF causes an exception when the
> ServiceLoader tries to load the DataSourceRegister that implements the
> "different" DataSourceV2.
> I would like to propose a new DataSourceRegister interface that adds a level
> of indirection between what the ServiceLoader loads and the DataSourceV2
> implementation. E.g. (strawman)
> {{interface DataSourceRegisterV2 {}}
> {{ public String shortName();}}
> {{ public Class<? extends DataSourceV2> getImplementation();}}
> {{ }}}
> Then org.apache.spark.sql.execution.datasources.DataSource.lookupDataSource
> would have its search algorithm extended to look for DataSourceRegisterV2
> objects, and if one is located for the given shortName, return the class
> object from getImplementation(). At that point, the plugin could decide,
> based on the current runtime environment, which class to present to Spark.
> There wouldn't be any changes for plugins that don't implement this API.
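> A minimal, self-contained sketch of how a plugin might use the proposed
> indirection; the stub DataSourceV2 interface and the MyFormat* class names
> below are hypothetical placeholders, not Spark's real types:

```java
// Hypothetical sketch of the strawman DataSourceRegisterV2 indirection.
// DataSourceV2 is a stub standing in for Spark's real interface;
// MyFormatV24/MyFormatV30/MyFormatRegister are invented for illustration.
interface DataSourceV2 {}

interface DataSourceRegisterV2 {
    String shortName();
    Class<? extends DataSourceV2> getImplementation();
}

// Two version-specific implementations compiled into one artifact.
class MyFormatV24 implements DataSourceV2 {}
class MyFormatV30 implements DataSourceV2 {}

// The single class listed in META-INF/services; it picks the
// implementation class at runtime instead of at ServiceLoader time.
class MyFormatRegister implements DataSourceRegisterV2 {
    private final String sparkVersion; // e.g. org.apache.spark.SPARK_VERSION at runtime

    MyFormatRegister(String sparkVersion) {
        this.sparkVersion = sparkVersion;
    }

    @Override
    public String shortName() {
        return "myformat";
    }

    @Override
    public Class<? extends DataSourceV2> getImplementation() {
        return sparkVersion.startsWith("3.")
                ? MyFormatV30.class
                : MyFormatV24.class;
    }
}
```

> Because the ServiceLoader only ever instantiates MyFormatRegister, the
> version-specific classes aren't touched until getImplementation() is
> called, which would avoid the load-time exception described above.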
> If this is an acceptable idea, I can put together a PR for further comment.
> Thanks
> Andrew
--
This message was sent by Atlassian Jira
(v8.3.4#803005)