Andrew Malone Melo created SPARK-31363:
------------------------------------------

             Summary: Improve DataSourceRegister interface
                 Key: SPARK-31363
                 URL: https://issues.apache.org/jira/browse/SPARK-31363
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 2.4.5, 3.0.0
            Reporter: Andrew Malone Melo


As the DSv2 API evolves, breaking changes are occasionally made to the
API. It's possible to split a plugin into a "common" part and multiple
version-specific parts, and this works well for shipping a single artifact to
users. The one part that currently can't be worked around is the
DataSourceRegister trait. This is an issue because users cargo-cult
configuration values, and choosing the wrong plugin version produces a
particularly baroque error message that bubbles up through ServiceLoader.

Currently, the class implementing DataSourceRegister must also be the class
implementing the "toplevel" DataSourceV2 interface (and its mixins), and these
interfaces occasionally change as the API evolves. As a practical matter, this
means there's no opportunity to decide at runtime which class to hand to
Spark. Attempting to add multiple DataSourceV2 implementations to
META-INF/services causes an exception when the ServiceLoader tries to load a
DataSourceRegister that implements the "wrong" DataSourceV2.
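For context, registration today hinges on a single services entry. As I
understand the current lookup, Spark iterates every provider listed there via
ServiceLoader before filtering by shortName, so one class compiled against a
mismatched DataSourceV2 poisons the whole load. The file name below is the
real Spark trait; the listed class names are hypothetical:

```
# META-INF/services/org.apache.spark.sql.sources.DataSourceRegister
com.example.myformat.MyFormatSpark24   # built against the 2.4 DSv2 API
com.example.myformat.MyFormatSpark30   # built against the 3.0 DSv2 API
# Listing both like this is exactly what fails today: ServiceLoader
# instantiates each entry, and the one targeting the "other" API throws.
```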

I would like to propose a new DataSourceRegister interface that adds a level
of indirection between what ServiceLoader loads and the DataSourceV2
implementation. E.g. (strawman)

{{interface DataSourceRegisterV2 {
  public String shortName();
  public Class<? extends DataSourceV2> getImplementation();
}}}
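To make the idea concrete, here is a sketch of what a plugin side might look
like under this proposal. Everything here is hypothetical: DataSourceRegisterV2
is the proposed (not yet existing) interface, DataSourceV2 is a stand-in for
the real Spark marker trait, and the version check is a hardcoded stub so the
sketch is self-contained.

```java
// Stand-in for org.apache.spark.sql.sources.v2.DataSourceV2 (assumption).
interface DataSourceV2 {}

// The proposed indirection interface from this issue (strawman).
interface DataSourceRegisterV2 {
    String shortName();
    Class<? extends DataSourceV2> getImplementation();
}

// Hypothetical version-specific implementations shipped in one artifact.
class MyFormatSpark24 implements DataSourceV2 {}
class MyFormatSpark30 implements DataSourceV2 {}

public class MyFormatRegister implements DataSourceRegisterV2 {
    @Override
    public String shortName() { return "myformat"; }

    @Override
    public Class<? extends DataSourceV2> getImplementation() {
        // Decide at runtime which implementation matches the running Spark.
        // Real code would consult the actual Spark version; a system-property
        // stub keeps this sketch runnable on its own.
        String sparkVersion = System.getProperty("spark.version.stub", "3.0.0");
        return sparkVersion.startsWith("2.")
                ? MyFormatSpark24.class
                : MyFormatSpark30.class;
    }

    public static void main(String[] args) {
        MyFormatRegister reg = new MyFormatRegister();
        System.out.println(reg.shortName() + " -> "
                + reg.getImplementation().getSimpleName());
    }
}
```

Only MyFormatRegister would appear in META-INF/services; the version-specific
classes are never touched by ServiceLoader, so the mismatched one is never
loaded.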

Then org.apache.spark.sql.execution.datasources.DataSource.lookupDataSource
would have its search algorithm extended to also look for DataSourceRegisterV2
objects and, if one is found for the given shortName, return the class object
from getImplementation(). At that point, the plugin could decide, based on the
current runtime environment, which class to present to Spark. There would be
no changes needed for plugins that don't implement this API.
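The extended search could be sketched roughly as follows. Assumptions: the
interfaces are simplified stand-ins for the real Spark types, and resolution
is shown over explicit lists rather than the java.util.ServiceLoader calls the
real lookupDataSource would use.

```java
import java.util.List;
import java.util.Optional;

// Simplified stand-ins for the real Spark types (assumptions).
interface DataSourceV2 {}

interface DataSourceRegister {
    String shortName();
}

interface DataSourceRegisterV2 {
    String shortName();
    Class<? extends DataSourceV2> getImplementation();
}

public class LookupSketch {
    // First consult the new indirect registers; fall back to the existing
    // behavior, where the register itself is the implementation class.
    static Optional<Class<?>> lookupDataSource(
            String name,
            List<DataSourceRegisterV2> v2Registers,
            List<DataSourceRegister> v1Registers) {
        for (DataSourceRegisterV2 r : v2Registers) {
            if (r.shortName().equalsIgnoreCase(name)) {
                return Optional.of(r.getImplementation());
            }
        }
        for (DataSourceRegister r : v1Registers) {
            if (r.shortName().equalsIgnoreCase(name)) {
                return Optional.of(r.getClass());
            }
        }
        return Optional.empty();
    }
}
```

Existing plugins hit only the second loop, which is why they would need no
changes.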

If this is an acceptable idea, I can put together a PR for further comment.

Thanks
Andrew



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
