[ https://issues.apache.org/jira/browse/SPARK-48918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Herman van Hövell updated SPARK-48918:
--------------------------------------
Description:
*Motivation*
Currently, the Scala sql/core and Connect implementations expose the same API;
Connect implements a subset of the functionality of the sql/core API.
Compatibility between the two implementations is enforced by MiMa checks.
While this sort of works for application development, it is not ideal for a
couple of reasons:
* An application developer needs to choose which API to develop against when
setting up their project (they need to select the correct dependencies). While
they can change this later, it still puts a mental burden on the developer. A
much better solution would be to defer binding to an implementation until the
code runs.
* (Minor) The current setup confuses IDEs and is more of a pain to work with,
especially for Spark developers.
* Developing and maintaining the Spark API is more difficult because of the
added burden of working with MiMa and/or adding the same API in multiple places.
* Connect testing is fairly anaemic. We have seen a couple of cases where
Connect behaves slightly differently, and these could have been detected if
Connect were able to leverage Spark SQL's extensive testing.
*Goals*
* Create a truly shared Scala API with two implementations. The goal is *not*
to replace/simplify/reduce the current sql/core API we all love; the interface
will only support the API shared between the implementations. An implementation
can provide additional functionality (e.g. RDD-centric methods for the sql/core
implementation).
* The common interface should cover the entire API supported by the current
Connect Scala client.
* Maintain as much binary compatibility with previous Spark releases as
possible.
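To illustrate the goal of one shared interface with two implementations, and how it lets application code defer binding to an implementation until run time, here is a minimal Scala sketch. All names (`Dataset`, `SparkSession`, `ClassicSession`, `ConnectSession`, `partitioned`, `Analytics`, `Main.sessionFor`) are hypothetical, heavily simplified stand-ins and not the real Spark classes; the "Connect" side only simulates a client-side plan that is evaluated on `collect()`.

```scala
// Hypothetical, heavily simplified stand-ins: NOT the real org.apache.spark.sql types.

// Shared interface: the only API application code compiles against.
trait Dataset[T] {
  def filter(p: T => Boolean): Dataset[T]
  def collect(): Seq[T]
}

trait SparkSession {
  def createDataset[T](data: Seq[T]): Dataset[T]
}

// "Classic" implementation: evaluates eagerly, in-process.
final class ClassicDataset[T](data: Seq[T]) extends Dataset[T] {
  def filter(p: T => Boolean): ClassicDataset[T] = new ClassicDataset(data.filter(p))
  def collect(): Seq[T] = data
  // Implementation-only extra outside the shared interface
  // (a stand-in for, e.g., RDD-centric methods). Requires size > 0.
  def partitioned(size: Int): Seq[Seq[T]] = data.grouped(size).toSeq
}

final class ClassicSession extends SparkSession {
  def createDataset[T](data: Seq[T]): ClassicDataset[T] = new ClassicDataset(data)
}

// "Connect" implementation: records a plan client-side and only evaluates it
// in collect(), standing in for shipping the plan to a server.
final class ConnectDataset[T](source: Seq[T], plan: Vector[T => Boolean]) extends Dataset[T] {
  def filter(p: T => Boolean): Dataset[T] = new ConnectDataset(source, plan :+ p)
  def collect(): Seq[T] = source.filter(row => plan.forall(pred => pred(row)))
}

final class ConnectSession extends SparkSession {
  def createDataset[T](data: Seq[T]): Dataset[T] = new ConnectDataset[T](data, Vector.empty)
}

// Application logic depends only on the shared traits.
object Analytics {
  def evenNumbers(spark: SparkSession): Seq[Int] =
    spark.createDataset(1 to 10).filter(_ % 2 == 0).collect()
}

// The implementation is chosen at run time, not when dependencies are picked.
object Main {
  def sessionFor(mode: String): SparkSession = mode match {
    case "connect" => new ConnectSession
    case _         => new ClassicSession
  }

  def main(args: Array[String]): Unit = {
    val spark = sessionFor(args.headOption.getOrElse("classic"))
    println(Analytics.evenNumbers(spark))
  }
}
```

Both sessions return the same result from `Analytics.evenNumbers`, while only `ClassicDataset` exposes `partitioned`, mirroring how an implementation may offer functionality beyond the shared API. How Spark itself selects an implementation at run time is not specified here.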
*Design Notes*
* We are going to try to make the interface very Connect-centric. Where
possible, we will implement functionality using the Connect API.
* .... TBD
> Create a unified SQL Scala interface shared by regular SQL and Connect.
> -----------------------------------------------------------------------
>
> Key: SPARK-48918
> URL: https://issues.apache.org/jira/browse/SPARK-48918
> Project: Spark
> Issue Type: Epic
> Components: Connect, SQL
> Affects Versions: 4.0.0
> Reporter: Herman van Hövell
> Priority: Major
--
This message was sent by Atlassian Jira
(v8.20.10#820010)