[jira] [Updated] (SPARK-48918) Create a unified SQL Scala interface shared by regular SQL and Connect.

Jira Tue, 16 Jul 2024 18:40:11 -0700


     [ 
https://issues.apache.org/jira/browse/SPARK-48918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Herman van Hövell updated SPARK-48918:
--------------------------------------
    Description: 
*Motivation*

Currently, the Scala sql/core and Connect implementations expose the same API; 
Connect implements a subset of the functionality of the sql/core API. The 
compatibility of the two implementations is enforced by MiMa checks.

While this sort of works for application development, it is not ideal for a 
couple of reasons:
 * An application developer needs to pick which API they are going to develop 
against while setting up their project (they need to select the correct 
dependencies). While it is true that they can change this later, it does put a 
mental burden on the developer. A much preferred solution would be to defer 
binding to an implementation until you run the code.
 * (Minor) The current setup confuses IDEs and is more of a pain to work with, 
especially for Spark developers.
 * Developing and maintaining the Spark API is more difficult because of the 
added burden of working with MiMa and/or adding the same API in more places.
 * Connect testing is fairly anaemic. We have seen a couple of cases where 
Connect behaves slightly differently, and this could have been detected if 
Connect were able to leverage Spark SQL's extensive testing.

*Goals*
 * Create a truly shared Scala API with two implementations. The goal is *not* 
to replace/simplify/reduce the current sql/core API we all love; the interface 
will only support the API shared between the implementations. An implementation 
can provide additional functionality (e.g. RDD-centric methods for the sql/core 
implementation).
 * The common interface should cover all API supported by the current Connect 
Scala client.
 * Maintain as much binary compatibility with previous Spark releases as 
possible.
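
As a rough illustration of the first goal, the shape being aimed for could look 
something like the sketch below. Every name in it (SharedSession, 
ClassicSession, ConnectSession, localPartitions, the "classic"/"connect" mode 
strings, the placeholder endpoint) is hypothetical and chosen only for this 
example; none of it is the actual Spark API or the proposed design. It only 
shows one shared interface, two implementations (one of which adds extra 
functionality), and application code that binds to an implementation at run 
time.

```scala
// Hypothetical sketch only: these names are illustrative, not Spark's API.

// Shared interface: only the operations that both implementations support.
trait SharedSession {
  def sql(query: String): String
}

// Stand-in for the sql/core implementation. It may expose additional methods
// that are not part of the shared interface.
final class ClassicSession extends SharedSession {
  def sql(query: String): String = s"classic: $query"

  // Extra, implementation-specific functionality (stand-in for RDD-centric methods).
  def localPartitions: Int = 4
}

// Stand-in for the Connect implementation; the endpoint is a placeholder value.
final class ConnectSession(endpoint: String) extends SharedSession {
  def sql(query: String): String = s"connect($endpoint): $query"
}

object SharedSession {
  // Binding to an implementation is deferred until the code runs.
  def create(mode: String): SharedSession = mode match {
    case "classic" => new ClassicSession
    case "connect" => new ConnectSession("sc://localhost")
    case other     => throw new IllegalArgumentException(s"Unknown mode: $other")
  }
}

object Demo {
  // Application code is written against the shared interface only.
  def run(session: SharedSession): String = session.sql("SELECT 1")

  def main(args: Array[String]): Unit = {
    val mode = args.headOption.getOrElse("classic")
    println(run(SharedSession.create(mode)))
  }
}
```

In this sketch the application (Demo.run) compiles against SharedSession alone, 
so the choice between the two implementations is made by a run-time argument 
rather than by the project's dependencies. Code that needs the extra methods 
would have to depend on ClassicSession explicitly.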

*Design Notes*
 * We are going to try to make the interface very Connect-centric. Where 
possible we will implement functionality using the Connect API.
 * .... TBD

> Create a unified SQL Scala interface shared by regular SQL and Connect.
> -----------------------------------------------------------------------
>
>                 Key: SPARK-48918
>                 URL: https://issues.apache.org/jira/browse/SPARK-48918
>             Project: Spark
>          Issue Type: Epic
>          Components: Connect, SQL
>    Affects Versions: 4.0.0
>            Reporter: Herman van Hövell
>            Priority: Major



