[
https://issues.apache.org/jira/browse/SPARK-39375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17617373#comment-17617373
]
Apache Spark commented on SPARK-39375:
--------------------------------------
User 'amaliujia' has created a pull request for this issue:
https://github.com/apache/spark/pull/38244
> SPIP: Spark Connect - A client and server interface for Apache Spark
> --------------------------------------------------------------------
>
> Key: SPARK-39375
> URL: https://issues.apache.org/jira/browse/SPARK-39375
> Project: Spark
> Issue Type: Improvement
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: Martin Grund
> Priority: Critical
> Labels: SPIP
>
> Please find the full document for discussion here: [Spark Connect
> SPIP|https://docs.google.com/document/d/1Mnl6jmGszixLW4KcJU5j9IgpG9-UabS0dcM6PM2XGDc/edit#heading=h.wmsrrfealhrj]
> Below, we have just referenced the introduction.
> h2. What are you trying to do?
> While Spark is used extensively, it was designed nearly a decade ago, which,
> in the age of serverless computing and ubiquitous programming language use,
> poses a number of limitations. Most of the limitations stem from the tightly
> coupled Spark driver architecture and the fact that clusters are typically shared
> across users: (1) {*}Lack of built-in remote connectivity{*}: the Spark
> driver runs both the client application and scheduler, which results in a
> heavyweight architecture that requires proximity to the cluster. There is no
> built-in capability to remotely connect to a Spark cluster in languages
> other than SQL and users therefore rely on external solutions such as the
> inactive project [Apache Livy|https://livy.apache.org/]. (2) {*}Lack of rich
> developer experience{*}: The current architecture and APIs do not cater to
> interactive data exploration (as done in notebooks), nor do they support the
> rich developer experience common in modern code editors. (3)
> {*}Stability{*}: with the current shared driver architecture, users causing
> critical exceptions (e.g. OOM) bring the whole cluster down for all users.
> (4) {*}Upgradability{*}: the current entangling of platform and client APIs
> (e.g. first and third-party dependencies in the classpath) does not allow for
> seamless upgrades between Spark versions (and with that, hinders new feature
> adoption).
>
> We propose to overcome these challenges by building on the DataFrame API and
> the underlying unresolved logical plans. The DataFrame API is widely used and
> makes it very easy to iteratively express complex logic. We will introduce
> {_}Spark Connect{_}, a remote variant of the DataFrame API that separates the
> client from the Spark server. With Spark Connect, Spark will become
> decoupled, allowing for built-in remote connectivity: The decoupled client
> SDK can be used to run interactive data exploration and connect to the server
> for DataFrame operations.
>
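To make the proposed decoupling concrete, here is a minimal, self-contained sketch of the idea: the client builds an unresolved logical plan as plain data, ships it over the wire (JSON here, standing in for Spark Connect's actual protobuf/gRPC transport), and a separate server resolves and executes it. All names below (`relation`, `filter_`, `project`, `execute_plan`) and the plan encoding are illustrative assumptions, not the real Spark Connect API.

```python
import json

# ---- client side: build an unresolved logical plan with no engine attached ----
def relation(name):
    return {"op": "relation", "table": name}

def filter_(child, column, minimum):
    return {"op": "filter", "column": column, "min": minimum, "child": child}

def project(child, columns):
    return {"op": "project", "columns": columns, "child": child}

# The client only composes and serializes the plan; it needs no Spark classes.
plan = project(filter_(relation("employees"), "age", 30), ["name"])
wire_bytes = json.dumps(plan).encode("utf-8")

# ---- server side: deserialize, resolve table names, and execute the plan ----
TABLES = {
    "employees": [
        {"name": "Ada", "age": 36},
        {"name": "Grace", "age": 28},
    ],
}

def execute_plan(payload):
    node = json.loads(payload.decode("utf-8"))

    def run(n):
        if n["op"] == "relation":
            return TABLES[n["table"]]          # name resolution happens server-side
        if n["op"] == "filter":
            return [r for r in run(n["child"]) if r[n["column"]] >= n["min"]]
        if n["op"] == "project":
            return [{c: r[c] for c in n["columns"]} for r in run(n["child"])]
        raise ValueError(f"unknown op: {n['op']}")

    return run(node)

print(execute_plan(wire_bytes))  # [{'name': 'Ada'}]
```

Because the wire format, not a shared classpath, is the contract, the server can evolve its planner and runtime independently of the client, which is the upgradability property the proposal describes.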
> Spark Connect will benefit Spark developers in different ways: The decoupled
> architecture will result in improved stability, as clients are separated from
> the driver. From the Spark Connect client perspective, Spark will be (almost)
> versionless, thus enabling seamless upgrades, as server APIs can
> evolve without affecting the client API. The decoupled client-server
> architecture can be leveraged to build close integrations with local
> developer tooling. Finally, separating the client process from the Spark
> server process will improve Spark’s overall security posture by avoiding the
> tight coupling of the client inside the Spark runtime environment.
>
> Spark Connect will strengthen Spark’s position as the modern unified engine
> for large-scale data analytics and expand applicability to use cases and
> developers we could not reach with the current setup: Spark will become
> ubiquitously usable as the DataFrame API can be used with (almost) any
> programming language.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)