[ 
https://issues.apache.org/jira/browse/SPARK-27941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shuheng Dai updated SPARK-27941:
--------------------------------
    Description: 
Public cloud providers have started offering serverless container services. For 
example, AWS offers Fargate [https://aws.amazon.com/fargate/].

This opens up the possibility of running Spark workloads in a serverless manner 
and removing the need to provision and maintain a cluster.

While it might not make sense for Spark to favor any particular cloud provider 
or to support a large number of cloud providers natively, it would make sense 
to make some of the internal Spark components more pluggable and cloud-friendly 
so that it is easier for various cloud providers to integrate. For example:
 * authentication: IO and network encryption require authentication via a 
securely shared secret, and the implementation is currently tied to the 
cluster manager: YARN uses the Hadoop UGI, and Kubernetes uses a shared file 
mounted on all pods. These can be decoupled so that it is possible to swap in 
an implementation backed by a public cloud. In the POC, this is done by 
passing around an AWS KMS-encrypted secret and delegating authentication and 
authorization to the cloud (see the first sketch after this list).
 * deployment & scheduler: adding a new cluster manager requires change a 
number of places in the Spark core package, and rebuilding the project. 
 * driver-executor communication: 
 * shuffle storage and retrieval: 
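
As a rough illustration of the authentication point, here is a minimal sketch 
of what a pluggable secret provider could look like. The AuthSecretProvider 
trait and the spark.authenticate.kms.ciphertext key are hypothetical names 
made up for this sketch (not existing Spark APIs); decryption uses the AWS SDK 
v1 KMS client.

{code:scala}
import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets
import java.util.Base64

import com.amazonaws.services.kms.AWSKMSClientBuilder
import com.amazonaws.services.kms.model.DecryptRequest

import org.apache.spark.SparkConf

// Hypothetical plugin point: how the shared auth secret is obtained would no
// longer be hard-wired to the cluster manager (Hadoop UGI on YARN, mounted
// secret file on Kubernetes) but resolved through a configurable provider.
trait AuthSecretProvider {
  def getSecret(conf: SparkConf): String
}

// Sketch of a cloud-backed provider: driver and executors are given only a
// KMS-encrypted ciphertext, and the authorization to decrypt it is delegated
// to the cloud provider's IAM/KMS policies.
class KmsAuthSecretProvider extends AuthSecretProvider {
  override def getSecret(conf: SparkConf): String = {
    // "spark.authenticate.kms.ciphertext" is a made-up config key for this sketch.
    val ciphertext = Base64.getDecoder.decode(conf.get("spark.authenticate.kms.ciphertext"))
    val kms = AWSKMSClientBuilder.defaultClient()
    val response = kms.decrypt(new DecryptRequest().withCiphertextBlob(ByteBuffer.wrap(ciphertext)))
    // Copy the decrypted bytes out of the response buffer and decode as UTF-8.
    val plaintext = response.getPlaintext
    val bytes = new Array[Byte](plaintext.remaining())
    plaintext.get(bytes)
    new String(bytes, StandardCharsets.UTF_8)
  }
}
{code}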

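Similarly, for the deployment & scheduler point, a very rough sketch of a 
service-loader based cluster manager plugin follows. The 
PluggableClusterManager trait is hypothetical; TaskScheduler and 
SchedulerBackend are currently Spark-internal types, so such a contract would 
have to be exposed (or the implementation placed in the org.apache.spark 
namespace) for this to work outside core.

{code:scala}
import java.util.ServiceLoader

import scala.collection.JavaConverters._

import org.apache.spark.SparkContext
import org.apache.spark.scheduler.{SchedulerBackend, TaskScheduler}

// Hypothetical contract a cloud provider could implement and ship as a
// separate jar, instead of patching several places in Spark core and
// rebuilding the project.
trait PluggableClusterManager {
  // Claim master URLs (e.g. a made-up "fargate://...") handled by this manager.
  def canCreate(masterURL: String): Boolean
  def createTaskScheduler(sc: SparkContext, masterURL: String): TaskScheduler
  def createSchedulerBackend(sc: SparkContext, masterURL: String,
                             scheduler: TaskScheduler): SchedulerBackend
}

object PluggableClusterManager {
  // Spark core could discover implementations at runtime via ServiceLoader,
  // so adding a new cluster manager would not require changing core code.
  def forMaster(masterURL: String): Option[PluggableClusterManager] =
    ServiceLoader.load(classOf[PluggableClusterManager]).asScala
      .find(_.canCreate(masterURL))
}
{code}
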
  was:
Public cloud providers have started offering serverless container services. For 
example, AWS offers Fargate [https://aws.amazon.com/fargate/]

This opens up the possibility to run Spark workloads in a serverless manner. 

Pluggable authentication

Pluggable scheduler

Pluggable shuffle storage and retrieval

Pluggable driver


> Serverless Spark in the Cloud
> -----------------------------
>
>                 Key: SPARK-27941
>                 URL: https://issues.apache.org/jira/browse/SPARK-27941
>             Project: Spark
>          Issue Type: New Feature
>          Components: Build, Deploy, Scheduler, Security, Shuffle, Spark Core
>    Affects Versions: 3.0.0
>            Reporter: Shuheng Dai
>            Priority: Major


