Re: Spark on Kubernetes Builder Pattern Design Document

Matt Cheah Mon, 05 Feb 2018 13:57:46 -0800

I think in this case, the original design that preceded the document was 
implemented on the Spark on K8s fork, which we took some time to build 
separately before proposing that the fork be merged into the main line.


 

Specifically, the timeline of events was:

 
1. We started building Spark on Kubernetes on a fork and were prepared to merge 
   our work directly into master.
2. Discussion on https://issues.apache.org/jira/browse/SPARK-18278 led us to 
   work on a fork first. We would harden the fork and have it become more 
   widely used to prove its value and robustness in practice. See 
   https://github.com/apache-spark-on-k8s/spark
3. On said fork, we made the original design decision to use a step-based 
   builder pattern for the driver but not the same design for the executors. 
   This original discussion took place among the collaborators of the fork, as 
   much of the work on the fork in general was not done on the mailing list.
4. We eventually decided to merge the fork into the main line, and got the 
   feedback in the corresponding PRs.
 

Therefore the question may be less about this specific design and more about 
whether the overarching approach we took (building Spark on K8s on a fork first 
before merging into mainline) was the correct one in the first place. There’s 
also the issue that the work done on the fork was isolated from the dev mailing 
list. Moving forward, as we push our work into mainline Spark, we aim to be 
transparent with the Spark community via the Spark mailing list and Spark JIRA 
tickets. Specifically, we’re aiming to deprecate the fork and migrate all the 
work done on the fork into the main line.

 

-Matt Cheah

 

From: Mark Hamstra <m...@clearstorydata.com>
Date: Monday, February 5, 2018 at 1:44 PM
To: Matt Cheah <mch...@palantir.com>
Cc: "dev@spark.apache.org" <dev@spark.apache.org>, "ramanath...@google.com" 
<ramanath...@google.com>, Ilan Filonenko <i...@cornell.edu>, Erik 
<e...@redhat.com>, Marcelo Vanzin <van...@cloudera.com>
Subject: Re: Spark on Kubernetes Builder Pattern Design Document

 

That's good, but you should probably stop and consider whether the discussions 
that led up to this document's creation could have taken place on this dev list 
-- because if they could have, then they probably should have as part of the 
whole spark-on-k8s project becoming part of mainline spark development, not a 
separate fork. 

 

On Mon, Feb 5, 2018 at 1:17 PM, Matt Cheah <mch...@palantir.com> wrote:

Hi everyone,

 

While we were building the Spark on Kubernetes integration, we realized that 
some of the abstractions we introduced for building the driver application in 
spark-submit, and building executor pods in the scheduler backend, could be 
improved for better readability and clarity. We received feedback in this pull 
request in particular. In response to this feedback, we’ve put 
together a design document that proposes a possible refactor to address the 
given feedback.

 

You may comment on the proposed design at this link: 
https://docs.google.com/document/d/1XPLh3E2JJ7yeJSDLZWXh_lUcjZ1P0dy9QeUEyxIlfak/edit#

 

I hope that we can have a productive discussion and continue improving the 
Kubernetes integration further.

 

Thanks,

 

-Matt Cheah
