Hi Mark,

Thanks for getting back to me. You raise a very valid point about moving to a plug-in based architecture instead of supporting the idiosyncrasies of different schedulers. Let me write up a design doc so that it can at least serve as another data point for how we think about the plug-in architecture discussed in SPARK-19700.
Thanks,
Karthik

On Sun, Sep 10, 2017 at 11:02 PM, Mark Hamstra <m...@clearstorydata.com> wrote:

> While it may be worth creating the design doc and JIRA ticket so that we
> at least have a better idea and a record of what you are talking about, I
> kind of doubt that we are going to want to merge this into the Spark
> codebase. That's not because of anything specific to this Aurora effort,
> but rather because scheduler implementations in general are not going in
> the preferred direction. There is already some regret that the YARN
> scheduler wasn't implemented by means of a scheduler plug-in API, and there
> is likely to be more regret if we continue to go forward with the
> spark-on-kubernetes SPIP in its present form. I'd guess that we are likely
> to merge code associated with that SPIP just because Kubernetes has become
> such an important resource scheduler, but such a merge wouldn't be without
> some misgivings. That is because we just can't get into the position of
> having more and more scheduler implementations in the Spark code, and more
> and more maintenance overhead to keep up with the idiosyncrasies of all the
> scheduler implementations. We've really got to get to the kind of plug-in
> architecture discussed in SPARK-19700 so that scheduler implementations can
> be done outside of the Spark codebase, release schedule, etc.
>
> My opinion on the subject isn't dispositive on its own, of course, but
> that is how I'm seeing things right now.
>
> On Sun, Sep 10, 2017 at 8:27 PM, karthik padmanabhan <
> treadston...@gmail.com> wrote:
>
>> Hi Spark Devs,
>>
>> We are using Aurora (http://aurora.apache.org/) as our mesos framework
>> for running stateless services. We would like to use Aurora to deploy big
>> data and batch workloads as well. For this we have forked Spark and
>> implemented the ExternalClusterManager trait.
>>
>> The reason for doing this, rather than running Spark on Mesos directly, is
>> to leverage the existing roles and quotas provided by Aurora for admission
>> control, and also to leverage Aurora features such as priority and
>> preemption. Additionally, we would like Aurora to be the only
>> deployment/orchestration system that our users have to interact with.
>>
>> We have a working POC where Spark launches jobs with Aurora acting as the
>> cluster manager. Is this something that can be merged upstream? If so, I
>> can create a design document and an associated JIRA ticket.
>>
>> Thanks
>> Karthik
>>
>
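
For anyone following the thread who hasn't looked at that extension point: below is a rough sketch of what plugging a custom cluster manager into Spark via the ExternalClusterManager trait looks like. This is not the actual POC code. Only the ExternalClusterManager trait, its method signatures, and TaskSchedulerImpl are real Spark APIs; the AuroraClusterManager/AuroraSchedulerBackend names and the "aurora://" URL scheme are hypothetical, for illustration only.

```scala
package org.example.aurora

import org.apache.spark.SparkContext
import org.apache.spark.scheduler.{ExternalClusterManager, SchedulerBackend,
  TaskScheduler, TaskSchedulerImpl}

// Hypothetical Aurora-backed cluster manager. Spark discovers implementations
// of ExternalClusterManager on the classpath via java.util.ServiceLoader, so
// the jar would also need a
// META-INF/services/org.apache.spark.scheduler.ExternalClusterManager file
// naming this class.
class AuroraClusterManager extends ExternalClusterManager {

  // Spark selects this manager when the master URL matches
  // (e.g. spark-submit --master aurora://...; the scheme is made up here).
  override def canCreate(masterURL: String): Boolean =
    masterURL.startsWith("aurora://")

  // Reuse Spark's standard task scheduler; only the backend is custom.
  override def createTaskScheduler(sc: SparkContext,
      masterURL: String): TaskScheduler =
    new TaskSchedulerImpl(sc)

  // The backend is where the Aurora-specific logic would live:
  // requesting executors through Aurora and wiring them back to the driver.
  override def createSchedulerBackend(sc: SparkContext,
      masterURL: String,
      scheduler: TaskScheduler): SchedulerBackend =
    new AuroraSchedulerBackend(scheduler.asInstanceOf[TaskSchedulerImpl],
      sc, masterURL) // AuroraSchedulerBackend is hypothetical

  // Called after both objects are created, before the SparkContext starts.
  override def initialize(scheduler: TaskScheduler,
      backend: SchedulerBackend): Unit =
    scheduler.asInstanceOf[TaskSchedulerImpl].initialize(backend)
}
```

The appeal of this route, and of the SPARK-19700 discussion, is exactly that everything Aurora-specific stays behind these four methods, so in principle the implementation could live outside the Spark codebase and release cycle.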