While it may be worth creating the design doc and JIRA ticket so that we at least have a better idea and a record of what you are talking about, I kind of doubt that we are going to want to merge this into the Spark codebase. That's not because of anything specific to this Aurora effort, but because adding more in-tree scheduler implementations is not the direction we want to go. There is already some regret that the YARN scheduler wasn't implemented by means of a scheduler plug-in API, and there is likely to be more regret if we continue forward with the spark-on-kubernetes SPIP in its present form.

I'd guess that we are likely to merge the code associated with that SPIP just because Kubernetes has become such an important resource scheduler, but such a merge wouldn't be without misgivings. We simply can't get into the position of carrying more and more scheduler implementations in the Spark codebase, with more and more maintenance overhead to keep up with the idiosyncrasies of each one. We've really got to get to the kind of plug-in architecture discussed in SPARK-19700, so that scheduler implementations can be done outside of the Spark codebase, release schedule, etc.
My opinion on the subject isn't dispositive on its own, of course, but that is how I'm seeing things right now.

On Sun, Sep 10, 2017 at 8:27 PM, karthik padmanabhan <treadston...@gmail.com> wrote:

> Hi Spark Devs,
>
> We are using Aurora (http://aurora.apache.org/) as our Mesos framework
> for running stateless services. We would like to use Aurora to deploy big
> data and batch workloads as well. For this we have forked Spark and
> implemented the ExternalClusterManager trait.
>
> The reason for doing this, rather than running Spark on Mesos directly,
> is to leverage the existing roles and quotas provided by Aurora for
> admission control, and also to leverage Aurora features such as priority
> and preemption. Additionally, we would like Aurora to be the only
> deploy/orchestration system that our users have to interact with.
>
> We have a working POC where Spark launches jobs through Aurora as the
> ClusterManager. Is this something that can be merged upstream? If so, I
> can create a design document and an associated JIRA ticket.
>
> Thanks,
> Karthik
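For anyone following along who hasn't looked at the trait, here is a minimal sketch of what an implementation along the lines Karthik describes could look like. The AuroraClusterManager and AuroraSchedulerBackend names and the "aurora://" master URL scheme are hypothetical, invented purely for illustration; the actual POC code is not shown in this thread. Note that ExternalClusterManager is private[spark], which is exactly why such an implementation has to live inside the org.apache.spark namespace today, i.e. in a fork:

// A minimal sketch, not the actual POC. All Aurora-specific names here
// are hypothetical placeholders.
package org.apache.spark.scheduler.aurora

import org.apache.spark.SparkContext
import org.apache.spark.scheduler.{ExternalClusterManager, SchedulerBackend, TaskScheduler, TaskSchedulerImpl}

private[spark] class AuroraClusterManager extends ExternalClusterManager {

  // Claim master URLs of the hypothetical form "aurora://...".
  override def canCreate(masterURL: String): Boolean =
    masterURL.startsWith("aurora://")

  // Reuse Spark's standard TaskScheduler implementation.
  override def createTaskScheduler(sc: SparkContext, masterURL: String): TaskScheduler =
    new TaskSchedulerImpl(sc)

  // The backend is where the Aurora-specific logic would live.
  override def createSchedulerBackend(
      sc: SparkContext,
      masterURL: String,
      scheduler: TaskScheduler): SchedulerBackend =
    new AuroraSchedulerBackend(scheduler.asInstanceOf[TaskSchedulerImpl], sc, masterURL)

  // Wire the scheduler to its backend once both have been created.
  override def initialize(scheduler: TaskScheduler, backend: SchedulerBackend): Unit =
    scheduler.asInstanceOf[TaskSchedulerImpl].initialize(backend)
}

// Hypothetical stub: a real backend would translate executor resource
// requests into Aurora jobs, honoring Aurora's roles, quotas, priority,
// and preemption.
private[spark] class AuroraSchedulerBackend(
    scheduler: TaskSchedulerImpl,
    sc: SparkContext,
    masterURL: String) extends SchedulerBackend {
  override def start(): Unit = {}          // register with Aurora, launch executors
  override def stop(): Unit = {}           // tear down Aurora-managed executors
  override def reviveOffers(): Unit = {}   // offer resources to pending tasks
  override def defaultParallelism(): Int =
    sc.conf.getInt("spark.default.parallelism", 2)
}

Since SparkContext discovers ExternalClusterManager implementations via java.util.ServiceLoader, the fork would also need to register the class in a META-INF/services/org.apache.spark.scheduler.ExternalClusterManager resource file. The fact that all of this requires being inside the Spark source tree is the core of the problem SPARK-19700 is meant to solve.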