[
https://issues.apache.org/jira/browse/SPARK-25678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16659756#comment-16659756
]
Ilan Filonenko commented on SPARK-25678:
----------------------------------------
I would recommend looking at https://jira.apache.org/jira/browse/SPARK-19700,
as it is closely related to your approach of enabling pluggable scheduler
implementations in Spark.
> SPIP: Adding support in Spark for HPC cluster manager (PBS Professional)
> ------------------------------------------------------------------------
>
> Key: SPARK-25678
> URL: https://issues.apache.org/jira/browse/SPARK-25678
> Project: Spark
> Issue Type: New Feature
> Components: Scheduler
> Affects Versions: 3.0.0
> Reporter: Utkarsh Maheshwari
> Priority: Major
>
> I sent an email on the dev mailing list but got no response, hence filing a
> JIRA ticket.
>
> PBS (Portable Batch System) Professional is an open-source workload
> management system for HPC clusters. Many organizations that manage their
> clusters with PBS also use Spark for Big Data, but they are forced to split
> the cluster into a Spark cluster and a PBS cluster, either by physically
> dividing the nodes into two groups or by running the Spark Standalone
> cluster manager's Master and Workers as PBS jobs, leading to
> underutilization of resources.
>
> I am trying to add support in Spark for using PBS as a pluggable cluster
> manager. Going through the Spark codebase and looking at the Mesos and
> Kubernetes integrations, I found that we can get this working as follows:
>
> - Extend `ExternalClusterManager`.
> - Extend `CoarseGrainedSchedulerBackend` (a sketch follows this list).
> - This class can start `Executors` as PBS jobs.
> - The initial set of `Executors` is started in `onStart`.
> - More `Executors` can be started as needed using
> `doRequestTotalExecutors`.
> - `Executors` can be killed using `doKillExecutors`.
> - Extend `SparkApplication` to start the `Driver` as a PBS job in cluster
> deploy mode.
> - This extended class can resubmit the Spark application as a PBS job with
> deploy mode = client, so that the application driver is started on a node in
> the cluster.
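>
> To make this concrete, here is a minimal sketch of the cluster manager and
> scheduler backend pieces, assuming the Spark 2.x/3.0 internal APIs
> (`ExternalClusterManager`, `TaskSchedulerImpl`,
> `CoarseGrainedSchedulerBackend`). The `PbsClusterManager` and
> `PbsCoarseGrainedSchedulerBackend` names, the `pbs://` master URL scheme,
> and the `start-executor.sh` script are all hypothetical; `qsub` and `qdel`
> are the standard PBS job submission/deletion commands.
>
> ```scala
> package org.apache.spark.scheduler.pbs
>
> import scala.concurrent.Future
> import scala.sys.process._
>
> import org.apache.spark.SparkContext
> import org.apache.spark.scheduler._
> import org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend
>
> // Hypothetical entry point claiming master URLs of the form "pbs://<host>".
> private[spark] class PbsClusterManager extends ExternalClusterManager {
>
>   override def canCreate(masterURL: String): Boolean =
>     masterURL.startsWith("pbs://")
>
>   override def createTaskScheduler(
>       sc: SparkContext, masterURL: String): TaskScheduler =
>     new TaskSchedulerImpl(sc)
>
>   override def createSchedulerBackend(
>       sc: SparkContext, masterURL: String,
>       scheduler: TaskScheduler): SchedulerBackend =
>     new PbsCoarseGrainedSchedulerBackend(
>       scheduler.asInstanceOf[TaskSchedulerImpl], sc)
>
>   override def initialize(
>       scheduler: TaskScheduler, backend: SchedulerBackend): Unit =
>     scheduler.asInstanceOf[TaskSchedulerImpl].initialize(backend)
> }
>
> // Hypothetical backend mapping executor lifecycle events to PBS jobs.
> private[spark] class PbsCoarseGrainedSchedulerBackend(
>     scheduler: TaskSchedulerImpl,
>     sc: SparkContext)
>   extends CoarseGrainedSchedulerBackend(scheduler, sc.env.rpcEnv) {
>
>   // PBS job ids of the executor jobs submitted so far.
>   private var executorJobIds = Vector.empty[String]
>
>   override def start(): Unit = {
>     super.start()
>     // Submit the initial executors as PBS jobs.
>     val initial = sc.conf.getInt("spark.executor.instances", 2)
>     (1 to initial).foreach(_ => executorJobIds :+= submitExecutorJob())
>   }
>
>   // `qsub` prints the new job id on stdout. The (hypothetical) script would
>   // launch CoarseGrainedExecutorBackend against this driver's RPC address.
>   private def submitExecutorJob(): String =
>     Seq("qsub", "/opt/spark/pbs/start-executor.sh").!!.trim
>
>   override def doRequestTotalExecutors(requestedTotal: Int): Future[Boolean] =
>     synchronized {
>       while (executorJobIds.size < requestedTotal) {
>         executorJobIds :+= submitExecutorJob()
>       }
>       Future.successful(true)
>     }
>
>   override def doKillExecutors(executorIds: Seq[String]): Future[Boolean] = {
>     // Assumes executor ids can be mapped back to PBS job ids; `qdel` then
>     // deletes the corresponding jobs.
>     executorIds.foreach(id => Seq("qdel", id).!)
>     Future.successful(true)
>   }
> }
> ```
>
> An implementation like this would be registered in
> META-INF/services/org.apache.spark.scheduler.ExternalClusterManager so that
> `SparkContext` can discover it for an otherwise unrecognized master URL; the
> `SparkApplication` wrapper for cluster deploy mode would sit on top of this.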
>
> I have a couple of questions:
> - Does this seem like a good approach, or should we look at other options?
> - What are the expectations from the initial prototype?
> - If this works, would the Spark maintainers be open to merging it, or would
> they prefer that it be maintained as a fork?