[
https://issues.apache.org/jira/browse/SPARK-25678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16659756#comment-16659756
]
Ilan Filonenko commented on SPARK-25678:
----------------------------------------
I would recommend looking at https://jira.apache.org/jira/browse/SPARK-19700,
as it is closely related to your approach of enabling pluggable scheduler
implementations in Spark.
> SPIP: Adding support in Spark for HPC cluster manager (PBS Professional)
> ------------------------------------------------------------------------
>
> Key: SPARK-25678
> URL: https://issues.apache.org/jira/browse/SPARK-25678
> Project: Spark
> Issue Type: New Feature
> Components: Scheduler
> Affects Versions: 3.0.0
> Reporter: Utkarsh Maheshwari
> Priority: Major
>
> I sent an email on the dev mailing list but got no response, hence filing a
> JIRA ticket.
>
> PBS (Portable Batch System) Professional is an open-source workload
> management system for HPC clusters. Many organizations that manage their
> clusters with PBS also use Spark for Big Data, but they are forced to split
> the cluster into a Spark cluster and a PBS cluster, either by physically
> dividing the nodes into two groups or by running the Spark Standalone
> cluster manager's Master and Workers as PBS jobs, leading to
> underutilization of resources.
>
> I am trying to add support in Spark for using PBS as a pluggable cluster
> manager. Going through the Spark codebase and looking at the Mesos and
> Kubernetes integrations, I found that we can get this working as follows:
>
> - Extend `ExternalClusterManager`.
> - Extend `CoarseGrainedSchedulerBackend` (a sketch follows this list).
> - This class can start `Executors` as PBS jobs.
> - The initial set of `Executors` is started in `onStart`.
> - More `Executors` can be started as needed using
> `doRequestTotalExecutors`.
> - `Executors` can be killed using `doKillExecutors`.
> - Extend `SparkApplication` to start the `Driver` as a PBS job in cluster
> deploy mode.
> - This extended class can resubmit the Spark application as a PBS job with
> deploy mode = client, so that the application driver is started on a node in
> the cluster.
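>
> To make this concrete, here is a minimal sketch of the cluster manager and
> scheduler backend pieces, assuming the Spark 2.x/3.0 internal APIs
> (`ExternalClusterManager`, `TaskSchedulerImpl`,
> `CoarseGrainedSchedulerBackend`). The `PbsClusterManager` and
> `PbsCoarseGrainedSchedulerBackend` names, the `pbs://` master URL scheme,
> and the `start-executor.sh` script are all hypothetical; `qsub` and `qdel`
> are the standard PBS job submission/deletion commands.
>
> ```scala
> package org.apache.spark.scheduler.pbs
>
> import scala.concurrent.Future
> import scala.sys.process._
>
> import org.apache.spark.SparkContext
> import org.apache.spark.scheduler._
> import org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend
>
> // Hypothetical entry point claiming master URLs of the form "pbs://<host>".
> private[spark] class PbsClusterManager extends ExternalClusterManager {
>
>   override def canCreate(masterURL: String): Boolean =
>     masterURL.startsWith("pbs://")
>
>   override def createTaskScheduler(
>       sc: SparkContext, masterURL: String): TaskScheduler =
>     new TaskSchedulerImpl(sc)
>
>   override def createSchedulerBackend(
>       sc: SparkContext, masterURL: String,
>       scheduler: TaskScheduler): SchedulerBackend =
>     new PbsCoarseGrainedSchedulerBackend(
>       scheduler.asInstanceOf[TaskSchedulerImpl], sc)
>
>   override def initialize(
>       scheduler: TaskScheduler, backend: SchedulerBackend): Unit =
>     scheduler.asInstanceOf[TaskSchedulerImpl].initialize(backend)
> }
>
> // Hypothetical backend mapping executor lifecycle events to PBS jobs.
> private[spark] class PbsCoarseGrainedSchedulerBackend(
>     scheduler: TaskSchedulerImpl,
>     sc: SparkContext)
>   extends CoarseGrainedSchedulerBackend(scheduler, sc.env.rpcEnv) {
>
>   // PBS job ids of the executor jobs submitted so far.
>   private var executorJobIds = Vector.empty[String]
>
>   override def start(): Unit = {
>     super.start()
>     // Submit the initial executors as PBS jobs.
>     val initial = sc.conf.getInt("spark.executor.instances", 2)
>     (1 to initial).foreach(_ => executorJobIds :+= submitExecutorJob())
>   }
>
>   // `qsub` prints the new job id on stdout. The (hypothetical) script would
>   // launch CoarseGrainedExecutorBackend against this driver's RPC address.
>   private def submitExecutorJob(): String =
>     Seq("qsub", "/opt/spark/pbs/start-executor.sh").!!.trim
>
>   override def doRequestTotalExecutors(requestedTotal: Int): Future[Boolean] =
>     synchronized {
>       while (executorJobIds.size < requestedTotal) {
>         executorJobIds :+= submitExecutorJob()
>       }
>       Future.successful(true)
>     }
>
>   override def doKillExecutors(executorIds: Seq[String]): Future[Boolean] = {
>     // Assumes executor ids can be mapped back to PBS job ids; `qdel` then
>     // deletes the corresponding jobs.
>     executorIds.foreach(id => Seq("qdel", id).!)
>     Future.successful(true)
>   }
> }
> ```
>
> An implementation like this would be registered in
> META-INF/services/org.apache.spark.scheduler.ExternalClusterManager so that
> `SparkContext` can discover it for an otherwise unrecognized master URL; the
> `SparkApplication` wrapper for cluster deploy mode would sit on top of this.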
>
> I have a couple of questions:
> - Does this seem like a good approach, or should we look at other options?
> - What are the expectations from the initial prototype?
> - If this works, would the Spark maintainers be open to merging it, or would
> they prefer that it be maintained as a fork?