Recap on current status of "SPIP: Support Customized Kubernetes Schedulers"

Yikun Jiang Wed, 23 Feb 2022 19:35:32 -0800

First, many thanks to everyone in the Spark, Volcano, and Yunikorn
communities for helping to make this SPIP happen!


Special thanks to @dongjoon-hyun @holdenk @william-wang @attilapiros
@HyukjinKwon @martin-g @yangwwei @tgravescs

The SPIP is nearing completion. The basic functionality can be considered
available at beta level.

I have also drafted a short slide deck that shows how to use the feature and
explains what we have done:
https://docs.google.com/presentation/d/1XDsTWPcsBe4PQ-1MlBwd9pRl8mySdziE_dJE6iATNw8

Below is a recap of the current implementation and the next steps for the
SPIP:

*# Existing work*
*## Basic part:*
- SPARK-36059 <https://issues.apache.org/jira/browse/SPARK-36059> *New
configuration:* ability to specify "schedulerName" in driver/executor for
Spark on K8S
- SPARK-37331 <https://issues.apache.org/jira/browse/SPARK-37331> *New
workflow:* ability to create pre-populated resources before the driver pod
for Spark on K8S
- SPARK-37145 <https://issues.apache.org/jira/browse/SPARK-37145> *New
developer API:* support user feature step with configuration for Spark on
K8S
- *(reviewing)* *New Job Configurations* for Spark on K8S:
  - SPARK-38188 <https://issues.apache.org/jira/browse/SPARK-38188>:
spark.kubernetes.job.queue
  - SPARK-38187 <https://issues.apache.org/jira/browse/SPARK-38187>:
spark.kubernetes.job.[minCPU|minMemory]
  - SPARK-38189 <https://issues.apache.org/jira/browse/SPARK-38189>:
spark.kubernetes.job.priorityClassName
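
To make the configuration items above concrete, here is a minimal sketch that
collects them into `--conf` arguments for spark-submit. The
spark.kubernetes.job.* keys are the ones listed above and are still under
review, so they may change. The two scheduler.name keys are my assumption of
how SPARK-36059 is exposed (they are not spelled out in this mail), and the
helper function names are purely illustrative.

```python
from typing import Dict, List, Optional


def scheduler_conf(
    scheduler_name: str,
    queue: Optional[str] = None,
    min_cpu: Optional[str] = None,
    min_memory: Optional[str] = None,
    priority_class: Optional[str] = None,
) -> Dict[str, str]:
    """Collect scheduler-related Spark configuration into one dict."""
    conf = {
        # Assumed key names for SPARK-36059; check the Spark docs before use.
        "spark.kubernetes.driver.scheduler.name": scheduler_name,
        "spark.kubernetes.executor.scheduler.name": scheduler_name,
    }
    # Job-level keys as proposed in SPARK-38187/38188/38189 (under review).
    optional = {
        "spark.kubernetes.job.queue": queue,
        "spark.kubernetes.job.minCPU": min_cpu,
        "spark.kubernetes.job.minMemory": min_memory,
        "spark.kubernetes.job.priorityClassName": priority_class,
    }
    # Only emit the keys the caller actually set.
    conf.update({k: v for k, v in optional.items() if v is not None})
    return conf


def to_submit_args(conf: Dict[str, str]) -> List[str]:
    """Turn a configuration dict into a flat list of --conf arguments."""
    args: List[str] = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    conf = scheduler_conf("volcano", queue="queue1", min_cpu="2")
    print(" ".join(["spark-submit"] + to_submit_args(conf)))
```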

*## Volcano Part:*
- SPARK-37258 <https://issues.apache.org/jira/browse/SPARK-37258> *New
volcano extension* in kubernetes-client fabric8io/kubernetes-client#3579
- SPARK-36061 <https://issues.apache.org/jira/browse/SPARK-36061> *New
profile:* -Pvolcano
- SPARK-36061 <https://issues.apache.org/jira/browse/SPARK-36061> *New
Feature Step:* VolcanoFeatureStep
- SPARK-36061 <https://issues.apache.org/jira/browse/SPARK-36061> *New
integration test:*
 - *Passed on x86 and Arm64 (Linux on Huawei Kunpeng 920 and macOS on Apple
Silicon M1).*
 - Tests the basic Volcano workflow.
 - Runs all existing tests with Volcano as the scheduler.
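
As a rough illustration of what a Volcano feature step does with the job
configuration, the sketch below builds a PodGroup manifest as a plain dict and
attaches a pod to it through an annotation. This is not the VolcanoFeatureStep
code itself. The mapping from the spark.kubernetes.job.* keys to PodGroup
fields is my assumption based on the JIRAs above, and the function names are
hypothetical. The PodGroup field names and the group-name annotation follow
the Volcano scheduling API as I understand it.

```python
import copy
from typing import Any, Dict

# Annotation Volcano uses to associate a pod with a PodGroup.
GROUP_ANNOTATION = "scheduling.k8s.io/group-name"


def build_pod_group(name: str, namespace: str, conf: Dict[str, str]) -> Dict[str, Any]:
    """Build a PodGroup manifest from the (proposed) job configuration keys."""
    spec: Dict[str, Any] = {}
    if "spark.kubernetes.job.queue" in conf:
        spec["queue"] = conf["spark.kubernetes.job.queue"]
    if "spark.kubernetes.job.priorityClassName" in conf:
        spec["priorityClassName"] = conf["spark.kubernetes.job.priorityClassName"]

    # minCPU/minMemory become the resource reservation of the whole group.
    min_resources: Dict[str, str] = {}
    if "spark.kubernetes.job.minCPU" in conf:
        min_resources["cpu"] = conf["spark.kubernetes.job.minCPU"]
    if "spark.kubernetes.job.minMemory" in conf:
        min_resources["memory"] = conf["spark.kubernetes.job.minMemory"]
    if min_resources:
        spec["minResources"] = min_resources

    return {
        "apiVersion": "scheduling.volcano.sh/v1beta1",
        "kind": "PodGroup",
        "metadata": {"name": name, "namespace": namespace},
        "spec": spec,
    }


def attach_to_group(pod: Dict[str, Any], group_name: str) -> Dict[str, Any]:
    """Return a copy of the pod manifest annotated with its PodGroup name."""
    annotated = copy.deepcopy(pod)
    metadata = annotated.setdefault("metadata", {})
    metadata.setdefault("annotations", {})[GROUP_ANNOTATION] = group_name
    return annotated
```

In the workflow from SPARK-37331, a resource like this PodGroup would be
created before the driver pod, so the scheduler already knows the group when
the driver is submitted.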

*## Yunikorn Part:*
@yangwwei will also start working on the Yunikorn feature step this week.
I will help complete the Yunikorn integration based on our previous
experience.

*# Next Plan*
There are 4 main tasks to be completed before the v3.3 code freeze:
1. (reviewing) SPARK-38188
<https://issues.apache.org/jira/browse/SPARK-38188>: Support queue
scheduling configuration
https://github.com/apache/spark/pull/35553
2. (reviewing) SPARK-38187
<https://issues.apache.org/jira/browse/SPARK-38187>: Support resource
reservation (minCPU/minMemory configuration)
https://github.com/apache/spark/pull/35640
3. (reviewing) SPARK-38189
<https://issues.apache.org/jira/browse/SPARK-38189>: Support priority
scheduling (priorityClass configuration)
https://github.com/apache/spark/pull/35639
4. (WIP) SPARK-37809 <https://issues.apache.org/jira/browse/SPARK-37809>:
Yunikorn integration

Several miscellaneous items will also be completed before 3.3:
1. Integrate Volcano deployment into the integration tests (x86 and arm)
- Add it to the Spark Kubernetes integration tests once cross-compilation
is supported:
https://github.com/volcano-sh/volcano/pull/1571
2. Complete the documentation and test guidelines.

Please feel free to contact me if you have any other concerns! Thanks!

[1] https://issues.apache.org/jira/browse/SPARK-36057
