@Holden Karau <hol...@pigscanfly.ca> Thanks for the reminder, I will send the vote mail soon,
and thanks for all the help on the discussion and design review.

Regards,
Yikun

Holden Karau <hol...@pigscanfly.ca> wrote on Thu, Jan 6, 2022 at 03:16:

> Do we want to move the SPIP forward to a vote? It seems like we're mostly
> agreeing in principle?
>
> On Wed, Jan 5, 2022 at 11:12 AM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>
>> Hi Bo,
>>
>> Thanks for the info. Let me elaborate:
>>
>> In theory you can set the number of executors to a multiple of the number
>> of nodes. For example, on a three-node k8s cluster (in my case Google
>> GKE), you can request 6 executors: six executor pods are created and
>> queue to start, but you ultimately finish with two running executors plus
>> the driver on the 3-node cluster, as shown below.
>>
>> hduser@ctpvm: /home/hduser> k get pods -n spark
>> NAME                                         READY   STATUS    RESTARTS   AGE
>> randomdatabigquery-d42d067e2b91c88a-exec-1   1/1     Running   0          33s
>> randomdatabigquery-d42d067e2b91c88a-exec-2   1/1     Running   0          33s
>> randomdatabigquery-d42d067e2b91c88a-exec-3   0/1     Pending   0          33s
>> randomdatabigquery-d42d067e2b91c88a-exec-4   0/1     Pending   0          33s
>> randomdatabigquery-d42d067e2b91c88a-exec-5   0/1     Pending   0          33s
>> randomdatabigquery-d42d067e2b91c88a-exec-6   0/1     Pending   0          33s
>> sparkbq-0beda77e2b919e01-driver              1/1     Running   0          45s
>>
>> A few seconds later, only the two running executors remain:
>>
>> hduser@ctpvm: /home/hduser> k get pods -n spark
>> NAME                                         READY   STATUS    RESTARTS   AGE
>> randomdatabigquery-d42d067e2b91c88a-exec-1   1/1     Running   0          40s
>> randomdatabigquery-d42d067e2b91c88a-exec-2   1/1     Running   0          40s
>> sparkbq-0beda77e2b919e01-driver              1/1     Running   0          52s
>>
>> So you end up with the four pending executors dropping out.
>> Hence the conclusion seems to be that, with the current model, you can
>> fit exactly one Spark executor pod per Kubernetes node.
>>
>> HTH
>>
>> view my Linkedin profile
>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary damages
>> arising from such loss, damage or destruction.
>>
>> On Wed, 5 Jan 2022 at 17:01, bo yang <bobyan...@gmail.com> wrote:
>>
>>> Hi Mich,
>>>
>>> Curious what you mean by "The constraint seems to be that you can fit
>>> one Spark executor pod per Kubernetes node and from my tests you don't
>>> seem to be able to allocate more than 50% of RAM on the node to the
>>> container" -- could you explain a bit? Asking because there can be
>>> multiple executor pods running on a single Kubernetes node.
>>>
>>> Thanks,
>>> Bo
>>>
>>> On Wed, Jan 5, 2022 at 1:13 AM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>
>>>> Thanks William for the info.
>>>>
>>>> The current model of Spark on k8s has certain drawbacks with pod-based
>>>> scheduling, as I found when testing it on Google Kubernetes Engine
>>>> (GKE). The constraint seems to be that you can fit one Spark executor
>>>> pod per Kubernetes node, and from my tests you don't seem to be able to
>>>> allocate more than 50% of the RAM on the node to the container.
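One likely contributing factor to the constraint described above (a sketch of the arithmetic, not taken from the thread): Spark sizes the executor pod's memory request as spark.executor.memory plus a memory overhead, by default max(10% of executor memory, 384 MiB) for JVM workloads, while the kubelet and system daemons reserve part of each node's RAM, so the allocatable memory is well below the machine size. Illustrative numbers:

```shell
# Sketch: executor pod memory request = executor memory + overhead.
# All values below are illustrative, not taken from the thread.
EXEC_MEM_MIB=8192            # spark.executor.memory=8g
OVERHEAD_PCT=10              # default spark.kubernetes.memoryOverheadFactor=0.1
OVERHEAD_MIB=$(( EXEC_MEM_MIB * OVERHEAD_PCT / 100 ))
if [ "$OVERHEAD_MIB" -lt 384 ]; then OVERHEAD_MIB=384; fi   # 384 MiB floor
POD_REQUEST_MIB=$(( EXEC_MEM_MIB + OVERHEAD_MIB ))
echo "$POD_REQUEST_MIB"      # prints 9011
```

If this request exceeds a node's allocatable memory (visible via `kubectl describe node`), the pod stays Pending with an "Insufficient memory" event.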
>>>> [image: gke_memoeyPlot.png]
>>>>
>>>> Any more than that results in the container never being created
>>>> (stuck at Pending):
>>>>
>>>> kubectl describe pod sparkbq-b506ac7dc521b667-driver -n spark
>>>>
>>>> Events:
>>>>   Type     Reason             Age                   From                Message
>>>>   ----     ------             ----                  ----                -------
>>>>   Warning  FailedScheduling   17m                   default-scheduler   0/3 nodes are available: 3 Insufficient memory.
>>>>   Warning  FailedScheduling   17m                   default-scheduler   0/3 nodes are available: 3 Insufficient memory.
>>>>   Normal   NotTriggerScaleUp  2m28s (x92 over 17m)  cluster-autoscaler  pod didn't trigger scale-up:
>>>>
>>>> Obviously this is far from ideal; this model works but is not efficient.
>>>>
>>>> Cheers,
>>>>
>>>> Mich
>>>>
>>>> On Wed, 5 Jan 2022 at 03:55, William Wang <wang.platf...@gmail.com> wrote:
>>>>
>>>>> Hi Mich,
>>>>>
>>>>> Here are some of the performance indications for Volcano:
>>>>> 1. Scheduler throughput: 1.5k pods/s (default scheduler: 100 pods/s)
>>>>> 2. Spark application performance improved by 30%+ with the minimal
>>>>> resource reservation feature in the case of insufficient resources
>>>>> (tested with TPC-DS).
>>>>>
>>>>> We are still working on more optimizations. Besides performance,
>>>>> Volcano is being continuously enhanced in the four directions below to
>>>>> provide the abilities users care about:
>>>>> - Full lifecycle management for jobs
>>>>> - Scheduling policies for high-performance workloads (fair-share,
>>>>>   topology, SLA, reservation, preemption, backfill, etc.)
>>>>> - Support for heterogeneous hardware
>>>>> - Performance optimization for high-performance workloads
>>>>>
>>>>> Thanks,
>>>>> LeiBo
>>>>>
>>>>> Mich Talebzadeh <mich.talebza...@gmail.com> wrote on Tue, Jan 4, 2022 at 18:12:
>>>>>
>>>>>> Interesting, thanks.
>>>>>>
>>>>>> Do you have any indication of a ballpark figure (a rough numerical
>>>>>> estimate) of how much adding Volcano as an alternative scheduler is
>>>>>> going to improve Spark on k8s performance?
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> On Tue, 4 Jan 2022 at 09:43, Yikun Jiang <yikunk...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi, folks! Wishing you all the best in 2022.
>>>>>>>
>>>>>>> I'd like to share the current status of "Support Customized K8S
>>>>>>> Scheduler in Spark":
>>>>>>>
>>>>>>> https://docs.google.com/document/d/1xgQGRpaHQX6-QH_J9YV2C2Dh6RpXefUpLM7KGkzL6Fg/edit#heading=h.1quyr1r2kr5n
>>>>>>>
>>>>>>> Framework/common support:
>>>>>>>
>>>>>>> - The Volcano and Yunikorn teams have joined the discussion and
>>>>>>> completed the initial doc on the framework/common part.
>>>>>>> - SPARK-37145 <https://issues.apache.org/jira/browse/SPARK-37145>
>>>>>>> (under review): We proposed to extend the customized scheduler by
>>>>>>> just using a custom feature step; this will meet the requirements of
>>>>>>> a customized scheduler once it gets merged. After this, the user can
>>>>>>> enable the feature step and scheduler like:
>>>>>>>
>>>>>>> spark-submit \
>>>>>>>   --conf spark.kubernetes.scheduler.name=volcano \
>>>>>>>   --conf spark.kubernetes.driver.pod.featureSteps=org.apache.spark.deploy.k8s.features.scheduler.VolcanoFeatureStep \
>>>>>>>   --conf spark.kubernetes.job.queue=xxx
>>>>>>>
>>>>>>> (As above, the VolcanoFeatureStep will help set the Spark scheduler
>>>>>>> queue according to the user-specified conf.)
>>>>>>>
>>>>>>> - SPARK-37331 <https://issues.apache.org/jira/browse/SPARK-37331>:
>>>>>>> Added the ability to create Kubernetes resources before driver pod
>>>>>>> creation.
>>>>>>>
>>>>>>> - SPARK-36059 <https://issues.apache.org/jira/browse/SPARK-36059>:
>>>>>>> Add the ability to specify a scheduler for the driver/executor.
>>>>>>>
>>>>>>> After all of the above, the framework/common support should be ready
>>>>>>> for most customized schedulers.
>>>>>>>
>>>>>>> Volcano part:
>>>>>>>
>>>>>>> - SPARK-37258 <https://issues.apache.org/jira/browse/SPARK-37258>:
>>>>>>> Upgrade kubernetes-client to 5.11.1 to add Volcano scheduler API
>>>>>>> support.
>>>>>>>
>>>>>>> - SPARK-36061 <https://issues.apache.org/jira/browse/SPARK-36061>:
>>>>>>> Add a VolcanoFeatureStep to help users create a PodGroup with the
>>>>>>> user-specified minimum resources required. There is also a WIP commit
>>>>>>> showing a preview of this
>>>>>>> <https://github.com/Yikun/spark/pull/45/commits/81bf6f98edb5c00ebd0662dc172bc73f980b6a34>.
>>>>>>>
>>>>>>> Yunikorn part:
>>>>>>>
>>>>>>> - @WeiweiYang is completing the doc for the Yunikorn part and
>>>>>>> implementing it.
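Once a scheduler is specified via the confs above, one quick way to verify that the driver pod was actually handed to that scheduler is to read the pod's schedulerName (a sketch; the pod name below is a placeholder, not from the thread):

```shell
# Check which scheduler a pod was assigned to
# (pod name is a hypothetical placeholder).
kubectl get pod sparkbq-xxxx-driver -n spark \
  -o jsonpath='{.spec.schedulerName}'
# "volcano" if the conf took effect; "default-scheduler" otherwise
```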
>>>>>>> Regards,
>>>>>>> Yikun
>>>>>>>
>>>>>>> Weiwei Yang <w...@apache.org> wrote on Thu, Dec 2, 2021 at 02:00:
>>>>>>>
>>>>>>>> Thank you Yikun for the info, and thanks for inviting me to a
>>>>>>>> meeting to discuss this.
>>>>>>>> I appreciate your effort to put these together, and I agree that
>>>>>>>> the purpose is to make Spark easy/flexible enough to support other
>>>>>>>> K8s schedulers (not just Volcano).
>>>>>>>> As discussed, could you please help to abstract out the things in
>>>>>>>> common and allow Spark to plug in different implementations? I'd be
>>>>>>>> happy to work with you guys on this issue.
>>>>>>>>
>>>>>>>> On Tue, Nov 30, 2021 at 6:49 PM Yikun Jiang <yikunk...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> @Weiwei @Chenya
>>>>>>>>>
>>>>>>>>> > Thanks for bringing this up. This is quite interesting, we
>>>>>>>>> definitely should participate more in the discussions.
>>>>>>>>>
>>>>>>>>> Thanks for your reply, and welcome to the discussion; I think the
>>>>>>>>> input from Yunikorn is critical.
>>>>>>>>>
>>>>>>>>> > The main thing here is, the Spark community should make Spark
>>>>>>>>> pluggable in order to support other schedulers, not just for
>>>>>>>>> Volcano. It looks like this proposal is pushing really hard for
>>>>>>>>> adopting PodGroup, which isn't part of K8s yet; that to me is
>>>>>>>>> problematic.
>>>>>>>>>
>>>>>>>>> Definitely, yes, we are on the same page.
>>>>>>>>>
>>>>>>>>> I think we have the same goal: propose a general and reasonable
>>>>>>>>> mechanism to make Spark on k8s with a custom scheduler more usable.
>>>>>>>>>
>>>>>>>>> But for the PodGroup, allow me to give a brief introduction:
>>>>>>>>> - The PodGroup definition has been approved officially by
>>>>>>>>> Kubernetes in KEP-583. [1]
>>>>>>>>> - It can be regarded as a general concept/standard in Kubernetes
>>>>>>>>> rather than a Volcano-specific concept; there are also other
>>>>>>>>> implementations of it, such as [2][3].
>>>>>>>>> - Kubernetes recommends using CRDs for extensions that implement
>>>>>>>>> what they want. [4]
>>>>>>>>> - Volcano, as such an extension, provides an interface to maintain
>>>>>>>>> the lifecycle of the PodGroup CRD and uses volcano-scheduler to
>>>>>>>>> complete the scheduling.
>>>>>>>>>
>>>>>>>>> [1] https://github.com/kubernetes/enhancements/tree/master/keps/sig-scheduling/583-coscheduling
>>>>>>>>> [2] https://github.com/kubernetes-sigs/scheduler-plugins/tree/master/pkg/coscheduling#podgroup
>>>>>>>>> [3] https://github.com/kubernetes-sigs/kube-batch
>>>>>>>>> [4] https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definitions/
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Yikun
>>>>>>>>>
>>>>>>>>> Weiwei Yang <w...@apache.org> wrote on Wed, Dec 1, 2021 at 5:57 AM:
>>>>>>>>>
>>>>>>>>>> Hi Chenya
>>>>>>>>>>
>>>>>>>>>> Thanks for bringing this up. This is quite interesting; we
>>>>>>>>>> definitely should participate more in the discussions.
>>>>>>>>>> The main thing here is, the Spark community should make Spark
>>>>>>>>>> pluggable in order to support other schedulers, not just Volcano.
>>>>>>>>>> It looks like this proposal is pushing really hard for adopting
>>>>>>>>>> PodGroup, which isn't part of K8s yet; that to me is problematic.
>>>>>>>>>>
>>>>>>>>>> On Tue, Nov 30, 2021 at 9:21 AM Prasad Paravatha <prasad.parava...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> This is a great feature/idea.
>>>>>>>>>>> I'd love to get involved in some form (testing and/or
>>>>>>>>>>> documentation). This could be my first contribution to Spark!
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Nov 30, 2021 at 10:46 PM John Zhuge <jzh...@apache.org> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> +1 Kudos to Yikun and the community for starting the discussion!
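For readers unfamiliar with the CRD being discussed, a minimal PodGroup could look like the following. This is a sketch assuming Volcano's scheduling.volcano.sh/v1beta1 API; the names, namespace, and resource figures are illustrative, not from the thread:

```shell
# Create a PodGroup asking the scheduler to hold off until at least
# minMember pods and minResources capacity can be placed together
# (gang scheduling). All names/values are placeholders.
kubectl apply -f - <<'EOF'
apiVersion: scheduling.volcano.sh/v1beta1
kind: PodGroup
metadata:
  name: spark-job-podgroup
  namespace: spark
spec:
  minMember: 3           # e.g. driver + 2 executors must fit before any start
  minResources:
    cpu: "3"
    memory: "12Gi"
  queue: default
EOF
```

With gang semantics, the six-executors-on-three-nodes scenario earlier in the thread would either start as a unit or stay pending as a unit, instead of leaving stragglers.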
>>>>>>>>>>>> On Tue, Nov 30, 2021 at 8:47 AM Chenya Zhang <chenyazhangche...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks folks for bringing up the topic of natively integrating
>>>>>>>>>>>>> Volcano and other alternative schedulers into Spark!
>>>>>>>>>>>>>
>>>>>>>>>>>>> +Weiwei, Wilfred, Chaoran. We would love to contribute to the
>>>>>>>>>>>>> discussion as well.
>>>>>>>>>>>>>
>>>>>>>>>>>>> From our side, we have been using and improving one alternative
>>>>>>>>>>>>> resource scheduler, Apache YuniKorn (https://yunikorn.apache.org/),
>>>>>>>>>>>>> for Spark on Kubernetes in production at Apple, with solid
>>>>>>>>>>>>> results over the past year. It is capable of supporting gang
>>>>>>>>>>>>> scheduling (similar to PodGroups), multi-tenant resource queues
>>>>>>>>>>>>> (similar to YARN), FIFO, and other handy features like bin
>>>>>>>>>>>>> packing to enable efficient autoscaling, etc.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Natively integrating with Spark would provide more flexibility
>>>>>>>>>>>>> for users and reduce the extra cost and potential inconsistency
>>>>>>>>>>>>> of maintaining different layers of resource strategies. One
>>>>>>>>>>>>> interesting topic we hope to discuss more is dynamic allocation,
>>>>>>>>>>>>> which would benefit from native coordination between Spark and
>>>>>>>>>>>>> resource schedulers in K8s and cloud environments for optimal
>>>>>>>>>>>>> resource efficiency.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Nov 30, 2021 at 8:10 AM Holden Karau <hol...@pigscanfly.ca> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks for putting this together, I'm really excited for us
>>>>>>>>>>>>>> to add better batch scheduling integrations.
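As context for the dynamic allocation point raised above: Spark on Kubernetes has no external shuffle service, so Spark 3.x enables dynamic allocation there via shuffle tracking. A hedged sketch of the relevant confs (the API server address, executor bounds, and application file are placeholders):

```shell
# Dynamic allocation on k8s relies on shuffle tracking, since there is
# no external shuffle service; values below are illustrative only.
spark-submit \
  --master k8s://https://<api-server>:6443 \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.dynamicAllocation.shuffleTracking.enabled=true \
  --conf spark.dynamicAllocation.minExecutors=1 \
  --conf spark.dynamicAllocation.maxExecutors=6 \
  local:///opt/spark/examples/src/main/python/pi.py
```

The coordination question in the thread is exactly about how a scheduler like YuniKorn or Volcano should react as Spark grows and shrinks this executor set.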
>>>>>>>>>>>>>> On Tue, Nov 30, 2021 at 12:46 AM Yikun Jiang <yikunk...@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hey everyone,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I'd like to start a discussion on the "Support
>>>>>>>>>>>>>>> Volcano/Alternative Schedulers" proposal.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This SPIP is proposed to give Spark's k8s schedulers more
>>>>>>>>>>>>>>> YARN-like features (such as queues and minimum resources
>>>>>>>>>>>>>>> before scheduling jobs) that many folks want on Kubernetes.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The goal of this SPIP is to improve the current Spark k8s
>>>>>>>>>>>>>>> scheduler implementations, add the ability to do batch
>>>>>>>>>>>>>>> scheduling, and support Volcano as one of the implementations.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Design doc: https://docs.google.com/document/d/1xgQGRpaHQX6-QH_J9YV2C2Dh6RpXefUpLM7KGkzL6Fg
>>>>>>>>>>>>>>> JIRA: https://issues.apache.org/jira/browse/SPARK-36057
>>>>>>>>>>>>>>> Some of the PRs:
>>>>>>>>>>>>>>> Ability to create resources: https://github.com/apache/spark/pull/34599
>>>>>>>>>>>>>>> Add PodGroupFeatureStep: https://github.com/apache/spark/pull/34456
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>> Yikun
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> Twitter: https://twitter.com/holdenkarau
>>>>>>>>>>>>>> Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
>>>>>>>>>>>>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> John Zhuge
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Regards,
>>>>>>>>>>> Prasad Paravatha