One option could be to launch the driver and the initial executors together (using lazy executor ID allocation), but that would introduce a lot of complexity.
On Wed, Aug 23, 2023 at 6:44 PM Qian Sun <qian.sun2...@gmail.com> wrote:
> Hi Mich
>
> I agree with your opinion that the startup time of the Spark on Kubernetes
> cluster needs to be improved.
>
> Regarding fetching the image directly, I have utilized ImageCache to store
> the images on the node, eliminating the time required to pull images from a
> remote repository. This does indeed reduce the overall time, and the effect
> becomes more pronounced as the size of the image increases.
>
> Additionally, I have observed that the driver pod takes a significant
> amount of time from running to attempting to create executor pods, with an
> estimated time expenditure of around 75%. We can also explore optimization
> options in this area.
>
> On Thu, Aug 24, 2023 at 12:58 AM Mich Talebzadeh <
> mich.talebza...@gmail.com> wrote:
>
>> Hi all,
>>
>> In this conversation, one of the issues I brought up was the driver
>> start-up time. This is especially true in k8s. As Spark on k8s is modeled
>> on the Spark standalone scheduler, Spark on k8s consists of a single driver
>> pod (the “master” in standalone terms) and a number of executors
>> (“workers”). When executed on k8s, the driver and executors run on separate
>> pods <https://spark.apache.org/docs/latest/running-on-kubernetes.html>.
>> First the driver pod is launched, then the driver pod itself launches the
>> executor pods. From my observation, in an auto-scaling cluster, the driver
>> pod may take up to 40 seconds, followed by the executor pods. This is a
>> considerable time for customers and it is painfully slow. Can we actually
>> move away from the dependency on standalone mode and try to speed up k8s
>> cluster formation?
>>
>> Another naive question: when the docker image is pulled from the
>> container registry to the driver itself, this takes finite time. The docker
>> image for executors could be different from that of the driver docker image.
>> Since spark-submit presents this at the time of submission,
>> can we save time by fetching the docker images straight away?
>>
>> Thanks
>>
>> Mich
>>
>> view my Linkedin profile
>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>
>> https://en.everybodywiki.com/Mich_Talebzadeh
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary damages
>> arising from such loss, damage or destruction.
>>
>> On Tue, 8 Aug 2023 at 18:25, Mich Talebzadeh <mich.talebza...@gmail.com>
>> wrote:
>>
>>> Splendid idea. 👍
>>>
>>> Mich Talebzadeh,
>>> Solutions Architect/Engineering Lead
>>> London
>>> United Kingdom
>>>
>>> On Tue, 8 Aug 2023 at 18:10, Holden Karau <hol...@pigscanfly.ca> wrote:
>>>
>>>> The driver itself is probably another topic; perhaps I’ll make a
>>>> “faster Spark start time” JIRA and a dynamic allocation JIRA and we can
>>>> explore both.
>>>>
>>>> On Tue, Aug 8, 2023 at 10:07 AM Mich Talebzadeh <
>>>> mich.talebza...@gmail.com> wrote:
>>>>
>>>>> From my own perspective, faster execution time, especially with Spark on
>>>>> tin boxes (Dataproc & EC2) and Spark on k8s, is something that customers
>>>>> often bring up.
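On the image-fetch point: spark-submit already carries both images at submission time via the per-role image settings, so the closest workaround today is node-level image caching plus an `IfNotPresent` pull policy. A sketch, with hypothetical image names and registry (the config keys themselves are standard Spark-on-k8s settings):

```shell
# Hypothetical images; per-role image settings let the driver and executor
# images differ, and IfNotPresent skips the registry when a node has the image cached.
spark-submit \
  --master k8s://https://<k8s-apiserver>:443 \
  --deploy-mode cluster \
  --name startup-test \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.kubernetes.container.image=myrepo/spark:3.4.1 \
  --conf spark.kubernetes.driver.container.image=myrepo/spark-driver:3.4.1 \
  --conf spark.kubernetes.executor.container.image=myrepo/spark-executor:3.4.1 \
  --conf spark.kubernetes.container.image.pullPolicy=IfNotPresent \
  local:///opt/spark/examples/jars/spark-examples_2.12-3.4.1.jar
```

Combined with pre-pulling (as with the ImageCache approach Qian describes), this takes the registry fetch off the pod-startup critical path.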
>>>>> Poor time to onboard with autoscaling seems to be particularly singled
>>>>> out for heavy ETL jobs that use Spark. I am disappointed to see the poor
>>>>> performance of Spark on k8s autopilot, with timelines starting with the
>>>>> driver itself moving from the Pending to the Running phase (Spark 3.4.1
>>>>> with Java 11).
>>>>>
>>>>> HTH
>>>>>
>>>>> Mich Talebzadeh,
>>>>> Solutions Architect/Engineering Lead
>>>>> London
>>>>> United Kingdom
>>>>>
>>>>> On Tue, 8 Aug 2023 at 15:49, kalyan <justfors...@gmail.com> wrote:
>>>>>
>>>>>> +1 to enhancements in DEA (dynamic executor allocation). Long overdue!
>>>>>>
>>>>>> There were a few things that I have been thinking along the same lines
>>>>>> for some time now (a few overlap with @holden's points):
>>>>>> 1. How to reduce wastage on the RM side? Sometimes the driver asks
>>>>>> for some units of resources, but when the RM provisions them, the driver
>>>>>> cancels the request.
>>>>>> 2. How to make the resource available when it is needed.
>>>>>> 3. Cost vs app runtime: a good DEA algorithm should allow the developer
>>>>>> to choose between cost and runtime. Sometimes developers might be OK with
>>>>>> paying higher costs for faster execution.
>>>>>> 4. Stitch resource profile choices into query execution.
>>>>>> 5. Allow different DEA algorithms to be chosen for different queries
>>>>>> within the same Spark application.
>>>>>> 6. Fall back to the default algorithm when things go haywire!
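Some of the items in the list above map, at least partially, onto existing dynamic-allocation knobs (cost vs. runtime via min/max bounds and idle timeouts); per-query resource profiles and per-query algorithms have no configuration equivalent today. A baseline sketch with illustrative values:

```shell
# Illustrative values only; all keys are standard spark.dynamicAllocation.* configs.
spark-submit \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.dynamicAllocation.minExecutors=2 \
  --conf spark.dynamicAllocation.initialExecutors=4 \
  --conf spark.dynamicAllocation.maxExecutors=40 \
  --conf spark.dynamicAllocation.schedulerBacklogTimeout=1s \
  --conf spark.dynamicAllocation.executorIdleTimeout=60s \
  ...
```

Lower `maxExecutors` and a longer `executorIdleTimeout` bias toward cost; the reverse biases toward runtime.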
>>>>>>
>>>>>> Model-based learning would be awesome. These can be fine-tuned with
>>>>>> some tools like Sparklens.
>>>>>>
>>>>>> I am aware of a few experiments carried out in this area by my friends
>>>>>> in this domain. One lesson we learned was that it is hard to have a
>>>>>> generic algorithm that works for all cases.
>>>>>>
>>>>>> Regards
>>>>>> kalyan.
>>>>>>
>>>>>> On Tue, Aug 8, 2023 at 6:12 PM Mich Talebzadeh <
>>>>>> mich.talebza...@gmail.com> wrote:
>>>>>>
>>>>>>> Thanks for pointing out this feature to me. I will have a look when
>>>>>>> I get there.
>>>>>>>
>>>>>>> Mich Talebzadeh,
>>>>>>> Solutions Architect/Engineering Lead
>>>>>>> London
>>>>>>> United Kingdom
>>>>>>>
>>>>>>> On Tue, 8 Aug 2023 at 11:44, roryqi(齐赫) <ror...@tencent.com> wrote:
>>>>>>>
>>>>>>>> Spark 3.5 has added a method `supportsReliableStorage` in the
>>>>>>>> `ShuffleDriverComponents` which indicates whether shuffle data is
>>>>>>>> written to a distributed filesystem or persisted in a remote shuffle
>>>>>>>> service.
>>>>>>>>
>>>>>>>> Uniffle is a general-purpose remote shuffle service (
>>>>>>>> https://github.com/apache/incubator-uniffle). It can enhance the
>>>>>>>> experience of Spark on K8S. After Spark 3.5 is released, Uniffle will
>>>>>>>> support the `ShuffleDriverComponents`; see [1].
>>>>>>>>
>>>>>>>> If you are interested in more details of Uniffle, see [2].
>>>>>>>>
>>>>>>>> [1] https://github.com/apache/incubator-uniffle/issues/802
>>>>>>>>
>>>>>>>> [2]
>>>>>>>> https://uniffle.apache.org/blog/2023/07/21/Uniffle%20-%20New%20chapter%20for%20the%20shuffle%20in%20the%20cloud%20native%20era
>>>>>>>>
>>>>>>>> *From:* Mich Talebzadeh <mich.talebza...@gmail.com>
>>>>>>>> *Date:* Tuesday, 8 August 2023, 06:53
>>>>>>>> *Cc:* dev <dev@spark.apache.org>
>>>>>>>> *Subject:* [Internet]Re: Improving Dynamic Allocation Logic for Spark
>>>>>>>> 4+
>>>>>>>>
>>>>>>>> On the subject of dynamic allocation, is the following message a
>>>>>>>> cause for concern when running Spark on k8s?
>>>>>>>>
>>>>>>>> INFO ExecutorAllocationManager: Dynamic allocation is enabled
>>>>>>>> without a shuffle service.
>>>>>>>>
>>>>>>>> Mich Talebzadeh,
>>>>>>>> Solutions Architect/Engineering Lead
>>>>>>>> London
>>>>>>>> United Kingdom
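That INFO line is expected on k8s, where there is no external shuffle service; dynamic allocation then relies on shuffle tracking (available since Spark 3.0), which keeps executors holding shuffle data alive until the data is no longer needed. A minimal sketch of the relevant settings:

```shell
# Without an external shuffle service, enable shuffle tracking so dynamic
# allocation can decommission executors safely; the timeout bounds how long
# executors with shuffle data are retained (illustrative value).
--conf spark.dynamicAllocation.enabled=true \
--conf spark.dynamicAllocation.shuffleTracking.enabled=true \
--conf spark.dynamicAllocation.shuffleTracking.timeout=30min
```

With shuffle tracking (or a reliable remote shuffle service such as Uniffle, per the message above), the log line is informational rather than a problem.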
>>>>>>>>
>>>>>>>> On Mon, 7 Aug 2023 at 23:42, Mich Talebzadeh <
>>>>>>>> mich.talebza...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> From what I have seen, Spark on a serverless cluster has a hard time
>>>>>>>> getting the driver going in a timely manner:
>>>>>>>>
>>>>>>>> Annotations: autopilot.gke.io/resource-adjustment:
>>>>>>>> {"input":{"containers":[{"limits":{"memory":"1433Mi"},"requests":{"cpu":"1","memory":"1433Mi"},"name":"spark-kubernetes-driver"}]},"output...
>>>>>>>> autopilot.gke.io/warden-version: 2.7.41
>>>>>>>>
>>>>>>>> This is on Spark 3.4.1 with Java 11, both on the host running
>>>>>>>> spark-submit and in the docker image itself.
>>>>>>>>
>>>>>>>> I am not sure how relevant this is to this discussion, but it looks
>>>>>>>> like a kind of blocker for now. What config params can help here, and
>>>>>>>> what can be done?
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>> Mich Talebzadeh,
>>>>>>>> Solutions Architect/Engineering Lead
>>>>>>>> London
>>>>>>>> United Kingdom
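The `autopilot.gke.io/resource-adjustment` annotation above shows Autopilot mutating the driver pod's requests; one plausible mitigation is to set explicit driver requests and limits that already satisfy the platform's allowed shapes, so no adjustment (and the associated rescheduling delay) is needed. Sizing values here are illustrative; the keys are standard Spark-on-k8s driver configs:

```shell
# Illustrative sizing; pick values that match the target node/compute class
# so the admission webhook does not need to rewrite them.
--conf spark.driver.memory=4g \
--conf spark.driver.memoryOverhead=1g \
--conf spark.kubernetes.driver.request.cores=2 \
--conf spark.kubernetes.driver.limit.cores=2
```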
>>>>>>>>
>>>>>>>> On Mon, 7 Aug 2023 at 22:39, Holden Karau <hol...@pigscanfly.ca>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Oh great point
>>>>>>>>
>>>>>>>> On Mon, Aug 7, 2023 at 2:23 PM bo yang <bobyan...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> Thanks Holden for bringing this up!
>>>>>>>>
>>>>>>>> Maybe another thing to think about is how to make dynamic
>>>>>>>> allocation more friendly with Kubernetes and disaggregated shuffle
>>>>>>>> storage?
>>>>>>>>
>>>>>>>> On Mon, Aug 7, 2023 at 1:27 PM Holden Karau <hol...@pigscanfly.ca>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> So I'm wondering if there is interest in revisiting some of how
>>>>>>>> Spark is doing its dynamic allocation for Spark 4+?
>>>>>>>>
>>>>>>>> Some things that I've been thinking about:
>>>>>>>>
>>>>>>>> - Advisory user input (e.g. a way to say "after X is done I know I
>>>>>>>> need Y", where Y might be a bunch of GPU machines)
>>>>>>>> - Configurable tolerance (e.g. if we have at most Z% over target,
>>>>>>>> no-op)
>>>>>>>> - Past runs of the same job (e.g. stage X of job Y had a peak of K)
>>>>>>>> - Faster executor launches (I'm a little fuzzy on what we can do
>>>>>>>> here but, one area for example, is that we set up and tear down an RPC
>>>>>>>> connection to the driver with a blocking call, which does seem to have
>>>>>>>> some locking inside of the driver at first glance)
>>>>>>>>
>>>>>>>> Is this an area other folks are thinking about? Should I make an
>>>>>>>> epic we can track ideas in? Or are folks generally happy with today's
>>>>>>>> dynamic allocation (or just busy with other things)?
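Of the ideas listed above, a rough form of "configurable tolerance" arguably exists today through the allocation ratio and backlog timeouts, while advisory user input and history-based sizing have no equivalent yet. Illustrative values:

```shell
# Request only 80% of the executors implied by the task backlog, and only
# ramp up further when the backlog is sustained (values are illustrative).
--conf spark.dynamicAllocation.executorAllocationRatio=0.8 \
--conf spark.dynamicAllocation.schedulerBacklogTimeout=1s \
--conf spark.dynamicAllocation.sustainedSchedulerBacklogTimeout=5s
```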
>>>>>>>>
>>>>>>>> --
>>>>>>>> Twitter: https://twitter.com/holdenkarau
>>>>>>>> Books (Learning Spark, High Performance Spark, etc.):
>>>>>>>> https://amzn.to/2MaRAG9
>>>>>>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>
> --
> Regards,
> Qian Sun

-- 
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
YouTube Live Streams: https://www.youtube.com/user/holdenkarau