Hi Mich,

I agree that the startup time of Spark on Kubernetes clusters needs to be improved.
Regarding fetching images directly: I have used ImageCache to store the images on each node, eliminating the time required to pull images from a remote repository. This does reduce overall startup time, and the effect becomes more pronounced as the image size increases. Additionally, I have observed that the driver pod spends a significant amount of time between entering the Running phase and attempting to create the executor pods, an estimated 75% of the total. We could also explore optimization options in this area.

On Thu, Aug 24, 2023 at 12:58 AM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> Hi all,
>
> In this conversation, one of the issues I brought up was the driver
> start-up time. This is especially true in k8s. As Spark on k8s is modeled
> on the Spark standalone scheduler, Spark on k8s consists of a single
> driver pod (like the master in standalone mode) and a number of executors
> ("workers"). When executed on k8s, the driver and executors run on
> separate pods
> <https://spark.apache.org/docs/latest/running-on-kubernetes.html>. First
> the driver pod is launched, then the driver pod itself launches the
> executor pods. From my observation, in an autoscaling cluster, the driver
> pod may take up to 40 seconds, followed by the executor pods. This is a
> considerable time for customers and it is painfully slow. Can we move
> away from the dependency on standalone mode and try to speed up k8s
> cluster formation?
>
> Another naive question: when the docker image is pulled from the
> container registry to the driver itself, this takes finite time. The
> docker image for executors could be different from the driver docker
> image. Since spark-submit presents this at submission time, can we save
> time by fetching the docker images straight away?
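To make the image-caching point above concrete, here is a minimal sketch of building spark-submit flags that rely on the node-local image cache rather than a remote pull. The image names and the helper function are illustrative assumptions, not from the thread; the conf keys are Spark's Kubernetes image settings.

```python
# Sketch: spark-submit --conf flags that let kubelet reuse node-cached
# images (pullPolicy=IfNotPresent) instead of pulling from the registry.
# Image names and this helper are hypothetical, for illustration only.

def build_submit_args(driver_image, executor_image, pull_policy="IfNotPresent"):
    """Return a flat list of --conf arguments for the container images."""
    conf = {
        # Separate driver/executor images, as raised in the thread.
        "spark.kubernetes.driver.container.image": driver_image,
        "spark.kubernetes.executor.container.image": executor_image,
        # IfNotPresent makes kubelet use the node-local image cache
        # when the image is already present on the node.
        "spark.kubernetes.container.image.pullPolicy": pull_policy,
    }
    return [arg for k, v in sorted(conf.items()) for arg in ("--conf", f"{k}={v}")]

args = build_submit_args("repo/spark-driver:3.4.1", "repo/spark-exec:3.4.1")
```

Combined with a pre-pull (e.g. a DaemonSet that warms the cache), this avoids paying the registry pull on every driver and executor pod start.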
>
> Thanks
>
> Mich
>
> view my LinkedIn profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
> https://en.everybodywiki.com/Mich_Talebzadeh
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
> On Tue, 8 Aug 2023 at 18:25, Mich Talebzadeh <mich.talebza...@gmail.com>
> wrote:
>
>> Splendid idea. 👍
>>
>> Mich Talebzadeh,
>> Solutions Architect/Engineering Lead
>> London
>> United Kingdom
>>
>> On Tue, 8 Aug 2023 at 18:10, Holden Karau <hol...@pigscanfly.ca> wrote:
>>
>>> The driver itself is probably another topic; perhaps I'll make a
>>> "faster spark start time" JIRA and a DA JIRA and we can explore both.
>>>
>>> On Tue, Aug 8, 2023 at 10:07 AM Mich Talebzadeh <
>>> mich.talebza...@gmail.com> wrote:
>>>
>>>> From my own perspective, faster execution time, especially with Spark
>>>> on tin boxes (Dataproc & EC2) and Spark on k8s, is something that
>>>> customers often bring up.
>>>>
>>>> Poor time to onboard with autoscaling seems to be particularly singled
>>>> out for heavy ETL jobs that use Spark.
I am disappointed to see the poor
>>>> performance of Spark on k8s autopilot, with long timelines for starting
>>>> the driver itself and moving from the Pending to the Running phase
>>>> (Spark 3.4.1 with Java 11).
>>>>
>>>> HTH
>>>>
>>>> Mich Talebzadeh,
>>>> Solutions Architect/Engineering Lead
>>>> London
>>>> United Kingdom
>>>>
>>>> On Tue, 8 Aug 2023 at 15:49, kalyan <justfors...@gmail.com> wrote:
>>>>
>>>>> +1 to enhancements in DEA. Long overdue!
>>>>>
>>>>> There are a few things that I have been thinking about along the same
>>>>> lines for some time now (a few overlap with @holden's points):
>>>>> 1. How to reduce wastage on the RM side? Sometimes the driver asks for
>>>>> some units of resources, but when the RM provisions them, the driver
>>>>> cancels the request.
>>>>> 2. How to make the resource available when it is needed.
>>>>> 3. Cost vs. app runtime: a good DEA algorithm should allow the
>>>>> developer to choose between cost and runtime. Sometimes developers
>>>>> might be OK paying higher costs for faster execution.
>>>>> 4. Stitch resource profile choices into query execution.
>>>>> 5. Allow a different DEA algorithm to be chosen for different queries
>>>>> within the same Spark application.
>>>>> 6. Fall back to the default algorithm when things go haywire!
>>>>>
>>>>> Model-based learning would be awesome.
>>>>> These can be fine-tuned with tools like Sparklens.
>>>>>
>>>>> I am aware of a few experiments carried out in this area by my friends
>>>>> in this domain.
One lesson we learned was that it is hard to have a generic
>>>>> algorithm that works for all cases.
>>>>>
>>>>> Regards
>>>>> kalyan.
>>>>>
>>>>> On Tue, Aug 8, 2023 at 6:12 PM Mich Talebzadeh <
>>>>> mich.talebza...@gmail.com> wrote:
>>>>>
>>>>>> Thanks for pointing out this feature to me. I will have a look when
>>>>>> I get there.
>>>>>>
>>>>>> Mich Talebzadeh,
>>>>>> Solutions Architect/Engineering Lead
>>>>>> London
>>>>>> United Kingdom
>>>>>>
>>>>>> On Tue, 8 Aug 2023 at 11:44, roryqi(齐赫) <ror...@tencent.com> wrote:
>>>>>>
>>>>>>> Spark 3.5 has added a method `supportsReliableStorage` to
>>>>>>> `ShuffleDriverComponents`, which indicates whether shuffle data is
>>>>>>> written to a distributed filesystem or persisted in a remote
>>>>>>> shuffle service.
>>>>>>>
>>>>>>> Uniffle is a general-purpose remote shuffle service
>>>>>>> (https://github.com/apache/incubator-uniffle). It can enhance the
>>>>>>> experience of Spark on K8S. After Spark 3.5 is released, Uniffle
>>>>>>> will support `ShuffleDriverComponents`; see [1].
>>>>>>>
>>>>>>> If you are interested in more details about Uniffle, see [2].
>>>>>>>
>>>>>>> [1] https://github.com/apache/incubator-uniffle/issues/802.
>>>>>>>
>>>>>>> [2]
>>>>>>> https://uniffle.apache.org/blog/2023/07/21/Uniffle%20-%20New%20chapter%20for%20the%20shuffle%20in%20the%20cloud%20native%20era
>>>>>>>
>>>>>>> *From:* Mich Talebzadeh <mich.talebza...@gmail.com>
>>>>>>> *Date:* Tuesday, 8 August 2023, 06:53
>>>>>>> *Cc:* dev <dev@spark.apache.org>
>>>>>>> *Subject:* [Internet]Re: Improving Dynamic Allocation Logic for
>>>>>>> Spark 4+
>>>>>>>
>>>>>>> On the subject of dynamic allocation, is the following message a
>>>>>>> cause for concern when running Spark on k8s?
>>>>>>>
>>>>>>> INFO ExecutorAllocationManager: Dynamic allocation is enabled
>>>>>>> without a shuffle service.
>>>>>>>
>>>>>>> Mich Talebzadeh,
>>>>>>> Solutions Architect/Engineering Lead
>>>>>>> London
>>>>>>> United Kingdom
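As background to the warning quoted above: on k8s there is no external shuffle service, so dynamic allocation is typically paired with Spark's executor-side shuffle tracking. A minimal sketch of the relevant settings, with illustrative values (the executor counts and timeout are assumptions, not from the thread):

```python
# Sketch: dynamic-allocation conf commonly used on k8s, where no external
# shuffle service exists. Values here are illustrative placeholders.
k8s_dynamic_allocation_conf = {
    "spark.dynamicAllocation.enabled": "true",
    # Without an external shuffle service, Spark tracks shuffle files on
    # executors itself and avoids removing executors that still hold them.
    "spark.dynamicAllocation.shuffleTracking.enabled": "true",
    # Bound how long an idle executor holding shuffle data is retained.
    "spark.dynamicAllocation.shuffleTracking.timeout": "30min",
    "spark.dynamicAllocation.minExecutors": "1",
    "spark.dynamicAllocation.maxExecutors": "10",
}

def to_submit_flags(conf):
    """Render the conf dict as spark-submit --conf flags."""
    return " ".join(f"--conf {k}={v}" for k, v in sorted(conf.items()))
```

With shuffle tracking enabled, the INFO message is expected rather than a problem; a reliable-storage shuffle backend such as Uniffle (mentioned above) is the other way to decouple shuffle data from executor lifetime.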
>>>>>>>
>>>>>>> On Mon, 7 Aug 2023 at 23:42, Mich Talebzadeh <
>>>>>>> mich.talebza...@gmail.com> wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> From what I have seen, Spark on a serverless cluster has a hard time
>>>>>>> getting the driver going in a timely manner:
>>>>>>>
>>>>>>> Annotations: autopilot.gke.io/resource-adjustment:
>>>>>>> {"input":{"containers":[{"limits":{"memory":"1433Mi"},"requests":{"cpu":"1","memory":"1433Mi"},"name":"spark-kubernetes-driver"}]},"output...
>>>>>>> autopilot.gke.io/warden-version: 2.7.41
>>>>>>>
>>>>>>> This is on Spark 3.4.1 with Java 11, on both the host running
>>>>>>> spark-submit and in the docker image itself.
>>>>>>>
>>>>>>> I am not sure how relevant this is to this discussion, but it looks
>>>>>>> like a kind of blocker for now. What config params can help here and
>>>>>>> what can be done?
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>> Mich Talebzadeh,
>>>>>>> Solutions Architect/Engineering Lead
>>>>>>> London
>>>>>>> United Kingdom
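One observation on the 1433Mi request in the annotation above: it is consistent with 1g of driver memory plus a 40% memory-overhead factor (the factor Spark applies to non-JVM workloads). A rough sketch of the pod-memory derivation; the overhead formula is my reading of Spark's behavior and should be treated as an assumption:

```python
# Sketch (assumed formula): the k8s pod memory request Spark computes is
# roughly heap + max(overhead_factor * heap, a 384MiB floor).

def pod_memory_request_mib(driver_memory_mib, overhead_factor=0.4,
                           min_overhead_mib=384):
    """Approximate pod memory request in MiB for a given driver heap."""
    overhead = max(int(driver_memory_mib * overhead_factor), min_overhead_mib)
    return driver_memory_mib + overhead

# With 1024MiB driver memory and a 0.4 factor: 1024 + 409 = 1433MiB,
# matching the autopilot resource-adjustment annotation above.
# With the 0.1 JVM default the result would be 1024 + 384 = 1408MiB.
```

If that reading is right, tuning `spark.driver.memory` and the memory overhead settings (and the driver CPU request) is the first place to look when autopilot adjusts the pod.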
>>>>>>>
>>>>>>> On Mon, 7 Aug 2023 at 22:39, Holden Karau <hol...@pigscanfly.ca>
>>>>>>> wrote:
>>>>>>>
>>>>>>> Oh great point
>>>>>>>
>>>>>>> On Mon, Aug 7, 2023 at 2:23 PM bo yang <bobyan...@gmail.com> wrote:
>>>>>>>
>>>>>>> Thanks Holden for bringing this up!
>>>>>>>
>>>>>>> Maybe another thing to think about is how to make dynamic allocation
>>>>>>> more friendly with Kubernetes and disaggregated shuffle storage?
>>>>>>>
>>>>>>> On Mon, Aug 7, 2023 at 1:27 PM Holden Karau <hol...@pigscanfly.ca>
>>>>>>> wrote:
>>>>>>>
>>>>>>> So I am wondering if there is interest in revisiting some of how
>>>>>>> Spark does its dynamic allocation for Spark 4+.
>>>>>>>
>>>>>>> Some things that I've been thinking about:
>>>>>>>
>>>>>>> - Advisory user input (e.g. a way to say "after X is done I know I
>>>>>>> need Y", where Y might be a bunch of GPU machines)
>>>>>>> - Configurable tolerance (e.g. if we are at most Z% over target,
>>>>>>> no-op)
>>>>>>> - Past runs of the same job (e.g. stage X of job Y had a peak of K)
>>>>>>> - Faster executor launches (I'm a little fuzzy on what we can do
>>>>>>> here, but one area, for example, is that we set up and tear down an
>>>>>>> RPC connection to the driver with a blocking call, which does seem
>>>>>>> to have some locking inside the driver at first glance)
>>>>>>>
>>>>>>> Is this an area other folks are thinking about? Should I make an
>>>>>>> epic we can track ideas in? Or are folks generally happy with
>>>>>>> today's dynamic allocation (or just busy with other things)?
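The "configurable tolerance" idea in the list above can be sketched as a tiny decision function: if the current executor count overshoots the target by at most Z%, do nothing rather than tearing executors down. This is a hypothetical helper for illustration, not Spark's actual allocation code:

```python
# Illustrative sketch of tolerance-based scaling: small overshoots are a
# no-op, saving pod churn. Hypothetical helper, not Spark internals.

def adjustment(current, target, tolerance=0.1):
    """Return the executor delta to request; 0 means no-op."""
    if current < target:
        return target - current          # under target: scale up
    if current <= target * (1 + tolerance):
        return 0                         # within tolerance: no-op
    return target - current              # too far over: scale down

# adjustment(11, 10) -> 0   (10% over, within tolerance)
# adjustment(15, 10) -> -5  (50% over, scale down)
```

The appeal on k8s in particular is that avoiding scale-down/scale-up oscillation also avoids repeatedly paying the slow pod-startup cost discussed earlier in the thread.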
>>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> >>>>>>> Twitter: https://twitter.com/holdenkarau >>>>>>> >>>>>>> Books (Learning Spark, High Performance Spark, etc.): >>>>>>> https://amzn.to/2MaRAG9 <https://amzn.to/2MaRAG9> >>>>>>> >>>>>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau >>>>>>> >>>>>>> -- >>>>>>> >>>>>>> Twitter: https://twitter.com/holdenkarau >>>>>>> >>>>>>> Books (Learning Spark, High Performance Spark, etc.): >>>>>>> https://amzn.to/2MaRAG9 <https://amzn.to/2MaRAG9> >>>>>>> >>>>>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau >>>>>>> >>>>>>> -- >>> Twitter: https://twitter.com/holdenkarau >>> Books (Learning Spark, High Performance Spark, etc.): >>> https://amzn.to/2MaRAG9 <https://amzn.to/2MaRAG9> >>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau >>> >> -- Regards, Qian Sun