One option could be to launch the driver and the initial executors together (using lazy executor ID allocation), but that would introduce a lot of complexity.
On Wed, Aug 23, 2023 at 6:44 PM Qian Sun <qian.sun2...@gmail.com> wrote:
> Hi Mich
>
> I agree with your opinion that the startup time of the Spark on Kubernetes
> cluster needs to be improved.
>
> Regarding fetching the image directly, I have utilized ImageCache to store
> the images on the node, eliminating the time required to pull images from a
> remote repository. This does indeed reduce the overall time, and the effect
> becomes more pronounced as the size of the image increases.
>
> Additionally, I have observed that the driver pod takes a significant
> amount of time from running to attempting to create executor pods, with an
> estimated time expenditure of around 75%. We can also explore optimization
> options in this area.
>
> On Thu, Aug 24, 2023 at 12:58 AM Mich Talebzadeh <
> mich.talebza...@gmail.com> wrote:
>
>> Hi all,
>>
>> In this conversation, one of the issues I brought up was the driver
>> start-up time. This is especially true in k8s. As Spark on k8s is modeled
>> on the Spark standalone scheduler, Spark on k8s consists of a single driver
>> pod (the “master” in standalone terms) and a number of executors
>> (“workers”). When executed on k8s, the driver and executors run on separate
>> pods <https://spark.apache.org/docs/latest/running-on-kubernetes.html>.
>> First the driver pod is launched, then the driver pod itself launches the
>> executor pods. From my observation, in an auto-scaling cluster, the driver
>> pod may take up to 40 seconds, followed by the executor pods. This is a
>> considerable time for customers and it is painfully slow. Can we actually
>> move away from the dependency on standalone mode and try to speed up k8s
>> cluster formation?
>>
>> Another naive question: when the docker image is pulled from the
>> container registry to the driver itself, this takes finite time. The docker
>> image for executors could be different from that of the driver docker image.
>> Since spark-submit presents this at the time of submission,
>> can we save time by fetching the docker images straight away?
>>
>> Thanks
>>
>> Mich
>>
>> view my Linkedin profile
>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>
>> https://en.everybodywiki.com/Mich_Talebzadeh
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary damages
>> arising from such loss, damage or destruction.
>>
>> On Tue, 8 Aug 2023 at 18:25, Mich Talebzadeh <mich.talebza...@gmail.com>
>> wrote:
>>
>>> Splendid idea. 👍
>>>
>>> Mich Talebzadeh,
>>> Solutions Architect/Engineering Lead
>>> London
>>> United Kingdom
>>>
>>> On Tue, 8 Aug 2023 at 18:10, Holden Karau <hol...@pigscanfly.ca> wrote:
>>>
>>>> The driver itself is probably another topic; perhaps I’ll make a
>>>> “faster Spark start time” JIRA and a dynamic allocation JIRA and we can
>>>> explore both.
>>>>
>>>> On Tue, Aug 8, 2023 at 10:07 AM Mich Talebzadeh <
>>>> mich.talebza...@gmail.com> wrote:
>>>>
>>>>> From my own perspective, faster execution time, especially with Spark on
>>>>> tin boxes (Dataproc & EC2) and Spark on k8s, is something that customers
>>>>> often bring up.
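On the image-fetch point: spark-submit already carries both images at submission time via the per-role image settings, so the closest workaround today is node-level image caching plus an `IfNotPresent` pull policy. A sketch, with hypothetical image names and registry (the config keys themselves are standard Spark-on-k8s settings):

```shell
# Hypothetical images; per-role image settings let the driver and executor
# images differ, and IfNotPresent skips the registry when a node has the image cached.
spark-submit \
  --master k8s://https://<k8s-apiserver>:443 \
  --deploy-mode cluster \
  --name startup-test \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.kubernetes.container.image=myrepo/spark:3.4.1 \
  --conf spark.kubernetes.driver.container.image=myrepo/spark-driver:3.4.1 \
  --conf spark.kubernetes.executor.container.image=myrepo/spark-executor:3.4.1 \
  --conf spark.kubernetes.container.image.pullPolicy=IfNotPresent \
  local:///opt/spark/examples/jars/spark-examples_2.12-3.4.1.jar
```

Combined with pre-pulling (as with the ImageCache approach Qian describes), this takes the registry fetch off the pod-startup critical path.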
>>>>> Poor time to onboard with autoscaling seems to be particularly singled
>>>>> out for heavy ETL jobs that use Spark. I am disappointed to see the poor
>>>>> performance of Spark on k8s autopilot, with timelines starting with the
>>>>> driver itself moving from the Pending to the Running phase (Spark 3.4.1
>>>>> with Java 11).
>>>>>
>>>>> HTH
>>>>>
>>>>> Mich Talebzadeh,
>>>>> Solutions Architect/Engineering Lead
>>>>> London
>>>>> United Kingdom
>>>>>
>>>>> On Tue, 8 Aug 2023 at 15:49, kalyan <justfors...@gmail.com> wrote:
>>>>>
>>>>>> +1 to enhancements in DEA (dynamic executor allocation). Long overdue!
>>>>>>
>>>>>> There were a few things that I have been thinking along the same lines
>>>>>> for some time now (a few overlap with @holden's points):
>>>>>> 1. How to reduce wastage on the RM side? Sometimes the driver asks
>>>>>> for some units of resources, but when the RM provisions them, the driver
>>>>>> cancels the request.
>>>>>> 2. How to make the resource available when it is needed.
>>>>>> 3. Cost vs app runtime: a good DEA algorithm should allow the developer
>>>>>> to choose between cost and runtime. Sometimes developers might be OK with
>>>>>> paying higher costs for faster execution.
>>>>>> 4. Stitch resource profile choices into query execution.
>>>>>> 5. Allow different DEA algorithms to be chosen for different queries
>>>>>> within the same Spark application.
>>>>>> 6. Fall back to the default algorithm when things go haywire!
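Some of the items in the list above map, at least partially, onto existing dynamic-allocation knobs (cost vs. runtime via min/max bounds and idle timeouts); per-query resource profiles and per-query algorithms have no configuration equivalent today. A baseline sketch with illustrative values:

```shell
# Illustrative values only; all keys are standard spark.dynamicAllocation.* configs.
spark-submit \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.dynamicAllocation.minExecutors=2 \
  --conf spark.dynamicAllocation.initialExecutors=4 \
  --conf spark.dynamicAllocation.maxExecutors=40 \
  --conf spark.dynamicAllocation.schedulerBacklogTimeout=1s \
  --conf spark.dynamicAllocation.executorIdleTimeout=60s \
  ...
```

Lower `maxExecutors` and a longer `executorIdleTimeout` bias toward cost; the reverse biases toward runtime.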
>>>>>>
>>>>>> Model-based learning would be awesome. These can be fine-tuned with
>>>>>> some tools like Sparklens.
>>>>>>
>>>>>> I am aware of a few experiments carried out in this area by my friends
>>>>>> in this domain. One lesson we learned was that it is hard to have a
>>>>>> generic algorithm that works for all cases.
>>>>>>
>>>>>> Regards
>>>>>> kalyan.
>>>>>>
>>>>>> On Tue, Aug 8, 2023 at 6:12 PM Mich Talebzadeh <
>>>>>> mich.talebza...@gmail.com> wrote:
>>>>>>
>>>>>>> Thanks for pointing out this feature to me. I will have a look when
>>>>>>> I get there.
>>>>>>>
>>>>>>> Mich Talebzadeh,
>>>>>>> Solutions Architect/Engineering Lead
>>>>>>> London
>>>>>>> United Kingdom
>>>>>>>
>>>>>>> On Tue, 8 Aug 2023 at 11:44, roryqi(齐赫) <ror...@tencent.com> wrote:
>>>>>>>
>>>>>>>> Spark 3.5 has added a method `supportsReliableStorage` in the
>>>>>>>> `ShuffleDriverComponents` which indicates whether shuffle data is
>>>>>>>> written to a distributed filesystem or persisted in a remote shuffle
>>>>>>>> service.
>>>>>>>>
>>>>>>>> Uniffle is a general-purpose remote shuffle service (
>>>>>>>> https://github.com/apache/incubator-uniffle). It can enhance the
>>>>>>>> experience of Spark on K8S. After Spark 3.5 is released, Uniffle will
>>>>>>>> support the `ShuffleDriverComponents`; see [1].
>>>>>>>>
>>>>>>>> If you are interested in more details of Uniffle, see [2].
>>>>>>>>
>>>>>>>> [1] https://github.com/apache/incubator-uniffle/issues/802
>>>>>>>>
>>>>>>>> [2]
>>>>>>>> https://uniffle.apache.org/blog/2023/07/21/Uniffle%20-%20New%20chapter%20for%20the%20shuffle%20in%20the%20cloud%20native%20era
>>>>>>>>
>>>>>>>> *From:* Mich Talebzadeh <mich.talebza...@gmail.com>
>>>>>>>> *Date:* Tuesday, 8 August 2023, 06:53
>>>>>>>> *Cc:* dev <dev@spark.apache.org>
>>>>>>>> *Subject:* [Internet]Re: Improving Dynamic Allocation Logic for Spark
>>>>>>>> 4+
>>>>>>>>
>>>>>>>> On the subject of dynamic allocation, is the following message a
>>>>>>>> cause for concern when running Spark on k8s?
>>>>>>>>
>>>>>>>> INFO ExecutorAllocationManager: Dynamic allocation is enabled
>>>>>>>> without a shuffle service.
>>>>>>>>
>>>>>>>> Mich Talebzadeh,
>>>>>>>> Solutions Architect/Engineering Lead
>>>>>>>> London
>>>>>>>> United Kingdom
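That INFO line is expected on k8s, where there is no external shuffle service; dynamic allocation then relies on shuffle tracking (available since Spark 3.0), which keeps executors holding shuffle data alive until the data is no longer needed. A minimal sketch of the relevant settings:

```shell
# Without an external shuffle service, enable shuffle tracking so dynamic
# allocation can decommission executors safely; the timeout bounds how long
# executors with shuffle data are retained (illustrative value).
--conf spark.dynamicAllocation.enabled=true \
--conf spark.dynamicAllocation.shuffleTracking.enabled=true \
--conf spark.dynamicAllocation.shuffleTracking.timeout=30min
```

With shuffle tracking (or a reliable remote shuffle service such as Uniffle, per the message above), the log line is informational rather than a problem.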
>>>>>>>>
>>>>>>>> On Mon, 7 Aug 2023 at 23:42, Mich Talebzadeh <
>>>>>>>> mich.talebza...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> From what I have seen, Spark on a serverless cluster has a hard time
>>>>>>>> getting the driver going in a timely manner:
>>>>>>>>
>>>>>>>> Annotations: autopilot.gke.io/resource-adjustment:
>>>>>>>> {"input":{"containers":[{"limits":{"memory":"1433Mi"},"requests":{"cpu":"1","memory":"1433Mi"},"name":"spark-kubernetes-driver"}]},"output...
>>>>>>>> autopilot.gke.io/warden-version: 2.7.41
>>>>>>>>
>>>>>>>> This is on Spark 3.4.1 with Java 11, both on the host running
>>>>>>>> spark-submit and in the docker image itself.
>>>>>>>>
>>>>>>>> I am not sure how relevant this is to this discussion, but it looks
>>>>>>>> like a kind of blocker for now. What config params can help here, and
>>>>>>>> what can be done?
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>> Mich Talebzadeh,
>>>>>>>> Solutions Architect/Engineering Lead
>>>>>>>> London
>>>>>>>> United Kingdom
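The `autopilot.gke.io/resource-adjustment` annotation above shows Autopilot mutating the driver pod's requests; one plausible mitigation is to set explicit driver requests and limits that already satisfy the platform's allowed shapes, so no adjustment (and the associated rescheduling delay) is needed. Sizing values here are illustrative; the keys are standard Spark-on-k8s driver configs:

```shell
# Illustrative sizing; pick values that match the target node/compute class
# so the admission webhook does not need to rewrite them.
--conf spark.driver.memory=4g \
--conf spark.driver.memoryOverhead=1g \
--conf spark.kubernetes.driver.request.cores=2 \
--conf spark.kubernetes.driver.limit.cores=2
```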
>>>>>>>>
>>>>>>>> On Mon, 7 Aug 2023 at 22:39, Holden Karau <hol...@pigscanfly.ca>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Oh great point
>>>>>>>>
>>>>>>>> On Mon, Aug 7, 2023 at 2:23 PM bo yang <bobyan...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> Thanks Holden for bringing this up!
>>>>>>>>
>>>>>>>> Maybe another thing to think about is how to make dynamic
>>>>>>>> allocation more friendly with Kubernetes and disaggregated shuffle
>>>>>>>> storage?
>>>>>>>>
>>>>>>>> On Mon, Aug 7, 2023 at 1:27 PM Holden Karau <hol...@pigscanfly.ca>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> So I'm wondering if there is interest in revisiting some of how
>>>>>>>> Spark is doing its dynamic allocation for Spark 4+?
>>>>>>>>
>>>>>>>> Some things that I've been thinking about:
>>>>>>>>
>>>>>>>> - Advisory user input (e.g. a way to say "after X is done I know I
>>>>>>>> need Y", where Y might be a bunch of GPU machines)
>>>>>>>> - Configurable tolerance (e.g. if we have at most Z% over target,
>>>>>>>> no-op)
>>>>>>>> - Past runs of the same job (e.g. stage X of job Y had a peak of K)
>>>>>>>> - Faster executor launches (I'm a little fuzzy on what we can do
>>>>>>>> here but, one area for example, is that we set up and tear down an RPC
>>>>>>>> connection to the driver with a blocking call, which does seem to have
>>>>>>>> some locking inside of the driver at first glance)
>>>>>>>>
>>>>>>>> Is this an area other folks are thinking about? Should I make an
>>>>>>>> epic we can track ideas in? Or are folks generally happy with today's
>>>>>>>> dynamic allocation (or just busy with other things)?
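Of the ideas listed above, a rough form of "configurable tolerance" arguably exists today through the allocation ratio and backlog timeouts, while advisory user input and history-based sizing have no equivalent yet. Illustrative values:

```shell
# Request only 80% of the executors implied by the task backlog, and only
# ramp up further when the backlog is sustained (values are illustrative).
--conf spark.dynamicAllocation.executorAllocationRatio=0.8 \
--conf spark.dynamicAllocation.schedulerBacklogTimeout=1s \
--conf spark.dynamicAllocation.sustainedSchedulerBacklogTimeout=5s
```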
>>>>>>>>
>>>>>>>> --
>>>>>>>> Twitter: https://twitter.com/holdenkarau
>>>>>>>> Books (Learning Spark, High Performance Spark, etc.):
>>>>>>>> https://amzn.to/2MaRAG9
>>>>>>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>
> --
> Regards,
> Qian Sun

-- 
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
YouTube Live Streams: https://www.youtube.com/user/holdenkarau