Hi Mich,

I agree that the startup time of Spark on Kubernetes clusters needs to be improved.
Regarding fetching images directly: I have used ImageCache to store the images on each node, eliminating the time required to pull images from a remote repository. This does reduce overall startup time, and the effect becomes more pronounced as the image size increases. Additionally, I have observed that the driver pod spends a significant amount of time between entering the Running phase and attempting to create the executor pods, an estimated 75% of the total. We could also explore optimization options in this area.

On Thu, Aug 24, 2023 at 12:58 AM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> Hi all,
>
> In this conversation, one of the issues I brought up was the driver
> start-up time. This is especially true in k8s. As Spark on k8s is modeled
> on the Spark standalone scheduler, Spark on k8s consists of a single
> driver pod (like the master in standalone mode) and a number of executors
> ("workers"). When executed on k8s, the driver and executors run on
> separate pods
> <https://spark.apache.org/docs/latest/running-on-kubernetes.html>. First
> the driver pod is launched, then the driver pod itself launches the
> executor pods. From my observation, in an autoscaling cluster, the driver
> pod may take up to 40 seconds, followed by the executor pods. This is a
> considerable time for customers and it is painfully slow. Can we move
> away from the dependency on standalone mode and try to speed up k8s
> cluster formation?
>
> Another naive question: when the docker image is pulled from the
> container registry to the driver itself, this takes finite time. The
> docker image for executors could be different from the driver docker
> image. Since spark-submit presents this at submission time, can we save
> time by fetching the docker images straight away?
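To make the image-caching point above concrete, here is a minimal sketch of building spark-submit flags that rely on the node-local image cache rather than a remote pull. The image names and the helper function are illustrative assumptions, not from the thread; the conf keys are Spark's Kubernetes image settings.

```python
# Sketch: spark-submit --conf flags that let kubelet reuse node-cached
# images (pullPolicy=IfNotPresent) instead of pulling from the registry.
# Image names and this helper are hypothetical, for illustration only.

def build_submit_args(driver_image, executor_image, pull_policy="IfNotPresent"):
    """Return a flat list of --conf arguments for the container images."""
    conf = {
        # Separate driver/executor images, as raised in the thread.
        "spark.kubernetes.driver.container.image": driver_image,
        "spark.kubernetes.executor.container.image": executor_image,
        # IfNotPresent makes kubelet use the node-local image cache
        # when the image is already present on the node.
        "spark.kubernetes.container.image.pullPolicy": pull_policy,
    }
    return [arg for k, v in sorted(conf.items()) for arg in ("--conf", f"{k}={v}")]

args = build_submit_args("repo/spark-driver:3.4.1", "repo/spark-exec:3.4.1")
```

Combined with a pre-pull (e.g. a DaemonSet that warms the cache), this avoids paying the registry pull on every driver and executor pod start.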
>
> Thanks
>
> Mich
>
> view my LinkedIn profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
> https://en.everybodywiki.com/Mich_Talebzadeh
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
> On Tue, 8 Aug 2023 at 18:25, Mich Talebzadeh <mich.talebza...@gmail.com>
> wrote:
>
>> Splendid idea. 👍
>>
>> Mich Talebzadeh,
>> Solutions Architect/Engineering Lead
>> London
>> United Kingdom
>>
>> On Tue, 8 Aug 2023 at 18:10, Holden Karau <hol...@pigscanfly.ca> wrote:
>>
>>> The driver itself is probably another topic; perhaps I'll make a
>>> "faster spark start time" JIRA and a DA JIRA and we can explore both.
>>>
>>> On Tue, Aug 8, 2023 at 10:07 AM Mich Talebzadeh <
>>> mich.talebza...@gmail.com> wrote:
>>>
>>>> From my own perspective, faster execution time, especially with Spark
>>>> on tin boxes (Dataproc & EC2) and Spark on k8s, is something that
>>>> customers often bring up.
>>>>
>>>> Poor time to onboard with autoscaling seems to be particularly singled
>>>> out for heavy ETL jobs that use Spark.
I am disappointed to see the poor
>>>> performance of Spark on k8s autopilot, with long timelines for starting
>>>> the driver itself and moving from the Pending to the Running phase
>>>> (Spark 3.4.1 with Java 11).
>>>>
>>>> HTH
>>>>
>>>> Mich Talebzadeh,
>>>> Solutions Architect/Engineering Lead
>>>> London
>>>> United Kingdom
>>>>
>>>> On Tue, 8 Aug 2023 at 15:49, kalyan <justfors...@gmail.com> wrote:
>>>>
>>>>> +1 to enhancements in DEA. Long overdue!
>>>>>
>>>>> There are a few things that I have been thinking about along the same
>>>>> lines for some time now (a few overlap with @holden's points):
>>>>> 1. How to reduce wastage on the RM side? Sometimes the driver asks for
>>>>> some units of resources, but when the RM provisions them, the driver
>>>>> cancels the request.
>>>>> 2. How to make the resource available when it is needed.
>>>>> 3. Cost vs. app runtime: a good DEA algorithm should allow the
>>>>> developer to choose between cost and runtime. Sometimes developers
>>>>> might be OK paying higher costs for faster execution.
>>>>> 4. Stitch resource profile choices into query execution.
>>>>> 5. Allow a different DEA algorithm to be chosen for different queries
>>>>> within the same Spark application.
>>>>> 6. Fall back to the default algorithm when things go haywire!
>>>>>
>>>>> Model-based learning would be awesome.
>>>>> These can be fine-tuned with tools like Sparklens.
>>>>>
>>>>> I am aware of a few experiments carried out in this area by my friends
>>>>> in this domain.
One lesson we learned was that it is hard to have a generic
>>>>> algorithm that works for all cases.
>>>>>
>>>>> Regards
>>>>> kalyan.
>>>>>
>>>>> On Tue, Aug 8, 2023 at 6:12 PM Mich Talebzadeh <
>>>>> mich.talebza...@gmail.com> wrote:
>>>>>
>>>>>> Thanks for pointing out this feature to me. I will have a look when
>>>>>> I get there.
>>>>>>
>>>>>> Mich Talebzadeh,
>>>>>> Solutions Architect/Engineering Lead
>>>>>> London
>>>>>> United Kingdom
>>>>>>
>>>>>> On Tue, 8 Aug 2023 at 11:44, roryqi(齐赫) <ror...@tencent.com> wrote:
>>>>>>
>>>>>>> Spark 3.5 has added a method `supportsReliableStorage` to
>>>>>>> `ShuffleDriverComponents`, which indicates whether shuffle data is
>>>>>>> written to a distributed filesystem or persisted in a remote
>>>>>>> shuffle service.
>>>>>>>
>>>>>>> Uniffle is a general-purpose remote shuffle service
>>>>>>> (https://github.com/apache/incubator-uniffle). It can enhance the
>>>>>>> experience of Spark on K8S. After Spark 3.5 is released, Uniffle
>>>>>>> will support `ShuffleDriverComponents`; see [1].
>>>>>>>
>>>>>>> If you are interested in more details about Uniffle, see [2].
>>>>>>>
>>>>>>> [1] https://github.com/apache/incubator-uniffle/issues/802.
>>>>>>>
>>>>>>> [2]
>>>>>>> https://uniffle.apache.org/blog/2023/07/21/Uniffle%20-%20New%20chapter%20for%20the%20shuffle%20in%20the%20cloud%20native%20era
>>>>>>>
>>>>>>> *From:* Mich Talebzadeh <mich.talebza...@gmail.com>
>>>>>>> *Date:* Tuesday, 8 August 2023, 06:53
>>>>>>> *Cc:* dev <dev@spark.apache.org>
>>>>>>> *Subject:* [Internet]Re: Improving Dynamic Allocation Logic for
>>>>>>> Spark 4+
>>>>>>>
>>>>>>> On the subject of dynamic allocation, is the following message a
>>>>>>> cause for concern when running Spark on k8s?
>>>>>>>
>>>>>>> INFO ExecutorAllocationManager: Dynamic allocation is enabled
>>>>>>> without a shuffle service.
>>>>>>>
>>>>>>> Mich Talebzadeh,
>>>>>>> Solutions Architect/Engineering Lead
>>>>>>> London
>>>>>>> United Kingdom
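As background to the warning quoted above: on k8s there is no external shuffle service, so dynamic allocation is typically paired with Spark's executor-side shuffle tracking. A minimal sketch of the relevant settings, with illustrative values (the executor counts and timeout are assumptions, not from the thread):

```python
# Sketch: dynamic-allocation conf commonly used on k8s, where no external
# shuffle service exists. Values here are illustrative placeholders.
k8s_dynamic_allocation_conf = {
    "spark.dynamicAllocation.enabled": "true",
    # Without an external shuffle service, Spark tracks shuffle files on
    # executors itself and avoids removing executors that still hold them.
    "spark.dynamicAllocation.shuffleTracking.enabled": "true",
    # Bound how long an idle executor holding shuffle data is retained.
    "spark.dynamicAllocation.shuffleTracking.timeout": "30min",
    "spark.dynamicAllocation.minExecutors": "1",
    "spark.dynamicAllocation.maxExecutors": "10",
}

def to_submit_flags(conf):
    """Render the conf dict as spark-submit --conf flags."""
    return " ".join(f"--conf {k}={v}" for k, v in sorted(conf.items()))
```

With shuffle tracking enabled, the INFO message is expected rather than a problem; a reliable-storage shuffle backend such as Uniffle (mentioned above) is the other way to decouple shuffle data from executor lifetime.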
>>>>>>>
>>>>>>> On Mon, 7 Aug 2023 at 23:42, Mich Talebzadeh <
>>>>>>> mich.talebza...@gmail.com> wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> From what I have seen, Spark on a serverless cluster has a hard time
>>>>>>> getting the driver going in a timely manner:
>>>>>>>
>>>>>>> Annotations: autopilot.gke.io/resource-adjustment:
>>>>>>> {"input":{"containers":[{"limits":{"memory":"1433Mi"},"requests":{"cpu":"1","memory":"1433Mi"},"name":"spark-kubernetes-driver"}]},"output...
>>>>>>> autopilot.gke.io/warden-version: 2.7.41
>>>>>>>
>>>>>>> This is on Spark 3.4.1 with Java 11, on both the host running
>>>>>>> spark-submit and in the docker image itself.
>>>>>>>
>>>>>>> I am not sure how relevant this is to this discussion, but it looks
>>>>>>> like a kind of blocker for now. What config params can help here and
>>>>>>> what can be done?
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>> Mich Talebzadeh,
>>>>>>> Solutions Architect/Engineering Lead
>>>>>>> London
>>>>>>> United Kingdom
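One observation on the 1433Mi request in the annotation above: it is consistent with 1g of driver memory plus a 40% memory-overhead factor (the factor Spark applies to non-JVM workloads). A rough sketch of the pod-memory derivation; the overhead formula is my reading of Spark's behavior and should be treated as an assumption:

```python
# Sketch (assumed formula): the k8s pod memory request Spark computes is
# roughly heap + max(overhead_factor * heap, a 384MiB floor).

def pod_memory_request_mib(driver_memory_mib, overhead_factor=0.4,
                           min_overhead_mib=384):
    """Approximate pod memory request in MiB for a given driver heap."""
    overhead = max(int(driver_memory_mib * overhead_factor), min_overhead_mib)
    return driver_memory_mib + overhead

# With 1024MiB driver memory and a 0.4 factor: 1024 + 409 = 1433MiB,
# matching the autopilot resource-adjustment annotation above.
# With the 0.1 JVM default the result would be 1024 + 384 = 1408MiB.
```

If that reading is right, tuning `spark.driver.memory` and the memory overhead settings (and the driver CPU request) is the first place to look when autopilot adjusts the pod.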
>>>>>>>
>>>>>>> On Mon, 7 Aug 2023 at 22:39, Holden Karau <hol...@pigscanfly.ca>
>>>>>>> wrote:
>>>>>>>
>>>>>>> Oh great point
>>>>>>>
>>>>>>> On Mon, Aug 7, 2023 at 2:23 PM bo yang <bobyan...@gmail.com> wrote:
>>>>>>>
>>>>>>> Thanks Holden for bringing this up!
>>>>>>>
>>>>>>> Maybe another thing to think about is how to make dynamic allocation
>>>>>>> more friendly with Kubernetes and disaggregated shuffle storage?
>>>>>>>
>>>>>>> On Mon, Aug 7, 2023 at 1:27 PM Holden Karau <hol...@pigscanfly.ca>
>>>>>>> wrote:
>>>>>>>
>>>>>>> So I am wondering if there is interest in revisiting some of how
>>>>>>> Spark does its dynamic allocation for Spark 4+.
>>>>>>>
>>>>>>> Some things that I've been thinking about:
>>>>>>>
>>>>>>> - Advisory user input (e.g. a way to say "after X is done I know I
>>>>>>> need Y", where Y might be a bunch of GPU machines)
>>>>>>> - Configurable tolerance (e.g. if we are at most Z% over target,
>>>>>>> no-op)
>>>>>>> - Past runs of the same job (e.g. stage X of job Y had a peak of K)
>>>>>>> - Faster executor launches (I'm a little fuzzy on what we can do
>>>>>>> here, but one area, for example, is that we set up and tear down an
>>>>>>> RPC connection to the driver with a blocking call, which does seem
>>>>>>> to have some locking inside the driver at first glance)
>>>>>>>
>>>>>>> Is this an area other folks are thinking about? Should I make an
>>>>>>> epic we can track ideas in? Or are folks generally happy with
>>>>>>> today's dynamic allocation (or just busy with other things)?
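The "configurable tolerance" idea in the list above can be sketched as a tiny decision function: if the current executor count overshoots the target by at most Z%, do nothing rather than tearing executors down. This is a hypothetical helper for illustration, not Spark's actual allocation code:

```python
# Illustrative sketch of tolerance-based scaling: small overshoots are a
# no-op, saving pod churn. Hypothetical helper, not Spark internals.

def adjustment(current, target, tolerance=0.1):
    """Return the executor delta to request; 0 means no-op."""
    if current < target:
        return target - current          # under target: scale up
    if current <= target * (1 + tolerance):
        return 0                         # within tolerance: no-op
    return target - current              # too far over: scale down

# adjustment(11, 10) -> 0   (10% over, within tolerance)
# adjustment(15, 10) -> -5  (50% over, scale down)
```

The appeal on k8s in particular is that avoiding scale-down/scale-up oscillation also avoids repeatedly paying the slow pod-startup cost discussed earlier in the thread.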
>>>>>>> >>>>>>> >>>>>>> >>>>>>> -- >>>>>>> >>>>>>> Twitter: https://twitter.com/holdenkarau >>>>>>> >>>>>>> Books (Learning Spark, High Performance Spark, etc.): >>>>>>> https://amzn.to/2MaRAG9 <https://amzn.to/2MaRAG9> >>>>>>> >>>>>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau >>>>>>> >>>>>>> -- >>>>>>> >>>>>>> Twitter: https://twitter.com/holdenkarau >>>>>>> >>>>>>> Books (Learning Spark, High Performance Spark, etc.): >>>>>>> https://amzn.to/2MaRAG9 <https://amzn.to/2MaRAG9> >>>>>>> >>>>>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau >>>>>>> >>>>>>> -- >>> Twitter: https://twitter.com/holdenkarau >>> Books (Learning Spark, High Performance Spark, etc.): >>> https://amzn.to/2MaRAG9 <https://amzn.to/2MaRAG9> >>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau >>> >> -- Regards, Qian Sun