Re: Spark on Kubernetes scheduler variety

John Zhuge Thu, 24 Jun 2021 21:39:59 -0700

Thanks Yikun!

On Thu, Jun 24, 2021 at 8:54 PM Yikun Jiang <yikunk...@gmail.com> wrote:


> Hi, folks.
>
> As @Klaus mentioned, we have done some work on Spark on k8s with native
> Volcano support. There has also been some production deployment validation
> from our partners in China, such as JingDong, XiaoHongShu and VIPshop.
>
> We are also preparing to propose an initial design and POC[3] on a shared
> branch (based on the Spark master branch) where we can collaborate, so I
> created the spark-volcano[1] org on GitHub to make it happen.
>
> Please feel free to comment on it [2] if you have any questions or
> concerns.
>
> [1] https://github.com/spark-volcano
> [2] https://github.com/spark-volcano/spark/issues/1
> [3] https://github.com/spark-volcano-wip/spark-3-volcano
>
> Regards,
> Yikun
>
> On Fri, Jun 25, 2021 at 12:00 AM Holden Karau <hol...@pigscanfly.ca> wrote:
>
>> Hi Mich,
>>
>> I certainly think making Spark on Kubernetes run well is going to be a
>> challenge. However I think, and I could be wrong about this as well, that
>> in terms of cluster managers Kubernetes is likely to be our future. Talking
>> with people I don't hear about new standalone, YARN or Mesos deployments of
>> Spark, but I do hear about people trying to migrate to Kubernetes.
>>
>> To be clear I certainly agree that we need more work on structured
>> streaming, but it's important to remember that the Spark developers are not
>> all fully interchangeable, we work on the things that we're interested in
>> pursuing so even if structured streaming needs more love if I'm not super
>> interested in structured streaming I'm less likely to work on it. That
>> being said I am certainly spinning up a bit more in the Spark SQL area
>> especially around our data source/connectors because I can see the need
>> there too.
>>
>> On Wed, Jun 23, 2021 at 8:26 AM Mich Talebzadeh <
>> mich.talebza...@gmail.com> wrote:
>>
>>>
>>>
>>> Please allow me to be diverse and express a different point of view on
>>> this roadmap.
>>>
>>>
>>> I believe that, from a technical point of view, spending time, effort and
>>> talent on batch scheduling on Kubernetes could be rewarding. However, if I
>>> may say so, I doubt whether such an approach, and the so-called
>>> democratization of Spark on whatever platform, really should be a great
>>> focus.
>>>
>>> Having worked on Google Dataproc <https://cloud.google.com/dataproc> (a
>>> fully managed and highly scalable service for running Apache Spark, Hadoop
>>> and, more recently, other artefacts) for the past two years, and on Spark
>>> on Kubernetes on-premise, I have come to the conclusion that Spark is not
>>> a beast that one can fully commoditize, as one can with Zookeeper, Kafka
>>> etc. There is always a struggle to make some niche areas of Spark, like
>>> Spark Structured Streaming (SSS), work seamlessly and effortlessly on
>>> these commercial platforms with whatever as a Service.
>>>
>>>
>>> Moreover, Spark (and I stand to be corrected) already has a lot of
>>> resiliency and redundancy built in from the ground up. It is truly an
>>> enterprise-class product (requiring enterprise-class support) that will be
>>> difficult to commoditize with Kubernetes while expecting the same
>>> performance. After all, Kubernetes is aimed at efficient resource sharing
>>> and potential cost savings for the mass market. In short, I can see that
>>> commercial enterprises will work on these platforms, but maybe the great
>>> talent on the dev team should focus on things like the perceived
>>> limitation of SSS in dealing with chains of aggregations (if I am correct,
>>> this is not yet supported on streaming datasets).
>>>
>>>
>>> These are my opinions and they are not facts, just opinions so to speak
>>> :)
>>>
>>>
>>>    view my Linkedin profile
>>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>>
>>>
>>>
>>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>>> any loss, damage or destruction of data or any other property which may
>>> arise from relying on this email's technical content is explicitly
>>> disclaimed. The author will in no case be liable for any monetary damages
>>> arising from such loss, damage or destruction.
>>>
>>>
>>>
>>>
>>> On Fri, 18 Jun 2021 at 23:18, Holden Karau <hol...@pigscanfly.ca> wrote:
>>>
>>>> I think these approaches are good, but there are limitations (e.g.
>>>> dynamic scaling) without us making changes inside of the Spark Kube
>>>> scheduler.
>>>>
>>>> Certainly whichever scheduler extensions we add support for we should
>>>> collaborate with the people developing those extensions insofar as they are
>>>> interested. The first place I checked was #sig-scheduling, which is
>>>> fairly quiet on the Kubernetes Slack, but if there are more places to look
>>>> for folks interested in batch scheduling on Kubernetes we should definitely
>>>> give it a shot :)
>>>>
>>>> On Fri, Jun 18, 2021 at 1:41 AM Mich Talebzadeh <
>>>> mich.talebza...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Regarding your point and I quote
>>>>>
>>>>> "... I know that one of the Spark on Kube operators
>>>>> supports volcano/kube-batch so I was thinking that might be a place I
>>>>> would start exploring..."
>>>>>
>>>>> There seems to be ongoing work on, say, Volcano as part of the Cloud
>>>>> Native Computing Foundation <https://cncf.io/> (CNCF), for example
>>>>> through https://github.com/volcano-sh/volcano
>>>>>
>>>>> There may be value-add in collaborating with such groups through CNCF
>>>>> in order to have a collective approach to such work. There also seems to
>>>>> be some work on Integration of Spark with Volcano for Batch Scheduling:
>>>>> <https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/blob/master/docs/volcano-integration.md>
>>>>>
>>>>>
>>>>>
>>>>> What is not very clear is the degree of progress of these projects.
>>>>> You may be kind enough to elaborate on the KPIs for each of these projects
>>>>> and where you think your contributions are going to be.
>>>>>
>>>>>
>>>>> HTH,
>>>>>
>>>>>
>>>>> Mich
>>>>>
>>>>> On Fri, 18 Jun 2021 at 00:44, Holden Karau <hol...@pigscanfly.ca>
>>>>> wrote:
>>>>>
>>>>>> Hi Folks,
>>>>>>
>>>>>> I'm continuing my adventures to make Spark on containers party and I
>>>>>> was wondering if folks have experience with the different batch
>>>>>> scheduler options that they prefer? I was thinking so that we can
>>>>>> better support dynamic allocation it might make sense for us to
>>>>>> support using different schedulers and I wanted to see if there are
>>>>>> any that the community is more interested in?
>>>>>>
>>>>>> I know that one of the Spark on Kube operators supports
>>>>>> volcano/kube-batch so I was thinking that might be a place I start
>>>>>> exploring but also want to be open to other schedulers that folks
>>>>>> might be interested in.
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> Holden :)
>>>>>>
>>>>>> --
>>>>>> Twitter: https://twitter.com/holdenkarau
>>>>>> Books (Learning Spark, High Performance Spark, etc.):
>>>>>> https://amzn.to/2MaRAG9
>>>>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>>>>
>>>>
>>>
>>
>>
>

-- 
John Zhuge
