Re: Spark on Kubernetes scheduler variety

Klaus Ma Wed, 30 Jun 2021 19:16:59 -0700

Hi Mich,

Could you help by opening an issue in the spark-on-k8s-operator repo? We're
going to submit a PR to update the install steps :)


-- Klaus

On Wed, Jun 30, 2021 at 12:24 AM Mich Talebzadeh <[email protected]>
wrote:

> Hi Yikun
>
> With reference to
>
>
> https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/blob/master/docs/volcano-integration.md
>
> While trying to install Volcano I am getting this error:
>
> helm repo add incubator http://storage.googleapis.com/kubernetes-charts-incubator
> Error: looks like "http://storage.googleapis.com/kubernetes-charts-incubator"
> is not a valid chart repository or cannot be reached: failed to fetch
> http://storage.googleapis.com/kubernetes-charts-incubator/index.yaml :
> 404 Not Found
>
> Any ideas will be appreciated.
>
> Thanks,
>
> Mich
>
>
>
>    view my Linkedin profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
>
> On Tue, 29 Jun 2021 at 09:14, Mich Talebzadeh <[email protected]>
> wrote:
>
>> Cool, thanks!
>>
>>
>> On Tue, 29 Jun 2021 at 07:33, Yikun Jiang <[email protected]> wrote:
>>
>>> > Is this the correct link for integrating Volcano with Spark?
>>>
>>> Yes, that is the Kubernetes operator style of integrating Volcano. If you
>>> just want to use spark-submit to submit a job with native Volcano support,
>>> you can see [1] for reference.
>>>
>>> [1]
>>> https://github.com/huawei-cloudnative/spark/commit/6c1f37525f026353eaead34216d47dad653f13a4
>>>
>>> Regards,
>>> Yikun
>>>
>>>
>>> On Mon, 28 Jun 2021 at 18:03, Mich Talebzadeh <[email protected]> wrote:
>>>
>>>> Hi Yikun,
>>>>
>>>> Is this the correct link for integrating Volcano with Spark?
>>>>
>>>> spark-on-k8s-operator/volcano-integration.md at master ·
>>>> GoogleCloudPlatform/spark-on-k8s-operator · GitHub
>>>> <https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/blob/master/docs/volcano-integration.md>
>>>>
>>>> Thanks
>>>>
>>>>
>>>> Mich
>>>>
>>>>
>>>> On Fri, 25 Jun 2021 at 09:45, Yikun Jiang <[email protected]> wrote:
>>>>
>>>>> Oops, sorry for the wrong link, it should be:
>>>>>
>>>>> We will also prepare to propose an initial design and POC[3] on a
>>>>> shared branch (based on spark master branch) where we can collaborate on
>>>>> it, so I created the spark-volcano[1] org in github to make it happen.
>>>>>
>>>>> [3]
>>>>> https://github.com/huawei-cloudnative/spark/commit/6c1f37525f026353eaead34216d47dad653f13a4
>>>>>
>>>>>
>>>>> And
>>>>> Regards,
>>>>> Yikun
>>>>>
>>>>>
>>>>> On Fri, 25 Jun 2021 at 11:53, Yikun Jiang <[email protected]> wrote:
>>>>>
>>>>>> Hi, folks.
>>>>>>
>>>>>> As @Klaus mentioned, we have done some work on Spark on k8s with native
>>>>>> Volcano support. There has also been some production deployment
>>>>>> validation from our partners in China, such as JingDong, XiaoHongShu
>>>>>> and VIPshop.
>>>>>>
>>>>>> We will also prepare to propose an initial design and POC[3] on a
>>>>>> shared branch (based on spark master branch) where we can collaborate on
>>>>>> it, so I created the spark-volcano[1] org in github to make it happen.
>>>>>>
>>>>>> Pls feel free to comment on it [2] if you guys have any questions or
>>>>>> concerns.
>>>>>>
>>>>>> [1] https://github.com/spark-volcano
>>>>>> [2] https://github.com/spark-volcano/spark/issues/1
>>>>>> [3]
>>>>>> https://github.com/huawei-cloudnative/spark/commit/6c1f37525f026353eaead34216d47dad653f13a4
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Regards,
>>>>>> Yikun
>>>>>>
>>>>>> On Fri, 25 Jun 2021 at 00:00, Holden Karau <[email protected]> wrote:
>>>>>>
>>>>>>> Hi Mich,
>>>>>>>
>>>>>>> I certainly think making Spark on Kubernetes run well is going to be
>>>>>>> a challenge. However I think, and I could be wrong about this as well,
>>>>>>> that in terms of cluster managers Kubernetes is likely to be our
>>>>>>> future. Talking with people I don't hear about new standalone, YARN or
>>>>>>> Mesos deployments of Spark, but I do hear about people trying to
>>>>>>> migrate to Kubernetes.
>>>>>>>
>>>>>>> To be clear I certainly agree that we need more work on structured
>>>>>>> streaming, but it's important to remember that the Spark developers
>>>>>>> are not all fully interchangeable. We work on the things that we're
>>>>>>> interested in pursuing, so even if structured streaming needs more
>>>>>>> love, if I'm not super interested in structured streaming I'm less
>>>>>>> likely to work on it. That being said, I am certainly spinning up a
>>>>>>> bit more in the Spark SQL area, especially around our data
>>>>>>> source/connectors, because I can see the need there too.
>>>>>>>
>>>>>>> On Wed, Jun 23, 2021 at 8:26 AM Mich Talebzadeh <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Please allow me to differ and express a different point of view
>>>>>>>> on this roadmap.
>>>>>>>>
>>>>>>>>
>>>>>>>> I believe that, from a technical point of view, spending time, effort
>>>>>>>> and talent on batch scheduling on Kubernetes could be rewarding.
>>>>>>>> However, if I may say so, I doubt whether such an approach, and the
>>>>>>>> so-called democratization of Spark on whatever platform, really
>>>>>>>> should be a major focus.
>>>>>>>>
>>>>>>>> Having worked on Google Dataproc
>>>>>>>> <https://cloud.google.com/dataproc> (a fully managed and highly
>>>>>>>> scalable service for running Apache Spark, Hadoop and more recently
>>>>>>>> other artefacts) for the past two years, and on Spark on Kubernetes
>>>>>>>> on-premise, I have come to the conclusion that Spark is not a beast
>>>>>>>> that one can fully commoditize the way one can with Zookeeper,
>>>>>>>> Kafka etc. There is always a struggle to make some niche areas of
>>>>>>>> Spark, like Spark Structured Streaming (SSS), work seamlessly and
>>>>>>>> effortlessly on these commercial platforms with whatever as a Service.
>>>>>>>>
>>>>>>>>
>>>>>>>> Moreover, Spark (and I stand corrected) already has a lot of
>>>>>>>> resiliency and redundancy built in from the ground up. It is truly an
>>>>>>>> enterprise-class product (requiring enterprise-class support) that
>>>>>>>> will be difficult to commoditize with Kubernetes while expecting the
>>>>>>>> same performance. After all, Kubernetes is aimed at efficient resource
>>>>>>>> sharing and potential cost saving for the mass market. In short, I can
>>>>>>>> see that commercial enterprises will work on these platforms, but
>>>>>>>> maybe the great talents on the dev team should focus on things like
>>>>>>>> the perceived limitation of SSS in dealing with chains of aggregation
>>>>>>>> (if I am correct, this is not yet supported on streaming datasets).
>>>>>>>>
>>>>>>>>
>>>>>>>> These are my opinions and they are not facts, just opinions so to
>>>>>>>> speak :)
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, 18 Jun 2021 at 23:18, Holden Karau <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> I think these approaches are good, but there are limitations (e.g.
>>>>>>>>> dynamic scaling) without us making changes inside of the Spark Kube
>>>>>>>>> scheduler.
>>>>>>>>>
>>>>>>>>> Certainly, whichever scheduler extensions we add support for, we
>>>>>>>>> should collaborate with the people developing those extensions
>>>>>>>>> insofar as they are interested. The first place I checked was
>>>>>>>>> #sig-scheduling, which is fairly quiet on the Kubernetes Slack, but
>>>>>>>>> if there are more places to look for folks interested in batch
>>>>>>>>> scheduling on Kubernetes we should definitely give it a shot :)
>>>>>>>>>
>>>>>>>>> On Fri, Jun 18, 2021 at 1:41 AM Mich Talebzadeh <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> Regarding your point and I quote
>>>>>>>>>>
>>>>>>>>>> "..  I know that one of the Spark on Kube operators
>>>>>>>>>> supports volcano/kube-batch so I was thinking that might be a
>>>>>>>>>> place I would start exploring..."
>>>>>>>>>>
>>>>>>>>>> There seems to be ongoing work on, say, Volcano as part of the Cloud
>>>>>>>>>> Native Computing Foundation <https://cncf.io/> (CNCF), for
>>>>>>>>>> example through https://github.com/volcano-sh/volcano
>>>>>>>>>>
>>>>>>>>>> There may be value-add in collaborating with such groups through
>>>>>>>>>> CNCF in order to have a collective approach to such work. There
>>>>>>>>>> also seems to be some work on Integration of Spark with Volcano for
>>>>>>>>>> Batch Scheduling.
>>>>>>>>>> <https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/blob/master/docs/volcano-integration.md>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> What is not very clear is the degree of progress of these
>>>>>>>>>> projects. Perhaps you could elaborate on the KPIs for each of
>>>>>>>>>> these projects and where you think your contributions are going
>>>>>>>>>> to be.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> HTH,
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Mich
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Fri, 18 Jun 2021 at 00:44, Holden Karau <[email protected]>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Folks,
>>>>>>>>>>>
>>>>>>>>>>> I'm continuing my adventures to make Spark on containers party
>>>>>>>>>>> and I was wondering if folks have experience with the different
>>>>>>>>>>> batch scheduler options that they prefer? I was thinking so that
>>>>>>>>>>> we can better support dynamic allocation it might make sense for
>>>>>>>>>>> us to support using different schedulers and I wanted to see if
>>>>>>>>>>> there are any that the community is more interested in?
>>>>>>>>>>>
>>>>>>>>>>> I know that one of the Spark on Kube operators supports
>>>>>>>>>>> volcano/kube-batch so I was thinking that might be a place I
>>>>>>>>>>> start exploring but also want to be open to other schedulers that
>>>>>>>>>>> folks might be interested in.
>>>>>>>>>>>
>>>>>>>>>>> Cheers,
>>>>>>>>>>>
>>>>>>>>>>> Holden :)
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Twitter: https://twitter.com/holdenkarau
>>>>>>>>>>> Books (Learning Spark, High Performance Spark, etc.):
>>>>>>>>>>> https://amzn.to/2MaRAG9
>>>>>>>>>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>>>> To unsubscribe e-mail: [email protected]
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
