Hi Mich,

Would you help open an issue in the spark-on-k8s-operator repo? We're going to submit a PR to update the install steps :)
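For anyone hitting the same 404 in the meantime, here is a minimal sketch of a possible workaround, not the official install steps. It assumes the archived incubator charts are still served from https://charts.helm.sh/incubator, the location the Helm project announced when the old storage.googleapis.com buckets were retired. That URL, the index_url helper and the reachability check are illustrative assumptions, so please verify them against the current Helm and operator docs.

```shell
#!/bin/sh
# Assumption: archive location of the old incubator charts; verify before use.
INCUBATOR_URL="https://charts.helm.sh/incubator"

# helm checks a repository by fetching <repo-url>/index.yaml; that is the
# request that returned 404 for the storage.googleapis.com URL.
index_url() {
  printf '%s/index.yaml\n' "${1%/}"
}

# Add the repository only if helm is installed and the index is reachable.
if command -v helm >/dev/null 2>&1 &&
   curl -fsSL --max-time 30 -o /dev/null "$(index_url "$INCUBATOR_URL")"; then
  helm repo add incubator "$INCUBATOR_URL"
  helm repo update
else
  echo "skipped: helm missing or $(index_url "$INCUBATOR_URL") not reachable" >&2
fi
```

If the index check fails, the charts may have moved again, and the updated install steps from the PR should be preferred over this sketch.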
--
Klaus

On Wed, Jun 30, 2021 at 12:24 AM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> Hi Yikun
>
> In reference to
>
> https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/blob/master/docs/volcano-integration.md
>
> Trying to install Volcano, I am getting this error:
>
> helm repo add incubator http://storage.googleapis.com/kubernetes-charts-incubator
> Error: looks like "http://storage.googleapis.com/kubernetes-charts-incubator" is not a valid chart repository or cannot be reached: failed to fetch http://storage.googleapis.com/kubernetes-charts-incubator/index.yaml : 404 Not Found
>
> Any ideas will be appreciated.
>
> Thanks,
>
> Mich
>
> view my Linkedin profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.
>
> On Tue, 29 Jun 2021 at 09:14, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>
>> Cool, thanks!
>>
>> On Tue, 29 Jun 2021 at 07:33, Yikun Jiang <yikunk...@gmail.com> wrote:
>>
>>> > Is this the correct link for integrating Volcano with Spark?
>>>
>>> Yes, it is the Kubernetes operator style of integrating Volcano.
>>> And if you just want to use the spark-submit style to submit a job using the native support, you can see [2] as a reference.
>>>
>>> [1] https://github.com/huawei-cloudnative/spark/commit/6c1f37525f026353eaead34216d47dad653f13a4
>>>
>>> Regards,
>>> Yikun
>>>
>>> On Mon, Jun 28, 2021 at 6:03 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>
>>>> Hi Yikun,
>>>>
>>>> Is this the correct link for integrating Volcano with Spark?
>>>>
>>>> spark-on-k8s-operator/volcano-integration.md at master · GoogleCloudPlatform/spark-on-k8s-operator · GitHub
>>>> <https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/blob/master/docs/volcano-integration.md>
>>>>
>>>> Thanks
>>>>
>>>> Mich
>>>>
>>>> On Fri, 25 Jun 2021 at 09:45, Yikun Jiang <yikunk...@gmail.com> wrote:
>>>>
>>>>> Oops, sorry for the wrong link, it should be:
>>>>>
>>>>> We will also prepare to propose an initial design and POC[3] on a shared branch (based on the Spark master branch) where we can collaborate on it, so I created the spark-volcano[1] org on GitHub to make it happen.
>>>>>
>>>>> [3] https://github.com/huawei-cloudnative/spark/commit/6c1f37525f026353eaead34216d47dad653f13a4
>>>>>
>>>>> Regards,
>>>>> Yikun
>>>>>
>>>>> On Fri, Jun 25, 2021 at 11:53 AM, Yikun Jiang <yikunk...@gmail.com> wrote:
>>>>>
>>>>>> Hi, folks.
>>>>>>
>>>>>> As @Klaus mentioned, we have some work on Spark on k8s with Volcano native support.
>>>>>> There has also been some production deployment validation from our partners in China, such as JingDong, XiaoHongShu, and VIPshop.
>>>>>>
>>>>>> We will also prepare to propose an initial design and POC[3] on a shared branch (based on the Spark master branch) where we can collaborate on it, so I created the spark-volcano[1] org on GitHub to make it happen.
>>>>>>
>>>>>> Please feel free to comment on it [2] if you have any questions or concerns.
>>>>>>
>>>>>> [1] https://github.com/spark-volcano
>>>>>> [2] https://github.com/spark-volcano/spark/issues/1
>>>>>> [3] https://github.com/huawei-cloudnative/spark/commit/6c1f37525f026353eaead34216d47dad653f13a4
>>>>>>
>>>>>> Regards,
>>>>>> Yikun
>>>>>>
>>>>>> On Fri, Jun 25, 2021 at 12:00 AM, Holden Karau <hol...@pigscanfly.ca> wrote:
>>>>>>
>>>>>>> Hi Mich,
>>>>>>>
>>>>>>> I certainly think making Spark on Kubernetes run well is going to be a challenge. However, I think (and I could be wrong about this as well) that in terms of cluster managers Kubernetes is likely to be our future. Talking with people, I don't hear about new standalone, YARN or Mesos deployments of Spark, but I do hear about people trying to migrate to Kubernetes.
>>>>>>>
>>>>>>> To be clear, I certainly agree that we need more work on structured streaming, but it's important to remember that the Spark developers are not all fully interchangeable. We work on the things that we're interested in pursuing, so even if structured streaming needs more love, if I'm not super interested in structured streaming I'm less likely to work on it. That being said, I am certainly spinning up a bit more in the Spark SQL area, especially around our data source/connectors, because I can see the need there too.
>>>>>>>
>>>>>>> On Wed, Jun 23, 2021 at 8:26 AM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Please allow me to be diverse and express a different point of view on this roadmap.
>>>>>>>>
>>>>>>>> I believe that, from a technical point of view, spending time, effort and talent on batch scheduling on Kubernetes could be rewarding. However, if I may say so, I doubt whether such an approach, and the so-called democratization of Spark on whatever platform, should really be a great focus.
>>>>>>>>
>>>>>>>> Having worked on Google Dataproc <https://cloud.google.com/dataproc> (a fully managed and highly scalable service for running Apache Spark, Hadoop and more recently other artefacts) for the past two years, and on Spark on Kubernetes on-premise, I have come to the conclusion that Spark is not a beast that one can fully commoditize, much like one can do with Zookeeper, Kafka etc. There is always a struggle to make some niche areas of Spark like Spark Structured Streaming (SSS) work seamlessly and effortlessly on these commercial platforms with whatever as a Service.
>>>>>>>>
>>>>>>>> Moreover, Spark (and I stand corrected) already has a lot of resiliency and redundancy built in from the ground up. It is truly an enterprise-class product (requiring enterprise-class support) that will be difficult to commoditize with Kubernetes while expecting the same performance. After all, Kubernetes is aimed at efficient resource sharing and potential cost saving for the mass market.
>>>>>>>> In short, I can see that commercial enterprises will work on these platforms, but maybe the great talents on the dev team should focus on stuff like the perceived limitation of SSS in dealing with chained aggregations (if I am correct, this is not yet supported on streaming datasets).
>>>>>>>>
>>>>>>>> These are my opinions and they are not facts, just opinions so to speak :)
>>>>>>>>
>>>>>>>> On Fri, 18 Jun 2021 at 23:18, Holden Karau <hol...@pigscanfly.ca> wrote:
>>>>>>>>
>>>>>>>>> I think these approaches are good, but there are limitations (e.g. dynamic scaling) without us making changes inside of the Spark Kube scheduler.
>>>>>>>>>
>>>>>>>>> Certainly, whichever scheduler extensions we add support for, we should collaborate with the people developing those extensions insofar as they are interested.
>>>>>>>>> The first place I checked was #sig-scheduling, which is fairly quiet on the Kubernetes Slack, but if there are more places to look for folks interested in batch scheduling on Kubernetes we should definitely give it a shot :)
>>>>>>>>>
>>>>>>>>> On Fri, Jun 18, 2021 at 1:41 AM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> Regarding your point, and I quote:
>>>>>>>>>>
>>>>>>>>>> ".. I know that one of the Spark on Kube operators supports volcano/kube-batch so I was thinking that might be a place I would start exploring..."
>>>>>>>>>>
>>>>>>>>>> There seems to be ongoing work on, say, Volcano as part of the Cloud Native Computing Foundation <https://cncf.io/> (CNCF), for example through https://github.com/volcano-sh/volcano
>>>>>>>>>>
>>>>>>>>>> There may be value-add in collaborating with such groups through CNCF in order to have a collective approach to such work. There also seems to be some work on Integration of Spark with Volcano for Batch Scheduling.
>>>>>>>>>> <https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/blob/master/docs/volcano-integration.md>
>>>>>>>>>>
>>>>>>>>>> What is not very clear is the degree of progress of these projects. You may be kind enough to elaborate on the KPIs for each of these projects and where you think your contributions are going to be.
>>>>>>>>>>
>>>>>>>>>> HTH,
>>>>>>>>>>
>>>>>>>>>> Mich
>>>>>>>>>>
>>>>>>>>>> On Fri, 18 Jun 2021 at 00:44, Holden Karau <hol...@pigscanfly.ca> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Folks,
>>>>>>>>>>>
>>>>>>>>>>> I'm continuing my adventures to make Spark on containers party and I was wondering if folks have experience with the different batch scheduler options that they prefer? I was thinking that, so we can better support dynamic allocation, it might make sense for us to support using different schedulers, and I wanted to see if there are any that the community is more interested in?
>>>>>>>>>>>
>>>>>>>>>>> I know that one of the Spark on Kube operators supports volcano/kube-batch so I was thinking that might be a place I would start exploring, but I also want to be open to other schedulers that folks might be interested in.
>>>>>>>>>>>
>>>>>>>>>>> Cheers,
>>>>>>>>>>>
>>>>>>>>>>> Holden :)
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Twitter: https://twitter.com/holdenkarau
>>>>>>>>>>> Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9
>>>>>>>>>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>>>>>>>>>
>>>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org