Thanks Yikun!

On Thu, Jun 24, 2021 at 8:54 PM Yikun Jiang <yikunk...@gmail.com> wrote:
> Hi, folks.
>
> As @Klaus mentioned, we have done some work on Spark on k8s with native
> Volcano support. There has also been some production deployment validation
> from our partners in China, such as JingDong, XiaoHongShu, and VIPshop.
>
> We are also preparing to propose an initial design and POC [3] on a shared
> branch (based on the Spark master branch) where we can collaborate, so I
> created the spark-volcano [1] org on GitHub to make that happen.
>
> Please feel free to comment on it [2] if you have any questions or
> concerns.
>
> [1] https://github.com/spark-volcano
> [2] https://github.com/spark-volcano/spark/issues/1
> [3] https://github.com/spark-volcano-wip/spark-3-volcano
>
> Regards,
> Yikun
>
> Holden Karau <hol...@pigscanfly.ca> wrote on Fri, Jun 25, 2021 at 12:00 AM:
>
>> Hi Mich,
>>
>> I certainly think making Spark on Kubernetes run well is going to be a
>> challenge. However, I think, and I could be wrong about this as well, that
>> in terms of cluster managers Kubernetes is likely to be our future. Talking
>> with people, I don't hear about new standalone, YARN or Mesos deployments of
>> Spark, but I do hear about people trying to migrate to Kubernetes.
>>
>> To be clear, I certainly agree that we need more work on Structured
>> Streaming, but it's important to remember that the Spark developers are not
>> all fully interchangeable. We work on the things that we're interested in
>> pursuing, so even if Structured Streaming needs more love, if I'm not super
>> interested in it I'm less likely to work on it. That being said, I am
>> certainly spinning up a bit more in the Spark SQL area, especially around
>> our data sources/connectors, because I can see the need there too.
>>
>> On Wed, Jun 23, 2021 at 8:26 AM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>
>>> Please allow me to express a different point of view on this roadmap.
>>>
>>> I believe that, from a technical point of view, spending time, effort and
>>> talent on batch scheduling on Kubernetes could be rewarding. However, if I
>>> may say so, I doubt whether such an approach, and the so-called
>>> democratization of Spark on whatever platform, really should be a major
>>> focus.
>>>
>>> Having worked on Google Dataproc <https://cloud.google.com/dataproc> (a
>>> fully managed and highly scalable service for running Apache Spark, Hadoop
>>> and more recently other artefacts) for the past two years, and on Spark on
>>> Kubernetes on-premise, I have come to the conclusion that Spark is not a
>>> beast that one can fully commoditize the way one can with Zookeeper,
>>> Kafka etc. There is always a struggle to make some niche areas of Spark,
>>> like Spark Structured Streaming (SSS), work seamlessly and effortlessly on
>>> these commercial platforms with whatever-as-a-Service.
>>>
>>> Moreover, Spark (and I stand to be corrected) already has a lot of
>>> resiliency and redundancy built in from the ground up. It is truly an
>>> enterprise-class product (requiring enterprise-class support) that will be
>>> difficult to commoditize with Kubernetes while expecting the same
>>> performance. After all, Kubernetes is aimed at efficient resource sharing
>>> and potential cost saving for the mass market. In short, I can see that
>>> commercial enterprises will work on these platforms, but maybe the great
>>> talent on the dev team should focus on things like the perceived limitation
>>> of SSS in dealing with chains of aggregations (if I am correct, this is not
>>> yet supported on streaming datasets).
>>>
>>> These are my opinions and they are not facts, just opinions so to speak :)
>>>
>>> view my Linkedin profile
>>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>>
>>> *Disclaimer:* Use it at your own risk.
>>> Any and all responsibility for
>>> any loss, damage or destruction of data or any other property which may
>>> arise from relying on this email's technical content is explicitly
>>> disclaimed. The author will in no case be liable for any monetary damages
>>> arising from such loss, damage or destruction.
>>>
>>> On Fri, 18 Jun 2021 at 23:18, Holden Karau <hol...@pigscanfly.ca> wrote:
>>>
>>>> I think these approaches are good, but there are limitations (e.g.
>>>> dynamic scaling) without us making changes inside of the Spark Kube
>>>> scheduler.
>>>>
>>>> Certainly, whichever scheduler extensions we add support for, we should
>>>> collaborate with the people developing those extensions insofar as they
>>>> are interested. The first place I checked was #sig-scheduling, which is
>>>> fairly quiet on the Kubernetes Slack, but if there are more places to look
>>>> for folks interested in batch scheduling on Kubernetes we should
>>>> definitely give it a shot :)
>>>>
>>>> On Fri, Jun 18, 2021 at 1:41 AM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Regarding your point, and I quote:
>>>>>
>>>>> ".. I know that one of the Spark on Kube operators
>>>>> supports volcano/kube-batch so I was thinking that might be a place I
>>>>> would start exploring..."
>>>>>
>>>>> There seems to be ongoing work on, say, Volcano as part of the Cloud
>>>>> Native Computing Foundation <https://cncf.io/> (CNCF), for example
>>>>> through https://github.com/volcano-sh/volcano
>>>>>
>>>>> There may be value-add in collaborating with such groups through CNCF
>>>>> in order to have a collective approach to such work. There also seems to
>>>>> be some work on Integration of Spark with Volcano for Batch Scheduling.
>>>>> <https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/blob/master/docs/volcano-integration.md>
>>>>>
>>>>> What is not very clear is the degree of progress of these projects.
>>>>> Perhaps you would be kind enough to elaborate on the KPIs for each of
>>>>> these projects and where you think your contributions are going to be.
>>>>>
>>>>> HTH,
>>>>>
>>>>> Mich
>>>>>
>>>>> On Fri, 18 Jun 2021 at 00:44, Holden Karau <hol...@pigscanfly.ca> wrote:
>>>>>
>>>>>> Hi Folks,
>>>>>>
>>>>>> I'm continuing my adventures to make Spark on containers party and I
>>>>>> was wondering if folks have experience with the different batch
>>>>>> scheduler options that they prefer? I was thinking that, so that we can
>>>>>> better support dynamic allocation, it might make sense for us to
>>>>>> support using different schedulers, and I wanted to see if there are
>>>>>> any that the community is more interested in.
>>>>>>
>>>>>> I know that one of the Spark on Kube operators supports
>>>>>> volcano/kube-batch, so I was thinking that might be a place I start
>>>>>> exploring, but I also want to be open to other schedulers that folks
>>>>>> might be interested in.
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> Holden :)
>>>>>>
>>>>>> --
>>>>>> Twitter: https://twitter.com/holdenkarau
>>>>>> Books (Learning Spark, High Performance Spark, etc.):
>>>>>> https://amzn.to/2MaRAG9
>>>>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau

--
John Zhuge