On 2/6/20 2:53 AM, Jiaxin Shan wrote:
> I will vote for this. It's pretty helpful to have managed Spark
> images. Currently, users have to download the Spark binaries and
> build their own. With this supported, the user journey will be
> simplified: we would only need to build an application image on top
> of a base image provided by the community.
>
> Do we have different OS or architecture support? If not, there will
> be three container images in total (Java, R, Python) for every
> release.
Well, technically speaking, there are 3 non-deprecated Python versions
(4 if you count PyPy), 3 non-deprecated R versions, luckily only one
non-deprecated Scala version, and possible variations of the JDK. The
latest and greatest are not necessarily the most popular or the most
useful. That's on top of native dependencies like BLAS (possibly in
different flavors, and accounting for the break in netlib-java
development), libparquet and libarrow. Not all of these must be
generated, but complexity grows pretty fast, especially when native
dependencies are involved.

It gets worse if you actually want to support Spark builds and tests -
for example, to build and fully test SparkR you need half of the
universe, including some awkward LaTeX style patches and such
(https://github.com/zero323/sparkr-build-sandbox). And even without
that, images tend to grow pretty large.

A few years back, Elias <https://github.com/eliasah> and I experimented
with the idea of generating different sets of Dockerfiles -
https://github.com/spark-in-a-box/spark-in-a-box - though the intended
use cases were rather different (mostly quick setup of testbeds). The
project has been inactive for a while, apart from some private patches
to fit this or that use case. A rough sketch of the parameterized
Dockerfile idea is appended after my signature.

> On Wed, Feb 5, 2020 at 2:56 PM Sean Owen <sro...@gmail.com
> <mailto:sro...@gmail.com>> wrote:
>
> What would the images have - just the image for a worker?
> We wouldn't want to publish N permutations of Python, R, OS, Java,
> etc. But if we don't, then we make one or a few choices of that
> combo, and then I wonder how many people would find the image useful.
> If the goal is just to support Spark testing, that seems fine and
> tractable, but does it need to be 'public', as in advertised as a
> convenience binary, vs. just some image that's hosted somewhere for
> the benefit of project infra?
>
> On Wed, Feb 5, 2020 at 12:16 PM Dongjoon Hyun
> <dongjoon.h...@gmail.com <mailto:dongjoon.h...@gmail.com>> wrote:
> >
> > Hi, All.
> >
> > From 2020, shall we have an official Docker image repository as an
> > additional distribution channel?
> >
> > I'm considering the following images.
> >
> > - Public binary release (no snapshot image)
> > - Public non-Spark base image (OS + R + Python)
> >   (This can be used in GitHub Actions jobs and the Jenkins K8s
> >   integration tests to speed up jobs and to have more stable
> >   environments.)
> >
> > Bests,
> > Dongjoon.
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> <mailto:dev-unsubscr...@spark.apache.org>
>
> --
> Best Regards!
> Jiaxin Shan
> Tel: 412-230-7670
> Address: 470 2nd Ave S, Kirkland, WA

--
Best regards,
Maciej Szymkiewicz

Web: https://zero323.net
Keybase: https://keybase.io/zero323
Gigs: https://www.codementor.io/@zero323
PGP: C095AA7F33E6123A
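PS: A minimal sketch of the single-template approach, assuming one
Dockerfile parameterized with build args rather than N hand-maintained
variants. This is not what spark-in-a-box actually emits; the base
image, package names, and version defaults below are placeholders.

    # Build-time knobs; the defaults are placeholders, not recommendations.
    ARG JDK_VERSION=8
    FROM openjdk:${JDK_VERSION}-jdk-slim

    ARG SPARK_VERSION=2.4.5
    ARG HADOOP_VERSION=2.7
    ARG PYTHON_PACKAGE=python3

    # Minimal extras; R, BLAS flavors, etc. would be layered the same way,
    # which is exactly where the permutation count starts to blow up.
    RUN apt-get update && \
        apt-get install -y --no-install-recommends curl ${PYTHON_PACKAGE} && \
        rm -rf /var/lib/apt/lists/*

    # Unpack an official binary release from the Apache archive.
    RUN curl -fsSL "https://archive.apache.org/dist/spark/spark-${SPARK_VERSION}/spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz" \
            | tar -xz -C /opt && \
        ln -s "/opt/spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}" /opt/spark

    ENV SPARK_HOME=/opt/spark
    ENV PATH="${SPARK_HOME}/bin:${PATH}"

A JDK 11 / Spark 2.4.4 variant would then be just

    docker build --build-arg JDK_VERSION=11 --build-arg SPARK_VERSION=2.4.4 .

which leaves open only the question of which permutations are actually
worth publishing.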