On 2/6/20 2:53 AM, Jiaxin Shan wrote:
> I will vote for this. It's pretty helpful to have managed Spark
> images. Currently, users have to download Spark binaries and build
> their own. 
> With this supported, the user journey will be simplified: we would only
> need to build an application image on top of a base image provided by the community. 
>
> Do we have different OS or architecture support? If not, there will be
> three container images per release: Java, R, and Python.

Well, technically speaking there are 3 non-deprecated Python versions (4
if you count PyPy), 3 non-deprecated R versions, luckily only one
non-deprecated Scala version, and possible variations of the JDK. The
latest and greatest are not necessarily the most popular or the most useful.

That's on top of native dependencies like BLAS (possibly in different
flavors, and accounting for the break in netlib-java development),
libparquet and libarrow.

Not all of these have to be generated, but complexity grows pretty fast,
especially when native dependencies are involved. It gets worse if you
actually want to support Spark builds and tests ‒ for example, to build
and fully test SparkR you need half of the universe, including some
awkward LaTeX style patches and such
(https://github.com/zero323/sparkr-build-sandbox).

And even without that, images tend to grow pretty large.
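Just to make the combinatorial growth concrete, here is a rough
back-of-the-envelope sketch. The version lists are purely illustrative
placeholders, not a proposal for an actual build matrix:

```python
from itertools import product

# Illustrative dimensions only -- the specific versions below are
# placeholders, roughly matching the counts discussed above.
python_versions = ["3.6", "3.7", "3.8", "pypy3"]  # 3 CPython + PyPy
r_versions = ["3.4", "3.5", "3.6"]                # 3 non-deprecated R releases
jdk_versions = ["8", "11"]                        # a couple of JDK variants
blas_flavors = ["openblas", "mkl", "reference"]   # optional native BLAS builds

# Every combination is a candidate image, before even considering
# OS or CPU architecture variants.
matrix = list(product(python_versions, r_versions, jdk_versions, blas_flavors))
print(len(matrix))  # 4 * 3 * 2 * 3 = 72
```

Even with this toy matrix you are already at 72 candidate images; each
additional axis (OS, architecture, Spark version) multiplies that further.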

A few years back, Elias <https://github.com/eliasah> and I experimented
with the idea of generating different sets of Dockerfiles ‒
https://github.com/spark-in-a-box/spark-in-a-box ‒ though the intended
use cases were rather different (mostly quick setup of testbeds). The
project has been inactive for a while, with some private patches to fit
this or that use case.

>
> On Wed, Feb 5, 2020 at 2:56 PM Sean Owen <sro...@gmail.com
> <mailto:sro...@gmail.com>> wrote:
>
>     What would the images have - just the image for a worker?
>     We wouldn't want to publish N permutations of Python, R, OS, Java,
>     etc.
>     But if we don't then we make one or a few choices of that combo, and
>     then I wonder how many people find the image useful.
>     If the goal is just to support Spark testing, that seems fine and
>     tractable, but does it need to be 'public' as in advertised as a
>     convenience binary? vs just some image that's hosted somewhere for the
>     benefit of project infra.
>
>     On Wed, Feb 5, 2020 at 12:16 PM Dongjoon Hyun
>     <dongjoon.h...@gmail.com <mailto:dongjoon.h...@gmail.com>> wrote:
>     >
>     > Hi, All.
>     >
>     > From 2020, shall we have an official Docker image repository as
>     > an additional distribution channel?
>     >
>     > I'm considering the following images.
>     >
>     >     - Public binary release (no snapshot image)
>     >     - Public non-Spark base image (OS + R + Python)
>     >       (This can be used in GitHub Action jobs and Jenkins K8s
>     >       Integration Tests to speed up jobs and to have more stable
>     >       environments)
>     >
>     > Bests,
>     > Dongjoon.
>
>     ---------------------------------------------------------------------
>     To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>     <mailto:dev-unsubscr...@spark.apache.org>
>
>
>
> -- 
> Best Regards!
> Jiaxin Shan
> Tel:  412-230-7670
> Address: 470 2nd Ave S, Kirkland, WA
>
-- 
Best regards,
Maciej Szymkiewicz

Web: https://zero323.net
Keybase: https://keybase.io/zero323
Gigs: https://www.codementor.io/@zero323
PGP: C095AA7F33E6123A
