Ondrej Kokes created SPARK-26773:
------------------------------------

             Summary: Consider alternative base images for Kubernetes
                 Key: SPARK-26773
                 URL: https://issues.apache.org/jira/browse/SPARK-26773
             Project: Spark
          Issue Type: Improvement
          Components: Kubernetes, PySpark
    Affects Versions: 2.4.0
            Reporter: Ondrej Kokes


I understand the desire to keep the base image (not just for Kubernetes) 
minimal, and thus the choice of Alpine, but that distro has its limitations, 
the main one being musl as its libc implementation.

The main reason for us not to use Alpine for our non-Spark workloads is that 
we're using Python and *we cannot use pre-built distributions of packages 
(so-called wheels)*, because they are usually built for glibc-based distros 
(work is being done for musl-based builds, but we're not there yet [0]).
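
As a rough illustration of the constraint (this is only a sketch, not anything 
Spark ships; the function name manylinux1_compatible is made up for this 
example), the check below mirrors the rule in PEP 513 [0] that manylinux1 
wheels target glibc 2.5 or newer. It assumes platform.libc_ver() reports 
"glibc" on glibc-based images and an empty name on musl-based ones such as 
Alpine, which is what I would expect but have not verified on every image.

```python
import platform


def manylinux1_compatible(libc_name, libc_version):
    """Return True if the reported libc is glibc >= 2.5 (the PEP 513 baseline).

    Hypothetical helper for illustration only; pip performs its own,
    more thorough compatibility checks.
    """
    if libc_name != "glibc":
        # musl (Alpine) and unknown libcs cannot use manylinux1 wheels.
        return False
    try:
        major, minor = (int(part) for part in libc_version.split(".")[:2])
    except ValueError:
        # Unparseable or incomplete version string: be conservative.
        return False
    return (major, minor) >= (2, 5)


if __name__ == "__main__":
    # Inspect the libc the running interpreter is linked against.
    name, version = platform.libc_ver()
    print("libc:", name or "unknown", version or "unknown")
    print("manylinux1 wheels usable:", manylinux1_compatible(name, version))
```

Run inside a given image, a False result would suggest that pip has to fall 
back to source builds for packages that only publish glibc wheels.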

So instead of popular packages like numpy or pandas being installed in seconds, 
many packages have to be compiled from source on each installation (which 
requires a compiler etc.). We could theoretically build all these packages into 
the base image, but that would require multi-stage builds, so that we don't 
include gcc/clang in the final image, and rebuilding the Docker image with each 
dependency change.

There have already been similar issues submitted [1].

*I'm not sure what the best course of action is.* Perhaps there could be a 
Debian-based image as an alternative. Or perhaps there is a good reason for a 
glibc-based distro to be the default Docker base image, with an option to 
"downgrade" to Alpine. (R, with its popular Rcpp-based extensions, might suffer 
from a similar problem, but I'm mostly guessing. [2])

 

[0] https://www.python.org/dev/peps/pep-0513/
[1] https://github.com/apache-spark-on-k8s/spark/issues/326
[2] https://github.com/rocker-org/rocker/issues/231#issuecomment-297150217



