Rong Ou created SPARK-26398:
-------------------------------

             Summary: Support building GPU docker images
                 Key: SPARK-26398
                 URL: https://issues.apache.org/jira/browse/SPARK-26398
             Project: Spark
          Issue Type: Improvement
          Components: Kubernetes
    Affects Versions: 2.4.0
            Reporter: Rong Ou


To run Spark on Kubernetes, a user first needs to build docker images using the 
`bin/docker-image-tool.sh` script. However, this script only supports building 
images for running on CPUs. As parts of Spark and related libraries (e.g. 
XGBoost) get accelerated on GPUs, it's desirable to build base images that can 
take advantage of GPU acceleration.

This issue only addresses building docker images with CUDA support. Actually 
accelerating Spark on GPUs is outside the scope, as is supporting other types 
of GPUs.

Today if anyone wants to experiment with running Spark on Kubernetes with GPU 
support, they have to write their own custom `Dockerfile`. By providing an 
"official" way to build GPU-enabled docker images, we can make it easier to get 
started.

Few users may need this today, but it's a necessary first step towards GPU 
acceleration for Spark on Kubernetes.

The risks are minimal as we only need to make minor changes to 
`bin/docker-image-tool.sh`. The PR is already done and will be attached. 
Success means anyone can easily build Spark docker images with GPU support.

Proposed API changes: add an optional `-g` flag to `bin/docker-image-tool.sh` 
for building GPU versions of the JVM/Python/R docker images. When the `-g` flag 
is omitted, existing behavior is preserved.
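
As a rough illustration of the proposed flag, option parsing could look 
something like the sketch below. This is not the attached patch: the function 
name `parse_opts` and the variables `REPO`, `TAG` and `GPU` are made up for the 
example, and only `-r` (repo) and `-t` (tag) mirror options the script already 
accepts.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of parsing the proposed -g flag; not the actual patch.
parse_opts() {
  OPTIND=1      # reset getopts state so the function can be called repeatedly
  REPO=""
  TAG=""
  GPU=false     # default: CPU images, i.e. existing behavior
  while getopts "r:t:g" opt; do
    case "$opt" in
      r) REPO="$OPTARG" ;;
      t) TAG="$OPTARG" ;;
      g) GPU=true ;;
      *) return 1 ;;
    esac
  done
}

# Example invocation with placeholder repo and tag values.
parse_opts -r example.com/myrepo -t v2.4.0 -g
echo "repo=$REPO tag=$TAG gpu=$GPU"
```

Without `-g` the `GPU` variable stays `false`, so everything downstream can 
keep its current CPU-only behavior.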

Design sketch: when the `-g` flag is specified, we append `-gpu` to the docker 
image names, and switch to dockerfiles based on the official CUDA images. Since 
the CUDA images are based on Ubuntu while the Spark dockerfiles are based on 
Alpine, steps for setting up additional packages are different, so there is a 
parallel set of `Dockerfile.gpu` files.
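
The naming scheme in the design sketch could be expressed roughly as follows. 
The helper names `image_name` and `dockerfile_path`, and the directory 
`some/dir`, are hypothetical and only serve the illustration; the `-gpu` suffix 
and the `Dockerfile.gpu` file name come from the description above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers illustrating the naming scheme; not the actual patch.

# $1 = image name (e.g. spark, spark-py, spark-r), $2 = true|false for GPU
image_name() {
  if [ "$2" = "true" ]; then
    echo "$1-gpu"
  else
    echo "$1"
  fi
}

# $1 = directory holding the dockerfiles, $2 = true|false for GPU
dockerfile_path() {
  if [ "$2" = "true" ]; then
    echo "$1/Dockerfile.gpu"
  else
    echo "$1/Dockerfile"
  fi
}

image_name spark-py true        # prints spark-py-gpu
dockerfile_path some/dir true   # prints some/dir/Dockerfile.gpu
dockerfile_path some/dir false  # prints some/dir/Dockerfile
```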

Alternative: if we are willing to forgo Alpine and switch to Ubuntu for the 
CPU-only images, the two sets of dockerfiles can be unified, and we can just 
pass in a different base image depending on whether the `-g` flag is present or 
not.
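
A minimal sketch of that alternative, under several assumptions: both image 
references below are placeholders rather than values from this proposal, the 
build-arg name `base_img` is chosen for the example, and the unified 
`Dockerfile` is assumed to declare `ARG base_img` before `FROM ${base_img}`. 
The sketch only prints the `docker build` command instead of running it.

```shell
#!/usr/bin/env bash
# Sketch of the alternative: one Dockerfile, base image chosen by the flag.
# Both image references are placeholders, not values from the proposal.
CPU_BASE="ubuntu:18.04"
GPU_BASE="nvidia/cuda:10.0-base-ubuntu18.04"

# $1 = true|false for GPU
base_image() {
  if [ "$1" = "true" ]; then
    echo "$GPU_BASE"
  else
    echo "$CPU_BASE"
  fi
}

# $1 = true|false for GPU, $2 = image name to tag the result with.
# Prints the command that would be run; it does not invoke docker.
build_cmd() {
  echo "docker build --build-arg base_img=$(base_image "$1") -t $2 -f Dockerfile ."
}

build_cmd true spark-gpu
build_cmd false spark
```

With this layout the package setup steps live in a single file, at the cost of 
giving up the smaller Alpine base for the CPU-only images.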
