To reiterate some feedback I left on the PR: it's not immediately clear whether we should be opinionated about supporting GPUs in the Docker image in a first-class way.
First, there's the question of how we arbitrate which kinds of customizations we support going forward. For example, if we say we support GPUs now, what's to say we should not also support FPGAs? Second, what kind of testing can we add to CI to ensure that what we've provided in this Dockerfile actually works? Instead, we can keep the Spark images to the bare minimum needed for basic Spark applications, and then provide detailed instructions for building custom Docker images (mostly just making sure the custom image has the right entry point), as sketched below.

-Matt Cheah

From: Rong Ou <[email protected]>
Date: Friday, February 8, 2019 at 2:28 PM
To: "[email protected]" <[email protected]>
Subject: building docker images for GPU

Hi spark dev,

I created a JIRA issue a while ago (https://issues.apache.org/jira/browse/SPARK-26398) to add GPU support to Spark docker images, and sent a PR (https://github.com/apache/spark/pull/23347) that went through several iterations. It was suggested that it should be discussed on the dev mailing list, so here we are. Please chime in if you have any questions or concerns.

A little more background: I mainly looked at running XGBoost on Spark using GPUs. Preliminary results have shown the potential for a significant speedup in training time, and this seems like a popular use case for Spark. In any event, it would be nice for Spark to have better support for GPUs, and building GPU-enabled docker images seems like a useful first step.

Thanks,

Rong
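For illustration only, a custom image along the lines Matt describes might look something like the hypothetical Dockerfile below. The base tag (spark:latest, as produced by bin/docker-image-tool.sh) and the package installed are assumptions made for the sketch, not details taken from the PR or this thread:

    # Hypothetical sketch, not the Dockerfile from the PR: extend a Spark
    # image built with bin/docker-image-tool.sh (tagged spark:latest here)
    # and layer extra libraries on top, keeping Spark's entry point intact.
    FROM spark:latest

    # Install whatever extra runtime libraries the application needs, using
    # the base image's package manager (the 2.4-era images are Alpine-based,
    # hence apk). The package below is a placeholder, not a tested choice.
    RUN apk add --no-cache libc6-compat

    # The stock Kubernetes image already declares this entry point;
    # re-declaring it here just makes the requirement explicit.
    ENTRYPOINT [ "/opt/entrypoint.sh" ]

Built with something like `docker build -t spark-custom .`, such an image could then be supplied to Spark on Kubernetes via spark.kubernetes.container.image.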
