Hello Andy,

Regarding your question, this will depend a lot on the specific task:

- For tasks that are "easy" to distribute, such as inference (scoring), hyper-parameter tuning, or cross-validation, the cluster is used to full advantage and performance should improve more or less linearly as you add machines.
- For training a single model across multiple machines on a distributed dataset, you are currently better off with a dedicated solution such as TensorFlowOnSpark or dist-keras. We are working on addressing this in a future release.
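The reason the first group of tasks scales nearly linearly is that they are embarrassingly parallel: each model configuration (or each batch of rows to score) is evaluated independently, with no communication between workers. A minimal pure-Python sketch of that pattern — `score` here is a hypothetical stand-in for training and evaluating one configuration, not the actual Deep Learning Pipelines API, which would run these as Spark tasks:

```python
from concurrent.futures import ProcessPoolExecutor


def score(params):
    # Stand-in for fitting/evaluating one hyper-parameter configuration.
    # In a Spark-based setup each call would run as an independent task.
    lr, layers = params
    return -(lr - 0.01) ** 2 - (layers - 3) ** 2  # toy objective, peaks at (0.01, 3)


if __name__ == "__main__":
    # A small grid of candidate configurations.
    grid = [(lr, layers) for lr in (0.001, 0.01, 0.1) for layers in (2, 3, 4)]

    # Because every configuration is independent, the grid distributes
    # cleanly over processes (or, at cluster scale, over executors).
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(score, grid))

    best = grid[results.index(max(results))]
    print(best)  # (0.01, 3)
```

Distributed training of a single model, by contrast, requires synchronizing gradients or weights between workers at every step, which is why it needs the dedicated frameworks mentioned above.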
Also, we opened a mailing list dedicated to Deep Learning Pipelines, to which I will copy this answer. Feel free to follow up there: https://groups.google.com/forum/#!forum/dl-pipelines-users/

Tim

On November 22, 2017 at 10:02:59 AM, Andy Davidson (a...@santacruzintegration.com) wrote:

> I am starting a new deep learning project. Currently we do all of our work on
> a single machine using a combination of Keras and TensorFlow.
> https://databricks.github.io/spark-deep-learning/site/index.html looks very
> promising. Any idea how performance is likely to improve as I add machines
> to my cluster?
>
> Kind regards
>
> Andy
>
> P.S. Is user@spark.apache.org the best place to ask questions about this
> package?