Hello Andy,
Regarding your question, this will depend a lot on the specific task:
 - for tasks that are "easy" to distribute, such as inference
(scoring), hyper-parameter tuning, or cross-validation, the cluster is
used to its full extent and performance should improve more or less
linearly with the number of machines (see the sketch after this list)
 - for training a single model across multiple machines on a
distributed dataset, you are currently better off with a dedicated
solution such as TensorFlowOnSpark or dist-keras. We are working on
addressing this in a future release.
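
To make the first point concrete, here is a minimal sketch of
embarrassingly parallel scoring with plain PySpark and Keras (not the
Deep Learning Pipelines API itself); the input path, column names, and
model file below are hypothetical:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    # Hypothetical input: one row per example, with an "id" column and
    # a fixed-length numeric "features" array column.
    df = spark.read.parquet("hdfs:///data/features.parquet")

    def score_partition(rows):
        # Load the model once per partition rather than once per row,
        # so the per-record cost is just a forward pass. The model file
        # must be reachable from every executor.
        import numpy as np
        from keras.models import load_model
        model = load_model("/mnt/models/model.h5")  # hypothetical path
        rows = list(rows)
        if not rows:
            return
        batch = np.array([r["features"] for r in rows])
        preds = model.predict(batch)
        for row, pred in zip(rows, preds):
            yield (row["id"], float(pred[0]))

    # Partitions are scored independently, so adding machines
    # increases throughput roughly linearly.
    scores = df.rdd.mapPartitions(score_partition).toDF(["id", "score"])

Hyper-parameter tuning and cross-validation parallelize the same way:
broadcast the (small) training set and run one task per configuration
or fold.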

Also, we have opened a mailing list dedicated to Deep Learning
Pipelines, to which I will copy this answer. Feel free to follow up
there:

https://groups.google.com/forum/#!forum/dl-pipelines-users/


Tim


On November 22, 2017 at 10:02:59 AM, Andy Davidson
(a...@santacruzintegration.com) wrote:
> I am starting a new deep learning project. Currently we do all of our work on
> a single machine using a combination of Keras and TensorFlow.
> https://databricks.github.io/spark-deep-learning/site/index.html looks very
> promising. Any idea how performance is likely to improve as I add machines
> to my cluster?
>
> Kind regards
>
> Andy
>
>
> P.S. Is user@spark.apache.org the best place to ask questions about this
> package?
