[
https://issues.apache.org/jira/browse/SYSTEMML-686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15805498#comment-15805498
]
Mike Dusenberry commented on SYSTEMML-686:
------------------------------------------
That's a good set of options. Here are some thoughts:
# This is certainly valid for prediction, and it is the current approach used in our [{{mnist_lenet.dml (line 246)}}|https://github.com/apache/incubator-systemml/blob/master/scripts/staging/SystemML-NN/examples/mnist_lenet.dml#L246] script. Both training & prediction with this approach would benefit from SYSTEMML-1160. It is the easiest solution, but I think the other approaches could be faster for prediction (although SYSTEMML-1160 would certainly be extremely helpful for training).
# This of course fits nicely with the rest of the SystemML project, and would be particularly useful for large-scale prediction. It seems like this would take the most work, and it's not clear that we would still be able to cleanly target cuDNN. However, large-scale, batch prediction ("scoring") on non-GPU clusters could play to the current strengths of SystemML, given the history of the project. Features such as sparse support, caching, etc. could prove quite beneficial in this scenario, perhaps with some tweaking of the current assumptions (SYSTEMML-1140). I'd be quite curious to see whether we could beat an approach like (3) with this for large-scale, batch predictions.
# This was the approach I initially wanted to use in the {{mnist_lenet.dml}} script above as a quick means of utilizing the full cluster for predictions, which can be done in distributed, full-batch mode. Unfortunately, Spark {{parfor}} support was broken at the time, as detailed in SYSTEMML-1129, but [~fschueler] has since worked on it. There are still concerns about performance due to the way {{parfor}} currently distributes data. In general, though, this approach makes sense, particularly in the case of GPU clusters on which we could target cuDNN on each node. It's simple, and could be quicker than (2) to get running. SYSTEMML-1159 would also benefit from a performant {{parfor}} implementation.
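To make the trade-offs concrete, here is a minimal Python sketch of the driver-side mini-batch loop that approach (1) describes; {{predict_batch}} is a hypothetical stand-in for a single-batch forward pass (e.g. one invocation of the prediction script), not a SystemML API:

```python
def predict_all(X, predict_batch, batch_size=64):
    """Score a dataset by looping over mini-batches on the driver.

    X is a list of examples; predict_batch maps a list of examples
    to a list of predictions. Batches are fed sequentially, so
    throughput hinges on how quickly data can be fed in (the concern
    behind SYSTEMML-1160).
    """
    preds = []
    for i in range(0, len(X), batch_size):
        batch = X[i:i + batch_size]          # slice one mini-batch
        preds.extend(predict_batch(batch))   # forward pass on the batch
    return preds
```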
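For reference on what approach (2) would distribute, here is a tiny pure-Python sketch of a single-channel 2D convolution (valid padding, stride 1). This is only the per-example computation; an actual Spark instruction would operate block-wise on distributed matrices and might lower convolution to an im2col + matrix-multiply form internally:

```python
def conv2d(image, kernel):
    """Naive single-channel 2D convolution (valid padding, stride 1).

    image and kernel are lists of rows of floats. Each output cell is
    the dot product of the kernel with the image patch under it.
    """
    H, W = len(image), len(image[0])
    kH, kW = len(kernel), len(kernel[0])
    out = []
    for i in range(H - kH + 1):
        row = []
        for j in range(W - kW + 1):
            s = sum(image[i + di][j + dj] * kernel[di][dj]
                    for di in range(kH) for dj in range(kW))
            row.append(s)
        out.append(row)
    return out
```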
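Approach (3) is essentially data-parallel scoring over row partitions. A hedged Python sketch of that scheme follows; {{score_partition}} is a hypothetical per-partition predictor, and the thread pool merely stands in for the {{parfor}} runtime, which would execute each partition as its own task (e.g. on a GPU node targeting cuDNN):

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_predict(X, score_partition, num_workers=4):
    """Split the rows of X into contiguous partitions and score them
    concurrently, mimicking a parfor over row ranges. Results are
    concatenated back in partition order.
    """
    n = len(X)
    step = -(-n // num_workers)  # ceiling division for partition size
    partitions = [X[i:i + step] for i in range(0, n, step)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        results = pool.map(score_partition, partitions)  # order-preserving
    return [pred for part in results for pred in part]
```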
My thought is that we should first attempt to speed up (1), especially with regard to feeding data in as quickly as possible, as that will be needed for efficient training as well. Then I think (3) would be good to get running efficiently, as it has benefits for both batch prediction ("scoring") and hyperparameter tuning.
I'm quite interested in thoughts from others as well.
> Implement Spark instructions for convolution and pooling functions
> ------------------------------------------------------------------
>
> Key: SYSTEMML-686
> URL: https://issues.apache.org/jira/browse/SYSTEMML-686
> Project: SystemML
> Issue Type: Task
> Reporter: Niketan Pansare
>
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)