[ 
https://issues.apache.org/jira/browse/SYSTEMML-686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15805498#comment-15805498
 ] 

Mike Dusenberry commented on SYSTEMML-686:
------------------------------------------

That's a good set of options.  Here are some thoughts:

# This is certainly valid for prediction, and is the current approach used in 
our [{{mnist_lenet.dml (line 246)}} | 
https://github.com/apache/incubator-systemml/blob/master/scripts/staging/SystemML-NN/examples/mnist_lenet.dml#L246]
 script.  Both training & prediction with this approach would benefit from 
SYSTEMML-1160.  It's the easiest solution, but I think the other approaches 
could be faster for prediction (although SYSTEMML-1160 would certainly be 
extremely helpful for training).
# This of course fits in nicely with the rest of the SystemML project, and 
would be particularly useful for large-scale prediction.  It seems like this 
would take the most work, and it's not clear that we would still be able to 
cleanly target cuDNN.  However, large-scale, batch prediction ("scoring") on 
non-GPU clusters could play to the current strengths of SystemML, given the 
history of the project.  Features like sparse support, caching, etc. could 
prove quite beneficial in this scenario, perhaps with some tweaking of the 
current assumptions (SYSTEMML-1140).  I'd be quite curious to see if we could 
beat an approach like (3) with this for large-scale, batch predictions.
# This was the approach I initially wanted to use in the {{mnist_lenet.dml}} 
script above as a quick means of utilizing the full cluster for predictions, 
which can be done in distributed, full-batch mode.  Unfortunately, Spark 
{{parfor}} support was broken at the time, as detailed in SYSTEMML-1129, but 
[~fschueler] has since worked on it.  There are still concerns about 
performance due to the way {{parfor}} currently distributes data.  In general, 
though, this approach makes sense, particularly for GPU clusters in which we 
could target cuDNN on each node.  It's simple, and could be quicker than (2) 
to get running.  SYSTEMML-1159 would also benefit from a performant 
{{parfor}} implementation.
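
As a rough illustration of approach (3), a {{parfor}} loop over mini-batches 
might look something like the sketch below.  This is hypothetical: 
{{predict_batch}} is a stand-in for a model-specific forward-pass function, 
and the variable names are assumptions, not actual contents of 
{{mnist_lenet.dml}}.

{code}
# Hypothetical sketch of approach (3): data-parallel batch prediction.
# X holds N examples row-wise; predict_batch() is a stand-in for a
# model-specific forward-pass function (not an actual function in the script).
batch_size = 64
num_batches = ceil(nrow(X) / batch_size)
probs = matrix(0, rows=nrow(X), cols=K)  # K = number of classes

parfor (b in 1:num_batches) {
  beg = (b-1) * batch_size + 1
  end = min(b * batch_size, nrow(X))
  X_batch = X[beg:end,]
  # Each iteration could run on a separate worker (and target cuDNN on
  # GPU clusters); results are written back to disjoint row ranges.
  probs[beg:end,] = predict_batch(X_batch, W, bias)
}
{code}

Since each iteration writes a disjoint slice of {{probs}}, the loop is 
embarrassingly parallel; the open question is how efficiently {{parfor}} 
distributes {{X}} to the workers.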

My thought is that we should first attempt to speed up (1), especially with 
regard to feeding data in as quickly as possible, as that will be needed for 
efficient training as well.  Then I think (3) would be good to get running 
efficiently, as it has benefits for both batch prediction ("scoring") and 
hyperparameter tuning.

I'm quite interested in thoughts from others as well.

> Implement Spark instructions for convolution and pooling functions
> ------------------------------------------------------------------
>
>                 Key: SYSTEMML-686
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-686
>             Project: SystemML
>          Issue Type: Task
>            Reporter: Niketan Pansare
>

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)