Fei Hu created SYSTEMML-1774:
--------------------------------

             Summary: Improve Parfor parallelism for deep learning
                 Key: SYSTEMML-1774
                 URL: https://issues.apache.org/jira/browse/SYSTEMML-1774
             Project: SystemML
          Issue Type: Improvement
          Components: Algorithms
    Affects Versions: SystemML 1.0
            Reporter: Fei Hu

When running the [distributed MNIST LeNet example | https://github.com/apache/systemml/blob/master/scripts/nn/examples/mnist_lenet_distrib_sgd.dml], each mini-batch could ideally run in parallel, independently of the others. We tried to force {{parfor (j in 1:parallel_batches)}} at line 137 of {{nn/examples/mnist_lenet_distrib_sgd.dml}} to use {{REMOTE_SPARK}} mode, but got errors such as {{org.apache.sysml.runtime.DMLRuntimeException: Not supported: Instructions of type other than CP instructions}}. More log information can be found in the comments below.

One example of these errors occurs at the convolution layer, where we need to randomly generate a matrix: SystemML chooses {{RandSPInstruction}} instead of {{DataGenCPInstruction}}, possibly because it cannot determine the number of rows of the matrix. For this distributed MNIST LeNet example, using CP instructions may achieve better performance.
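
For illustration, forcing the execution mode of a parfor loop looks roughly like the sketch below. It is a minimal, self-contained stand-in and not the actual loop from {{mnist_lenet_distrib_sgd.dml}}: the data, the sizes, and the {{partial}} result matrix are hypothetical, and {{opt=CONSTRAINED}} is added on the assumption that the parfor optimizer should respect the requested mode rather than override it.

```dml
# Hypothetical, simplified stand-in for the parfor loop in the example script.
# All sizes and variable names below are assumptions made for illustration.
N = 1024                 # number of examples
D = 784                  # number of features
batch_size = 64
parallel_batches = 4

X = rand(rows=N, cols=D, seed=42)
partial = matrix(0, rows=parallel_batches, cols=1)

# mode=REMOTE_SPARK requests remote execution of the loop body;
# opt=CONSTRAINED asks the optimizer to honor the user-specified parameters.
parfor (j in 1:parallel_batches, mode=REMOTE_SPARK, opt=CONSTRAINED) {
  beg = (j-1) * batch_size + 1
  end = j * batch_size
  X_batch = X[beg:end,]

  # Each iteration writes to a disjoint row of the result matrix.
  partial[j,1] = sum(X_batch)
}

print(sum(partial))
```

In this sketch every dimension is a compile-time constant. The failure described above appears to arise when a matrix created inside the loop body has dimensions the compiler cannot determine, so this snippet only shows the syntax and is not expected to reproduce the error.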