Fei Hu created SYSTEMML-1809:
--------------------------------
Summary: Optimize the performance of the distributed
MNIST_LeNet_Sgd model training
Key: SYSTEMML-1809
URL: https://issues.apache.org/jira/browse/SYSTEMML-1809
Project: SystemML
Issue Type: Task
Affects Versions: SystemML 1.0
Reporter: Fei Hu
For the current version, there are two bottlenecks for the distributed
MNIST_LeNet_Sgd model training:

1) Data locality: for {{RemoteParForSpark}}, the tasks are parallelized
without considering data locality, which can cause heavy data shuffling
when the input data is large.

2) Result merge: the current experiments indicate that the result merge
takes more time than the model training itself.

After these optimizations, we should compare the performance against
distributed TensorFlow.
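To illustrate bottleneck 1), here is a minimal Python sketch (hypothetical names, not SystemML or Spark code) contrasting locality-oblivious round-robin task placement with a greedy locality-aware scheduler that puts each task on the host already holding its input partition; the count of remote partition fetches is a stand-in for shuffle cost:

```python
def remote_fetches(assignment, partition_host):
    """Count tasks whose input partition lives on a different host."""
    return sum(1 for task, host in assignment.items()
               if partition_host[task] != host)

def round_robin(tasks, hosts):
    """Locality-oblivious scheduling: assign tasks to hosts in order."""
    return {t: hosts[i % len(hosts)] for i, t in enumerate(tasks)}

def locality_aware(tasks, hosts, partition_host, capacity):
    """Greedy scheduling: place each task on the host that holds its
    input partition, up to a per-host capacity; otherwise spill to the
    least-loaded host (which incurs a remote fetch)."""
    load = {h: 0 for h in hosts}
    assignment = {}
    for t in tasks:
        preferred = partition_host[t]
        if load[preferred] < capacity:
            assignment[t] = preferred
        else:
            assignment[t] = min(hosts, key=lambda h: load[h])
        load[assignment[t]] += 1
    return assignment

tasks = list(range(8))
hosts = ["h0", "h1"]
# first half of the partitions sit on h0, second half on h1
partition_host = {t: "h0" if t < 4 else "h1" for t in tasks}

rr = round_robin(tasks, hosts)
la = locality_aware(tasks, hosts, partition_host, capacity=4)
```

In this toy setup round-robin placement makes half the tasks read their partition remotely, while the locality-aware assignment needs no remote fetches at all.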
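For bottleneck 2), parfor result merge combines the result variables written by the workers back into a single variable. A minimal Python sketch (hypothetical, not the SystemML implementation) of one common merge semantics: keep any cell that a worker changed from its initial value, assuming workers update disjoint cells:

```python
def merge_results(base, worker_results):
    """Merge worker copies of a matrix back into one result: a cell is
    taken from a worker whenever it differs from the initial value in
    `base` (assumes workers touch disjoint cells)."""
    merged = [row[:] for row in base]
    for res in worker_results:
        for i, row in enumerate(res):
            for j, v in enumerate(row):
                if v != base[i][j]:
                    merged[i][j] = v
    return merged

base = [[0, 0], [0, 0]]
worker1 = [[1, 2], [0, 0]]   # updated row 0 only
worker2 = [[0, 0], [3, 4]]   # updated row 1 only
result = merge_results(base, [worker1, worker2])
```

Because every worker copy must be compared cell-by-cell against the initial value, a naive serial merge can easily dominate training time for large models, which is what the current experiments suggest.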
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)