Fei Hu created SYSTEMML-1809:
--------------------------------

             Summary: Optimize the performance of the distributed 
MNIST_LeNet_Sgd model training
                 Key: SYSTEMML-1809
                 URL: https://issues.apache.org/jira/browse/SYSTEMML-1809
             Project: SystemML
          Issue Type: Task
    Affects Versions: SystemML 1.0
            Reporter: Fei Hu


In the current version, there are two bottlenecks in the distributed 
MNIST_LeNet_Sgd model training: 1) data locality: in {{RemoteParForSpark}}, 
tasks are parallelized without considering data locality, which causes a 
lot of data shuffling when the input data volume is large; 2) result 
merge: current experiments indicate that the result-merge step takes more 
time than the model training itself. After the optimization, we should 
compare the performance with distributed TensorFlow.
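To illustrate bottleneck 1, below is a minimal sketch (plain Python, not SystemML or Spark code; all names are hypothetical) of the locality-aware scheduling idea: greedily assign each task to a host that already holds a replica of its input partition, so the data does not have to be shuffled across the network.

```python
# Hypothetical sketch of greedy locality-aware task assignment.
# Each task reads one input partition; placing the task on a host
# that already stores that partition avoids a network shuffle.

def assign_tasks(partition_locations, tasks):
    """partition_locations: partition id -> set of hosts holding a replica.
    tasks: list of partition ids, one task per partition.
    Returns (assignment dict, number of non-local tasks)."""
    assignment = {}
    non_local = 0
    for pid in tasks:
        hosts = partition_locations.get(pid)
        if hosts:
            assignment[pid] = sorted(hosts)[0]  # any replica host is local
        else:
            assignment[pid] = None  # no locality info: data must be shuffled
            non_local += 1
    return assignment, non_local

locations = {0: {"node1"}, 1: {"node2"}, 2: {"node1", "node3"}}
assignment, non_local = assign_tasks(locations, [0, 1, 2, 3])
print(assignment)  # partition 3 has no known replica, so it is non-local
print(non_local)
```

The current {{RemoteParForSpark}} behavior corresponds to ignoring {{partition_locations}} entirely, i.e. every task may land on a host without its data, which is what makes the shuffle volume grow with the input size.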



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
