Artem Malykh created IGNITE-6783:
------------------------------------

             Summary: Create common mechanism for group training.
                 Key: IGNITE-6783
                 URL: https://issues.apache.org/jira/browse/IGNITE-6783
             Project: Ignite
          Issue Type: Task
      Security Level: Public (Viewable by anyone)
            Reporter: Artem Malykh
            Assignee: Artem Malykh


In distributed ML it is common to train several models in parallel while 
letting them communicate with each other during training. A simple example is 
training a neural network with SGD on different chunks of data located on 
several nodes. Such training repeats the following loop: each worker node 
performs one or several SGD steps and sends its gradient to a central node, 
which averages the gradients from all worker nodes and sends the averaged 
gradient back. This pattern can be applied to other ML algorithms, so it would 
be useful to extract it into a common mechanism.
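
The loop described above can be sketched in a single process as follows. This 
is only an illustration under stated assumptions, not the proposed Ignite API: 
the class GroupTrainingSketch and the methods average, localGradient and train 
are hypothetical names, each "worker" is just a local data chunk, the model is 
a one-parameter linear fit y = w[0] * x standing in for a neural network, and 
each worker takes a single gradient step per round.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical single-process sketch of the "local step, average, broadcast" loop.
class GroupTrainingSketch {
    /** Central node: element-wise mean of the gradients reported by the workers. */
    static double[] average(List<double[]> grads) {
        double[] avg = new double[grads.get(0).length];
        for (double[] g : grads)
            for (int i = 0; i < avg.length; i++)
                avg[i] += g[i] / grads.size();
        return avg;
    }

    /** Worker node: gradient of the mean squared error of y = w[0] * x on its own chunk. */
    static double[] localGradient(double[] w, double[][] chunk) {
        double g = 0.0;
        for (double[] xy : chunk)
            g += 2.0 * (w[0] * xy[0] - xy[1]) * xy[0] / chunk.length;
        return new double[] {g};
    }

    /** Runs the training loop: every round all workers report, then the average is applied. */
    static double[] train(List<double[][]> chunks, double[] w0, double lr, int rounds) {
        double[] w = w0.clone();
        for (int r = 0; r < rounds; r++) {
            // Each worker computes a gradient on its local data with the shared weights.
            List<double[]> grads = new ArrayList<>();
            for (double[][] chunk : chunks)
                grads.add(localGradient(w, chunk));

            // The central node averages the gradients; applying the result to the
            // shared weights plays the role of sending it back to every worker.
            double[] avg = average(grads);
            for (int i = 0; i < w.length; i++)
                w[i] -= lr * avg[i];
        }
        return w;
    }

    public static void main(String[] args) {
        // Two workers, each holding a chunk of (x, y) pairs generated by y = 2x.
        List<double[][]> chunks = new ArrayList<>();
        chunks.add(new double[][] {{1, 2}, {2, 4}});
        chunks.add(new double[][] {{3, 6}, {4, 8}});

        double[] w = train(chunks, new double[] {0.0}, 0.01, 100);
        System.out.println("learned weight: " + w[0]);
    }
}
```

Only localGradient is specific to the model here; the collect, average and 
send-back steps in train do not depend on it, which is the part a common 
mechanism for group training could factor out.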



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
