Artem Malykh created IGNITE-6783:
------------------------------------
Summary: Create common mechanism for group training.
Key: IGNITE-6783
URL: https://issues.apache.org/jira/browse/IGNITE-6783
Project: Ignite
Issue Type: Task
Security Level: Public (Viewable by anyone)
Reporter: Artem Malykh
Assignee: Artem Malykh
In distributed ML it is common to train several models in parallel while
letting them communicate with each other during training. A simple example is
training a neural network with SGD on different chunks of data located on
several nodes. Such training repeats the following loop: each worker node
performs one or several SGD steps, then sends its gradient to a central node,
which averages the gradients from all worker nodes and sends the averaged
gradient back. This procedure follows a pattern that can be applied to other ML
algorithms, so it would be useful to extract it into a common mechanism.
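
The loop described above could be sketched roughly as follows. This is only a
single-process illustration in plain Java, not a proposed Ignite API: the class
GroupTrainingSketch, the methods train, gradient, average and demo, and the toy
data are all hypothetical names and values chosen for the example. The workers
run sequentially here, whereas on a cluster each local step would run on its own
node.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Function;

// Hypothetical sketch of the "local step -> reduce on central node -> broadcast" pattern.
class GroupTrainingSketch {
    /**
     * Generic group-training loop (an assumption about how the pattern might be factored).
     * S is the shared state, D is one worker's data chunk, U is one worker's update.
     */
    static <S, D, U> S train(S init, List<D> chunks, int rounds,
                             BiFunction<S, D, U> localStep,
                             Function<List<U>, U> reduce,
                             BiFunction<S, U, S> apply) {
        S state = init;
        for (int r = 0; r < rounds; r++) {
            List<U> updates = new ArrayList<>();
            // Sequential here; on a cluster each worker would compute its update in parallel.
            for (D chunk : chunks)
                updates.add(localStep.apply(state, chunk));
            U combined = reduce.apply(updates);   // central node combines worker updates
            state = apply.apply(state, combined); // new state is sent back to all workers
        }
        return state;
    }

    /** Gradient of the mean squared error of y = w * x over one chunk of {x, y} pairs. */
    static double gradient(double w, double[][] chunk) {
        double sum = 0.0;
        for (double[] p : chunk)
            sum += 2.0 * (w * p[0] - p[1]) * p[0];
        return sum / chunk.length;
    }

    /** Plain average of the workers' gradients. */
    static double average(List<Double> grads) {
        double sum = 0.0;
        for (double g : grads)
            sum += g;
        return sum / grads.size();
    }

    /** Fits the slope of y = 2x from two toy chunks, as if they lived on two nodes. */
    static double demo() {
        List<double[][]> chunks = Arrays.asList(
            new double[][] {{1, 2}, {2, 4}},
            new double[][] {{3, 6}});
        double learningRate = 0.05;
        return GroupTrainingSketch.<Double, double[][], Double>train(
            0.0, chunks, 50,
            GroupTrainingSketch::gradient,
            GroupTrainingSketch::average,
            (w, g) -> w - learningRate * g);
    }

    public static void main(String[] args) {
        System.out.println("fitted slope: " + demo());
    }
}
```

In this sketch only the three functions passed to train are specific to SGD.
Another algorithm that fits the same pattern would supply its own local step,
reduce and apply functions and reuse the loop unchanged.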