[ 
https://issues.apache.org/jira/browse/SINGA-48?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14694798#comment-14694798
 ] 

ASF subversion and git services commented on SINGA-48:
------------------------------------------------------

Commit db440127bc35c626a4e8407c6e3bfd9331870a37 in incubator-singa's branch 
refs/heads/master from Wei Wang
[ https://git-wip-us.apache.org/repos/asf?p=incubator-singa.git;h=db44012 ]

SINGA-48 Fix a bug in trainer.cc that assigns the same NeuralNet instance to 
workers from diff groups

Cleaned the code for SetupWorkerServer and fixed the bug.


> Fix a bug in trainer.cc that assigns the same NeuralNet instance to workers 
> from diff groups
> --------------------------------------------------------------------------------------------
>
>                 Key: SINGA-48
>                 URL: https://issues.apache.org/jira/browse/SINGA-48
>             Project: Singa
>          Issue Type: Bug
>            Reporter: wangwei
>
> In SINGA, workers from the same group and in the same process share the same 
> NeuralNet instance. Different worker groups should have different NeuralNet 
> objects. However, the current Trainer::SetupWorkerServer function assigns the 
> same NeuralNet instance to workers in different groups. Consequently, two 
> workers may compute on the same layer instance, which would lead to repeated 
> calls to the ComputeFeature and ComputeGradient functions and cause run-time 
> errors.
> Another issue is that if two workers are from different groups but reside in 
> the same process, they share memory for layer parameters. This memory 
> sharing is not a problem if the group size is 1. But if there is more than 
> one worker in a group, the workers should run synchronously. The 
> synchronization is controlled by the parameter version. If memory sharing is 
> enabled, workers from other groups may increase the parameter version, which 
> leads to errors in synchronization. To fix this issue, SINGA needs to disable 
> memory sharing among groups if the worker group size is greater than 1.
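
The two rules in the description above (one NeuralNet per worker group, and
parameter-memory sharing across groups only when the group size is 1) can be
sketched as follows. This is a minimal illustration, not the code from the
commit: the NeuralNet and Worker structs, the shares_param_memory flag, and
the AssignNets function are hypothetical stand-ins, not the actual SINGA
classes or the real Trainer::SetupWorkerServer.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <vector>

// Hypothetical stand-in for SINGA's NeuralNet (assumption, not the real class).
struct NeuralNet {
  // True if this net's layer parameters alias memory owned by another net.
  bool shares_param_memory = false;
};

// Hypothetical stand-in for a worker living in the current process.
struct Worker {
  int group_id;
  int worker_id;
  std::shared_ptr<NeuralNet> net;
};

// Give every worker group in this process its own NeuralNet instance.
// Workers of the same group receive the same instance; workers of
// different groups never do.
void AssignNets(std::vector<Worker>& workers, int group_size) {
  assert(group_size >= 1);
  // Sharing parameter memory across groups is only safe when each group
  // has a single worker, because no version-based synchronization inside
  // a group can then be disturbed by another group.
  const bool allow_sharing = (group_size == 1);
  std::map<int, std::shared_ptr<NeuralNet>> net_of_group;
  for (Worker& w : workers) {
    auto it = net_of_group.find(w.group_id);
    if (it == net_of_group.end()) {
      auto net = std::make_shared<NeuralNet>();
      // The first net created in the process owns the parameter memory;
      // later nets may alias it only if sharing is allowed.
      net->shares_param_memory = allow_sharing && !net_of_group.empty();
      it = net_of_group.emplace(w.group_id, net).first;
    }
    w.net = it->second;
  }
}
```

With two groups of two workers each, AssignNets(workers, 2) produces two
distinct nets, neither of which shares parameter memory. With two groups of
one worker each, AssignNets(workers, 1) still produces two distinct nets, but
the second is marked as sharing parameter memory with the first.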



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
