[ 
https://issues.apache.org/jira/browse/SINGA-148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15217586#comment-15217586
 ] 

ASF subversion and git services commented on SINGA-148:
-------------------------------------------------------

Commit 9679417509baf62e0c565e0cec140844b778d827 in incubator-singa's branch 
refs/heads/master from [~flytosky]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-singa.git;h=9679417 ]

SINGA-148 Race condition between Worker threads and Driver

The worker may query the device id before the driver has set it up (via
Context).  The fix makes the worker sleep until the driver finishes the
setup.  All devices (GPU and CPU) must now be set up via
Context::SetupDevice; otherwise the worker sleeps forever.
The Blob math functions now check whether device_id < 0 to choose between
the CPU and GPU functions.
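
The wait described above can be sketched roughly as follows. This is only an
illustration, not the code from the commit: DeviceTable, UseCpu and RunDemo
are hypothetical names, the sketch blocks on a std::condition_variable
instead of sleeping and polling, and it assumes that a negative device id
means "run on the CPU".

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <map>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the role Context plays; not SINGA's actual class.
class DeviceTable {
 public:
  // Called by the driver: record the device for a worker thread, wake waiters.
  void SetupDevice(std::thread::id tid, int device_id) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      device_[tid] = device_id;
    }
    cv_.notify_all();
  }

  // Called by the worker: block until the driver has registered this thread.
  // If SetupDevice is never called for tid, this waits forever.
  int device_id(std::thread::id tid) {
    std::unique_lock<std::mutex> lock(mu_);
    cv_.wait(lock, [&] { return device_.count(tid) != 0; });
    return device_[tid];
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  std::map<std::thread::id, int> device_;
};

// Assumed convention for this sketch: a negative id selects the CPU path.
inline bool UseCpu(int device_id) { return device_id < 0; }

// Starts a worker that asks for its device id before the driver has set it.
int RunDemo(int id_to_assign) {
  DeviceTable table;
  int seen = 0;
  std::thread worker(
      [&] { seen = table.device_id(std::this_thread::get_id()); });
  // Simulate a driver that is slower than the worker.
  std::this_thread::sleep_for(std::chrono::milliseconds(50));
  table.SetupDevice(worker.get_id(), id_to_assign);
  worker.join();
  // The worker must observe exactly what the driver registered.
  assert(seen == id_to_assign);
  return seen;
}
```

In this sketch the worker cannot read a device id that has not been set yet,
no matter how late the driver runs. It also shows the trade-off the commit
message mentions: a thread that is never registered waits indefinitely.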


> Race condition between Worker threads and Driver
> ------------------------------------------------
>
>                 Key: SINGA-148
>                 URL: https://issues.apache.org/jira/browse/SINGA-148
>             Project: Singa
>          Issue Type: Bug
>         Environment: Tested on Debian 8.3
>            Reporter: Tan Li Boon
>              Labels: newbie
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> In the main branch, a thread of execution for each worker is started before 
> information about its device type is passed into it.
> This causes test cases such as MNIST to not use GPUs for calculation, and 
> other test cases such as CIFAR10 to fail when they attempt to perform 
> cudnn-based functions on the CPU.
> A temporary workaround is to make the worker thread sleep for 1 second, but 
> that is not a desirable solution.
> This issue aims to implement a proper fix.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
