'Leave apps in memory' hardly applies to GPU tasks - they are only preserved in 
memory during CPU benchmarks. I certainly run both SETI and GPUGrid in modes 
where they are liable to suspension mid-run, with BOINC set to 'leave apps 
in memory', and I have seen no problems: GPUGrid on my GTX 470 gets suspended 
two or three times during pretty much every task.
 
Do you have proper "critical section" protection around kernel launches and 
thread synchronisation? Otherwise, you might be hitting thread-safety 
problems - it's even possible that your application is failing to 
receive BOINC's instruction to suspend itself. Mind you, I don't know whether 
BOINC instructs an app to shut itself down via a message 
(which might get overlooked), or via a semaphore which stays set until the next 
state change.
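 
To make the idea concrete, here is a rough C++ sketch of what I understand by 
critical-section protection combined with a "sticky" suspend flag. It is only 
an illustration, not your code and not BOINC's implementation: everything in 
the sketch namespace is a hypothetical stand-in (the real BOINC API offers 
boinc_begin_critical_section() and boinc_end_critical_section() in 
boinc_api.h), and the kernel launch and synchronisation are simulated by log 
entries.

```cpp
#include <atomic>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins so the sketch compiles without BOINC or CUDA.
namespace sketch {

std::atomic<bool> suspend_requested{false};  // sticky flag: stays set until cleared
std::atomic<int>  critical_depth{0};         // > 0 means "do not interrupt right now"
std::vector<std::string> event_log;          // records what the simulated app did

void begin_critical_section() { ++critical_depth; }
void end_critical_section()   { --critical_depth; }

// What a (simulated) client would do to suspend or resume the app.
void request_suspend() { suspend_requested = true; }
void request_resume()  { suspend_requested = false; }

// Checked only at safe points between GPU steps. Because the request is a
// flag rather than a one-shot message, it cannot be missed: it is still set
// whenever the app next looks at it.
bool should_pause() {
    return suspend_requested.load() && critical_depth.load() == 0;
}

// One unit of GPU work: launch and explicit synchronisation are kept together
// inside a critical section so that a suspend cannot land between them.
void gpu_step(int i) {
    begin_critical_section();
    event_log.push_back("launch " + std::to_string(i));  // stand-in for a kernel launch
    event_log.push_back("sync " + std::to_string(i));    // stand-in for explicit synchronisation
    end_critical_section();
}

// Runs up to n steps and returns how many completed before a pause.
int run_steps(int n) {
    int done = 0;
    for (int i = 0; i < n; ++i) {
        if (should_pause()) {
            event_log.push_back("paused");
            break;
        }
        gpu_step(i);
        ++done;
    }
    return done;
}

}  // namespace sketch
```

In this sketch, calling sketch::request_suspend() and then sketch::run_steps(3) 
completes no steps and logs "paused", while a request that arrives during a 
critical section is simply honoured at the next safe point. A one-shot message 
would need extra care here, because an app that was busy when it arrived could 
overlook it.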
 
I've had a few errors like

  [22:14:54][3524][ERROR] Error during CUDA device->host HS data transfer (error: 999)

with your app recently - list at 
http://einstein.phys.uwm.edu/results.php?hostid=5744895&offset=0&show_names=1&state=5&appid=0.
I asked a friend to cast an eye over your code, because there was a suspicion 
in our minds that trying to run your cuda32 app and his cuda42 app on the same 
(GTX 670 Kepler) GPU at the same time might be the cause of some of my 
problems. His comment was:
"Without thoroughly having looked for the usual thread safety issues, I did 
notice that there is a distinct absence of any explicit synchronisation.  That 
implies the same thread safety issues will likely be there, on top of some 
driver & Cuda runtime issues that really require at least some 
explicit synchronisation present to avoid.  Probably a patched build could be 
tested readily enough, as the use of Cuda in that application appears 
relatively simple.  Being under driver api, as opposed to Cuda Runtime, is a 
partial advantage, though use of CUFFT and its dependency on the cuda runtime 
eliminates most of those advantages leaving the prominent disadvantage as their 
source being harder to read (& so debug & maintain)."
 
I'm not a programmer, just a messenger - but it sounds as if there might be 
some scope for cross-fertilisation there.
 



>________________________________
>From: Oliver Bock <[email protected]>
>To: David Anderson <[email protected]>; Rom Walton 
><[email protected]> 
>Cc: boinc_dev <[email protected]> 
>Sent: Monday, 8 October 2012, 16:00
>Subject: [boinc_dev] GPU tasks still running while suspended
>
>Hi,
>
>We're getting reports that our CUDA app keeps running while BOINC shows
>the task as suspended (because the computer is in use). Such a task's
>elapsed time keeps increasing (in BOINC) and the process is indeed running.
>
>So far we know of two affected hosts, one Windows, one Linux (here:
>7.0.31, compiled from source).
>
>Any idea what might cause this? Could the "leaving suspended
>applications in memory" setting be related to this?
>
>
>Cheers,
>Oliver
>_______________________________________________
>boinc_dev mailing list
>[email protected]
>http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
>To unsubscribe, visit the above URL and
>(near bottom of page) enter your email address.
