'Leave apps in memory' hardly applies to GPU tasks - they are only preserved in memory during CPU benchmarks.

I certainly run both SETI and GPUGrid in modes where they are liable to suspension mid-run, with BOINC set to 'leave apps in memory', and I have seen no problems: GPUGrid on my GTX 470 is suspended two or three times during pretty much every task.

Do you have proper "critical section" protection around kernel launches and thread synchronisation? Otherwise, you might be hitting non-thread-safe behaviour - it's even possible that your application is failing to receive BOINC's instruction to suspend itself. Mind you, I don't know whether BOINC signals an instruction for an app to shut itself down via a message (which might get overlooked), or via a semaphore which stays set until the next state change.

I've had a few errors like

[22:14:54][3524][ERROR] Error during CUDA device->host HS data transfer (error: 999)

with your app recently - list at http://einstein.phys.uwm.edu/results.php?hostid=5744895&offset=0&show_names=1&state=5&appid=0.

I asked a friend to cast an eye over your code, because we suspected that trying to run your cuda32 app and his cuda42 app on the same (GTX 670 Kepler) GPU at the same time might be the cause of some of my problems. His comment was:

"Without thoroughly having looked for the usual thread safety issues, I did notice that there is a distinct absence of any explicit synchronisation. That implies the same thread safety issues will likely be there, on top of some driver & Cuda runtime issues that really require at least some explicit synchronisation present to avoid. Probably a patched build could be tested readily enough, as the use of Cuda in that application appears relatively simple.

Being under driver api, as opposed to Cuda Runtime, is a partial advantage, though use of CUFFT and its dependency on the cuda runtime eliminates most of those advantages leaving the prominent disadvantage as their source being harder to read (& so debug & maintain)."

I'm not a programmer, just a messenger - but it sounds as if there might be some scope for cross-fertilisation there.
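
To make the two ideas above concrete (a suspend request that stays set until it is cleared, rather than a one-shot message that can be missed, and a "critical section" that defers the pause until a unit of GPU work has finished), here is a minimal Python sketch. It does not use CUDA or the BOINC API, and it is not how BOINC or the Einstein@Home app is implemented: `SuspendFlag`, `run_chunks` and the `on_chunk` hook are hypothetical names invented for illustration, and the method names are only loosely modelled on BOINC's `boinc_begin_critical_section()` / `boinc_end_critical_section()`.

```python
import threading


class SuspendFlag:
    """Hypothetical stand-in for a client's suspend request.

    The request is level-triggered: once set it stays set until it is
    explicitly cleared, so a worker that polls late still sees it.
    The depth counter is not thread-safe; it is meant to be touched
    only by the single worker thread in this sketch.
    """

    def __init__(self):
        self._requested = threading.Event()
        self._critical_depth = 0

    def request_suspend(self):
        self._requested.set()

    def request_resume(self):
        self._requested.clear()

    def begin_critical_section(self):
        self._critical_depth += 1

    def end_critical_section(self):
        self._critical_depth -= 1

    def should_pause(self):
        # Only honour the request outside a critical section.
        return self._requested.is_set() and self._critical_depth == 0


def run_chunks(flag, n_chunks, on_chunk=None):
    """Process up to n_chunks work units; return how many completed."""
    done = 0
    while done < n_chunks:
        if flag.should_pause():
            break  # safe point: no GPU work is in flight
        flag.begin_critical_section()
        try:
            # Stand-in for: launch kernel, synchronise explicitly,
            # copy results device -> host.
            if on_chunk is not None:
                on_chunk(done, flag)
        finally:
            flag.end_critical_section()
        done += 1
    return done
```

In this sketch a suspend request that arrives in the middle of a chunk is not lost: the chunk runs to completion, and the loop stops at the next safe point. Whether the real app behaves this way is exactly the open question in this thread.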
>________________________________
>From: Oliver Bock <[email protected]>
>To: David Anderson <[email protected]>; Rom Walton <[email protected]>
>Cc: boinc_dev <[email protected]>
>Sent: Monday, 8 October 2012, 16:00
>Subject: [boinc_dev] GPU tasks still running while suspended
>
>Hi,
>
>We're getting reports that our CUDA app keeps running while BOINC shows
>the task as suspended (because the computer is in use). Such a task's
>elapsed time keeps increasing (in BOINC) and the process is indeed running.
>
>So far we know of two affected hosts, one Windows, one Linux (here:
>7.0.31, compiled from source).
>
>Any idea what might cause this? Could the "leaving suspended
>applications in memory" setting be related to this?
>
>
>Cheers,
>Oliver
>_______________________________________________
>boinc_dev mailing list
>[email protected]
>http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
>To unsubscribe, visit the above URL and
>(near bottom of page) enter your email address.
