Currently BOINC tests GPU availability periodically (each second?) and aborts the GPU job when the GPU becomes unavailable. In particular, the GPU can become unavailable on user switching in Windows Server 2008 (or, perhaps, in other multi-user Windows configurations later than XP/2003).
My observation: if the science app runs standalone (without BOINC), such a user switch just pauses the computation, and the computation then seems to resume without any issues. Example (a benchmark used for tuning the app across its parameter space):

------------
Running app : AP6_win_x86_SSE2_OpenCL_ATI_r1761.exe -unroll 5
with WU     : Clean_20LC.wu
Started at  : 12:44:23.307
Ended at    : 13:13:56.840
1773.439 secs Elapsed
204.081 secs CPU time
------------
Running app : AP6_win_x86_SSE2_OpenCL_ATI_r1761.exe -unroll 6
with WU     : Clean_20LC.wu
Started at  : 13:14:00.006
Ended at    : 13:43:21.590
1761.490 secs Elapsed
191.710 secs CPU time
------------
Running app : AP6_win_x86_SSE2_OpenCL_ATI_r1761.exe -unroll 7
with WU     : Clean_20LC.wu
Started at  : 13:43:24.772
Ended at    : 13:25:01.975
85297.107 secs Elapsed
194.486 secs CPU time
------------
Running app : AP6_win_x86_SSE2_OpenCL_ATI_r1761.exe -unroll 8
with WU     : Clean_20LC.wu
Started at  : 13:25:05.113

The -unroll 7 run took almost a full day (!): 85297 secs elapsed, about 23.7 hours, against roughly 194 secs of CPU time. And the result, from the end of its stderr:

rev 1761
13:24:59 (4596): called boinc_finish

So the app finished without any issues after the GPU was reconnected to the user session.

So maybe aborting the job in BOINC is not really necessary? Maybe it is worth adding more flexibility to this behavior, in the form of an experimental option analogous to "leave CPU app in memory", called something like "leave GPU app in memory"? Of course, the app could still be aborted for exceeding its time limit, but where resuming from a checkpoint carries a big penalty, leaving the app in memory could still be worthwhile.
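
To make the proposal more concrete, here is a minimal sketch of the decision I have in mind for each periodic GPU check. It is not BOINC client code: the names GpuJob, Action, decide and the leave_gpu_app_in_memory flag are all hypothetical, and the time limit is modeled as a simple elapsed-time cap, which is an assumption on my side.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    KEEP_RUNNING = "keep_running"
    WAIT_IN_MEMORY = "wait_in_memory"  # proposed: app stays loaded and simply stalls
    ABORT = "abort"


@dataclass
class GpuJob:
    elapsed_secs: float     # wall-clock time since the job started
    time_limit_secs: float  # hypothetical elapsed-time cap for the job


def decide(job: GpuJob, gpu_available: bool, leave_gpu_app_in_memory: bool) -> Action:
    """Pick an action for one periodic GPU availability check (illustrative only)."""
    # The time limit applies regardless of the GPU state.
    if job.elapsed_secs > job.time_limit_secs:
        return Action.ABORT
    if gpu_available:
        return Action.KEEP_RUNNING
    # GPU is unavailable, e.g. after a user switch.
    if leave_gpu_app_in_memory:
        return Action.WAIT_IN_MEMORY
    return Action.ABORT  # the behavior described at the top of this message


if __name__ == "__main__":
    job = GpuJob(elapsed_secs=1800.0, time_limit_secs=90000.0)
    print(decide(job, gpu_available=False, leave_gpu_app_in_memory=False).value)
    print(decide(job, gpu_available=False, leave_gpu_app_in_memory=True).value)
```

With the option off, the sketch keeps the behavior described above; with it on, the app just waits for the GPU to come back, and the time limit remains the safety net.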
