Currently BOINC checks GPU availability periodically (each second?) and aborts 
the GPU job when the GPU becomes unavailable.
In particular, the GPU can become unavailable when switching users on Windows 
Server 2008 (or, perhaps, on other multi-user Windows configs newer than XP/2003).

My observation: if the science app runs standalone (without BOINC), such a user 
switch just pauses the computation, and the computation then resumes without 
any issues, it seems.
Example (a benchmark used to tune the app across its parameter space):

------------
Running app : AP6_win_x86_SSE2_OpenCL_ATI_r1761.exe -unroll 5
with WU : Clean_20LC.wu
Started at : 12:44:23.307
Ended at : 13:13:56.840
1773.439 secs Elapsed
204.081 secs CPU time
------------
Running app : AP6_win_x86_SSE2_OpenCL_ATI_r1761.exe -unroll 6
with WU : Clean_20LC.wu
Started at : 13:14:00.006
Ended at : 13:43:21.590
1761.490 secs Elapsed
191.710 secs CPU time
------------
Running app : AP6_win_x86_SSE2_OpenCL_ATI_r1761.exe -unroll 7
with WU : Clean_20LC.wu
Started at : 13:43:24.772
Ended at : 13:25:01.975
85297.107 secs Elapsed
194.486 secs CPU time
------------
Running app : AP6_win_x86_SSE2_OpenCL_ATI_r1761.exe -unroll 8
with WU : Clean_20LC.wu
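Started at : 13:25:05.113

The -unroll 7 run (shown in bold in the original message) took nearly a day 
of wall-clock time (!), while its CPU time stayed in line with the other runs.
Its stderr output ends with: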

 
rev 1761
13:24:59 (4596): called boinc_finish
[ /stderr ]

So the app finished without any issues after the GPU was reconnected to the 
user session.
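
As a sanity check on the log above, here is a small Python sketch that recomputes the elapsed time from the "Started at" / "Ended at" stamps. It assumes each run ends less than 24 hours after it starts (the log has no dates), so a negative difference is treated as a midnight wrap; the function name is mine, not something from the benchmark script. For the -unroll 7 run this gives roughly 85297 secs, which agrees with the reported value to within about a tenth of a second.

```python
from datetime import datetime, timedelta

def elapsed_seconds(start: str, end: str) -> float:
    """Elapsed seconds between two HH:MM:SS.mmm stamps.

    Assumption: the run lasts less than 24 hours, so an end stamp that is
    earlier than the start stamp means the run crossed midnight once.
    """
    fmt = "%H:%M:%S.%f"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    if delta < timedelta(0):
        delta += timedelta(days=1)
    return delta.total_seconds()

# (start, end, elapsed secs reported in the log, CPU secs reported in the log)
runs = {
    "-unroll 5": ("12:44:23.307", "13:13:56.840", 1773.439, 204.081),
    "-unroll 6": ("13:14:00.006", "13:43:21.590", 1761.490, 191.710),
    "-unroll 7": ("13:43:24.772", "13:25:01.975", 85297.107, 194.486),
}

for label, (start, end, reported, cpu) in runs.items():
    computed = elapsed_seconds(start, end)
    print(f"{label}: computed {computed:.3f} s, reported {reported:.3f} s, CPU {cpu:.3f} s")
```

The point is that the wall-clock time grows by a factor of about 48 for the paused run while the CPU time does not change much, which is what one would expect if the app simply waited for the GPU to come back.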

So maybe aborting the job is not really necessary? It might be worth making 
this behavior more flexible by adding an experimental option, analogous to 
"leave CPU app in memory", called something like "leave GPU app in memory".
Of course the app can still be aborted for exceeding its time limit, but when 
resuming from a checkpoint carries a big penalty, leaving the app in memory 
could still be worthwhile.
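
To make the proposal concrete, here is a rough Python sketch of the decision I have in mind. This is not BOINC client code: the names (Action, GpuJob, decide, leave_gpu_apps_in_memory) are all hypothetical, and the "abort when the GPU is gone" branch just mirrors the current behavior as I described it above.

```python
from dataclasses import dataclass
from enum import Enum

class Action(Enum):
    RUN = "run"
    ABORT = "abort"
    KEEP_IN_MEMORY = "keep_in_memory"

@dataclass
class GpuJob:
    # Hypothetical job record, only what the decision needs.
    elapsed_secs: float
    time_limit_secs: float

def decide(job: GpuJob, gpu_available: bool, leave_gpu_apps_in_memory: bool) -> Action:
    """Pick what to do with a GPU job on each periodic availability check."""
    # The time limit still applies no matter how the option is set.
    if job.elapsed_secs > job.time_limit_secs:
        return Action.ABORT
    if gpu_available:
        return Action.RUN
    # GPU is gone (e.g. user switch): the proposed option keeps the app
    # waiting in memory instead of aborting it.
    if leave_gpu_apps_in_memory:
        return Action.KEEP_IN_MEMORY
    return Action.ABORT  # current behavior, as described in this message

# Example: GPU unavailable, job well within its limit, option enabled.
print(decide(GpuJob(elapsed_secs=1800.0, time_limit_secs=100000.0), False, True))
```

With the option off the sketch behaves like today; with it on, the only way a waiting job gets aborted is the time limit.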
 
 
_______________________________________________
boinc_dev mailing list
http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev