Gabor Gombas wrote on 23.07.2009 at 11:56:
> On Thu, Jul 23, 2009 at 10:18:33AM +0200, Oliver Bock wrote:
>
>> But that's not all: have a look in client/app_control.cpp at
>> ACTIVE_TASK_SET::exit_tasks() and ACTIVE_TASK::kill_task(). The app is
>> killed five seconds after a normal shutdown was initiated. If the app
>> fails to shut itself down, the client then "kills" the app the hard way
>> and the described problem might still occur...
>
> The real solution is to bug Nvidia to fix the CUDA framework so a
> crashing/disappearing host application can't cause the GPU to crash/lock
> up.

I'm not sure that this would be feasible. The only way I can imagine to
ensure a proper cleanup would be to manage the GPU device memory as part of
the operating system's process management, and I'm not sure that is possible
from within, e.g., a device driver.

> In the mean time, write the controlling application carefully so it
> can react to SIGTERM etc. in a timely manner.

Sure. But the problem I see with the current BOINC Client is that an App may
not react to a quit message immediately (i.e. within 5 seconds) for a number
of reasons (including bugs in the BOINC code), and in particular it may not
be as responsive when it is nice'd. Sending a non-catchable signal 5 seconds
after a message is a bit too harsh and dangerous for GPU Apps under current
conditions.

I'd rather stretch the escalation for GPU Applications, e.g.

- send quit message
- wait 10 seconds
- send a catchable signal (HUP, TERM, QUIT) that can be dealt with in a
  signal handler
- wait 20 seconds
- send a SIGKILL to free at least the CPU

If the SIGKILL actually kills a process that's still there, I'd notify either
the project or the user, because that means that something is wrong with the
App or the machine; maybe mark the current task as Client Error (because a
note on the screen is useless if the graphics are corrupted).

In any case I think the current behavior of exit_tasks() (SIGKILL 5 secs
after a message) is too 'dangerous' for systems running current CUDA Apps.

Best,
Bernd
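
P.S. To make the proposed escalation concrete, here is a minimal sketch in Python rather than in the client's C++. It is not the BOINC code and not a patch for exit_tasks(): the function names and the send_quit_message callback are hypothetical stand-ins for however the client delivers its quit message, and the sketch assumes a POSIX system. The 10 and 20 second defaults are the waits suggested above.

```python
import signal
import subprocess
import sys


def wait_for_exit(proc, timeout):
    """Return True if the child process exits within `timeout` seconds."""
    try:
        proc.wait(timeout=timeout)
        return True
    except subprocess.TimeoutExpired:
        return False


def escalate_shutdown(proc, send_quit_message, quit_wait=10.0, term_wait=20.0):
    """Stop `proc` in three stages and report which stage ended it.

    `send_quit_message` is a hypothetical callback that stands in for the
    client's quit message; it is not a BOINC API.
    """
    # Stage 1: ask politely and give the app time to clean up.
    send_quit_message(proc)
    if wait_for_exit(proc, quit_wait):
        return "quit_message"

    # Stage 2: a catchable signal that the app can handle in a signal handler.
    proc.send_signal(signal.SIGTERM)
    if wait_for_exit(proc, term_wait):
        return "sigterm"

    # Stage 3: last resort; on POSIX, Popen.kill() sends SIGKILL.
    proc.kill()
    proc.wait()
    return "sigkill"


if __name__ == "__main__":
    # Demo with a child that never sees a quit message and sleeps for a minute.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    stage = escalate_shutdown(child, lambda p: None, quit_wait=1.0, term_wait=1.0)
    print("child stopped at stage:", stage)
```

A return value of "sigkill" is the case where I would notify the project or the user, or mark the task as Client Error, since the app ignored both the quit message and the catchable signal.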
