What I did with one of the PrimeGrid GPU apps is this: when it gets any kind
of GPU error, it goes to sleep for 10 minutes and then tries again.

The logic behind this behavior is that GPU errors will be one of two types:
either it's a transient problem (like a driver crash), or it's a
longer-lasting problem, such as someone using remote desktop, which
makes the device unavailable to a BOINC app.

For the transient errors, the 10-minute delay is usually enough to allow
the app to run again.  (This app does a lot of crunching, so 10 minutes
isn't a significant portion of the run-time.)  For the longer errors,
hopefully the delay is enough, but if not, we haven't really wasted any GPU
time because no other task could have been using the GPU anyway.
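
Roughly, the sleep-and-retry idea looks like the sketch below.  This is an
illustration in Python, not the actual PrimeGrid code; the names GpuError,
run_with_gpu_retry and max_attempts are made up for the example, and the
sleep function is passed in only so the loop can be exercised without
really waiting.

```python
import time

RETRY_DELAY_SECONDS = 10 * 60  # the 10-minute pause described above


class GpuError(Exception):
    """Hypothetical stand-in for any error reported by the GPU runtime."""


def run_with_gpu_retry(work, delay=RETRY_DELAY_SECONDS, sleep=time.sleep,
                       max_attempts=None):
    """Call work() until it succeeds, sleeping `delay` seconds after each GpuError.

    max_attempts=None means retry forever (an assumption; the original
    message does not say whether the app ever gives up).
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            return work()
        except GpuError:
            # Give up only if a limit was set and has been reached.
            if max_attempts is not None and attempts >= max_attempts:
                raise
            # Transient or long-lasting, the response is the same: wait, retry.
            sleep(delay)
```

The same loop covers both error types: a driver crash clears after one
pause, and a remote desktop session just causes more pauses until the
device comes back.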

Mike

On Wed, May 22, 2013 at 1:12 PM, Eric J Korpela <[email protected]> wrote:

> I'm having some difficulty figuring out how to handle some hosts that I
> think are getting their GPU access cut off by remote desktop.
>
> Here's one such host...
>
> http://setiweb.ssl.berkeley.edu/beta/show_host_detail.php?hostid=62767
> He's using BOINC 7.0.64
>
> If you check out the tasks list you'll see two sets of GPU apps that are
> failing.  The ati_cal apps are failing with an error "-226
> (0xffffffffffffff1e) ERR_TOO_MANY_EXITS" when attempting to use the GPU.
> The opencl apps are exiting with "201 (0xc9) EXIT_MISSING_COPROC" which I
> assume is the appropriate exit code.
>
> In both cases, the max_jobs_per_day in host_app_version is getting
> decremented, so it will take many days for this host to recover should the
> GPU come back.
>
> Not much can be done about the version that returns the error except build
> a version that recognizes the GPU is gone and exits properly.  But should
> the core client, rather than aborting a result that has an
> EXIT_MISSING_COPROC, just not attempt to run it again until a GPU is
> detected?
>
> Maybe this conversation has been had before and I missed it.
> _______________________________________________
> boinc_dev mailing list
> [email protected]
> http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
> To unsubscribe, visit the above URL and
> (near bottom of page) enter your email address.
>