I've had a search of my logs, and found a few similar instances:

19-Jun-2009 18:04:19 [s...@home] [sched_op_debug] CPU work request: 0.00 seconds; 0 idle CPUs
19-Jun-2009 18:04:19 [s...@home] [sched_op_debug] CUDA work request: 130248.00 seconds; 0 idle GPUs
19-Jun-2009 18:04:34 [s...@home] Scheduler request completed: got 0 new tasks
19-Jun-2009 18:04:34 [s...@home] [sched_op_debug] Server version 607
19-Jun-2009 18:04:34 [s...@home] Message from server: No work sent
19-Jun-2009 18:04:34 [s...@home] Message from server: No work is available for Astropulse v5
19-Jun-2009 18:04:34 [s...@home] Message from server: (won't finish in time) BOINC runs 99.9% of time, computation enabled 100.0% of that
They were all on 19 or 20 June, which was a period when:

a) SETI had recently suffered a server outage;
b) I had completed all SETI CUDA work - cache empty;
c) I attached to AQUA - and received a gross over-allocation of CUDA work: cache full.

BOINC continued to ask SETI for CUDA work, in a languorous, not-really-bothered sort of way. It finally got:

21-Jun-2009 07:19:37 [s...@home] [wfd] request: CPU (0.00 sec, 0) CUDA (130248.00 sec, 0)
21-Jun-2009 07:19:37 [s...@home] [sched_op_debug] Starting scheduler request
21-Jun-2009 07:19:37 [s...@home] Sending scheduler request: To fetch work.
21-Jun-2009 07:19:37 [s...@home] Reporting 1 completed tasks, requesting new tasks for GPU
21-Jun-2009 07:19:37 [s...@home] [sched_op_debug] CPU work request: 0.00 seconds; 0 idle CPUs
21-Jun-2009 07:19:37 [s...@home] [sched_op_debug] CUDA work request: 130248.00 seconds; 0 idle GPUs
21-Jun-2009 07:19:42 [s...@home] Scheduler request completed: got 17 new tasks
21-Jun-2009 07:19:42 [s...@home] [sched_op_debug] Server version 607
21-Jun-2009 07:19:42 [s...@home] Project requested delay of 11 seconds
21-Jun-2009 07:19:42 [s...@home] [sched_op_debug] estimated total CPU job duration: 0 seconds
21-Jun-2009 07:19:42 [s...@home] [sched_op_debug] estimated total CUDA job duration: 18032 seconds

My hosts typically run at 99.99%+, but a flat 100% would be unusual. So I agree with Jord: this points to a _server_ work-allocation bug which issues a false "(won't finish in time)" message when the _client_ claims to be enabled for 100.0% of the time (only possible if no time is set aside for benchmarks - could we be falling foul of an anti-cheat test? There's a sketch of how such a test could misfire after the quoted message below). But like Jord, I don't have a captured request/reply pair to exhibit. Now that AQUA have cancelled their over-allocation, SETI/CUDA fetch will resume - I'll watch out for it happening again.

----- Original Message -----
From: "Jorden van der Elst" <[email protected]>
To: "BOINC Dev Mailing List" <[email protected]>
Sent: Tuesday, June 23, 2009 9:03 PM
Subject: [boinc_dev] 6.6.36, host.active_frac = 100% means no work?

> Is it possible that there's a bug in BOINC where, when host.active_frac
> = 1 (100%), it won't request any work?
> I ask this as on DrugDiscovery we've seen the following happen to
> people (and to myself):
>
> Host asks for work and gets as a message:
> 23-Jun-09 21:36:02 DrugDiscovery Message from server: GROMACS with
> Nvidia GPU is not available for your type of computer.
> 23-Jun-09 21:36:02 DrugDiscovery Message from server: (won't finish in
> time) BOINC runs 98.3% of time, computation enabled 100.0% of that
>
> I don't think the GPU message has anything to do with it, as that was
> something people with CUDA GPUs got as well. It's the second line.
> Now, stupid me, I didn't log anything at the time, or save the
> relevant lines from client_state.xml or any of the sched_reply or
> sched_request files... I just reset the project and then got work.
>
> After the last sched_request my numbers are:
> <on_frac>0.974275</on_frac>
> <connected_frac>0.967568</connected_frac>
> <active_frac>0.968222</active_frac>
>
> So I am assuming that any number under 100% will do, but 100% will
> stop work from getting in.
> Trouble is that I don't see it reflected in the source code. The only
> reference is when host.active_frac is above 100% (or > 1), which shows
> a message that this is something of an impossibility and that we're
> resetting to 100% (or 1). ;-)
>
> Any ideas?
>
> --
> -- Jord.
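To make the speculation above concrete, here is a minimal C++ sketch of the kind of server-side feasibility test that could misbehave in the way we're both describing. To be clear, this is NOT the actual BOINC scheduler code: on_frac and active_frac are the field names from client_state.xml, but wont_finish_in_time(), est_seconds, delay_bound, and the punitive 0.01 fallback are all invented for illustration.

#include <cstdio>

// Client-reported duty-cycle statistics, named as in client_state.xml.
struct HOST {
    double on_frac;      // fraction of real time the client is running
    double active_frac;  // fraction of that time computation is enabled
};

// Hypothetical deadline-feasibility test: would a job of est_seconds
// finish within delay_bound seconds on this host?
bool wont_finish_in_time(HOST host, double est_seconds, double delay_bound) {
    // Jord found a sanity check that resets active_frac when it is
    // *above* 1. If a check like that were written with >= instead of >,
    // or substituted a punitive default instead of 1.0, an honest report
    // of exactly 100.0% would be penalised:
    if (host.active_frac >= 1.0) {  // suspected off-by-one: should be > 1.0
        host.active_frac = 0.01;    // assumed anti-cheat fallback value
    }
    double available = host.on_frac * host.active_frac;
    return est_seconds / available > delay_bound;
}

int main() {
    double est = 18032;              // CUDA job duration from the log above
    double delay_bound = 7 * 86400;  // assume a one-week deadline
    HOST honest  = {0.999, 0.9999};  // the "99.99%+" case
    HOST suspect = {0.999, 1.0};     // "computation enabled 100.0% of that"
    std::printf("honest:  %s\n", wont_finish_in_time(honest, est, delay_bound)
                                     ? "(won't finish in time)" : "feasible");
    std::printf("suspect: %s\n", wont_finish_in_time(suspect, est, delay_bound)
                                     ? "(won't finish in time)" : "feasible");
}

With active_frac just under 1 the job is comfortably feasible (roughly 18,050 seconds of elapsed time); report exactly 100.0% and the assumed clamp makes the same job look a hundred times longer, which would reproduce the false rejection. A captured sched_request/sched_reply pair would confirm or kill the idea.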
