Perhaps the solution is to default the new column to 10 for all existing
hosts during the conversion, but not for new hosts. This would avoid getting
stuck like this when all tasks are high priority, and the DB would get
straightened out shortly in any case. New clients added after the
conversion should have that entry created as 0, not 10.
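
To make the idea concrete, here is a rough sketch of that conversion. It is only an illustration under assumptions: it uses an in-memory SQLite database rather than the project's real database, the host_id column and the function names are made up for the example, and only the host_app_version table name, the consecutive_valid column and the threshold of 10 come from this thread.

```python
import sqlite3

# From the thread: a host counts as reliable at consecutive_valid >= 10.
RELIABLE_THRESHOLD = 10


def convert(conn, existing_host_ids):
    """One-time conversion: seed every pre-existing host at the threshold."""
    conn.execute(
        "CREATE TABLE host_app_version ("
        " host_id INTEGER PRIMARY KEY,"  # hypothetical column name
        " consecutive_valid INTEGER NOT NULL DEFAULT 0)"  # new hosts get 0
    )
    conn.executemany(
        "INSERT INTO host_app_version (host_id, consecutive_valid)"
        " VALUES (?, ?)",
        [(host_id, RELIABLE_THRESHOLD) for host_id in existing_host_ids],
    )


def add_new_host(conn, host_id):
    """Post-conversion insert: relies on the column default of 0."""
    conn.execute(
        "INSERT INTO host_app_version (host_id) VALUES (?)", (host_id,)
    )


def reliable_hosts(conn):
    """Return the ids of hosts that meet the reliability threshold."""
    rows = conn.execute(
        "SELECT host_id FROM host_app_version"
        " WHERE consecutive_valid >= ? ORDER BY host_id",
        (RELIABLE_THRESHOLD,),
    )
    return [row[0] for row in rows]
```

With this, hosts that existed at conversion time are immediately eligible for high-priority work, while a host added later starts at 0 and has to earn its way up.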
jm7
From: Slawomir Rzeznicki <t...@enigmaathome.net>
Sent by: <boinc_dev-bounce[email protected]u>
To: "David Anderson" <[email protected]>, "BOINC_dev" <[email protected]>
Date: 10/24/2010 05:09 PM
Subject: Re: [boinc_dev] [rev 22566] scheduler doesn't work at all

I found the source of the problem.
When I ran the upgrade, a fresh host_app_version table was created (I
upgraded from a much older revision) and all hosts had their
consecutive_valid set to 0.
This was just after a high-priority batch of tasks entered the server's
queue, so almost everything in the feeder queue was high priority.
Any task with priority higher than 0 goes only to reliable hosts. Since
there were no reliable hosts (consecutive_valid >= 10 is required), the
queue got stuck.
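
The effect described above can be reproduced with a small model. This is a simplified sketch, not the actual scheduler code: the function names and the dictionary layout of a task are invented for illustration, and only the two rules (priority higher than 0 needs a reliable host, reliable means consecutive_valid >= 10) are taken from this message.

```python
# From the message: a host is reliable at consecutive_valid >= 10.
RELIABLE_THRESHOLD = 10


def is_reliable(consecutive_valid):
    """A host is reliable once it has enough consecutive valid results."""
    return consecutive_valid >= RELIABLE_THRESHOLD


def sendable_tasks(queue, consecutive_valid):
    """Return the tasks in the queue that this host is allowed to receive.

    Tasks with priority higher than 0 are reserved for reliable hosts;
    everything else can go to any host.
    """
    reliable = is_reliable(consecutive_valid)
    return [task for task in queue if task["priority"] <= 0 or reliable]


# A feeder queue that holds only high-priority work, as after the upgrade.
high_priority_queue = [
    {"name": "wu_a", "priority": 5},
    {"name": "wu_b", "priority": 5},
]

# Every host was reset to consecutive_valid = 0, so nothing can be sent.
stuck = sendable_tasks(high_priority_queue, consecutive_valid=0)
```

In this model `stuck` is an empty list for every host, which matches the "0 results" replies in the log, and no host can ever raise its consecutive_valid count because it never receives a task to validate.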
That's why I thought that a slightly older scheduler (22488) worked: when
I tested it, the server still had a small number of low-priority tasks
available.
/TJM
http://www.enigmaathome.net
On Sat, 23 Oct 2010 10:39:37 -0700
David Anderson <[email protected]> wrote:
>I'm not seeing any problem with the scheduler in
>the latest trunk revision (22593). Try that.
>-- David
>
>On 23-Oct-2010 8:43 AM, Slawomir Rzeznicki wrote:
>> Hello,
>>
>> Recently I upgraded my server to revision 22566, which seems to have
>> a seriously bugged scheduler.
>> I enabled most (if not all) of the sched debug logs. I can't find
>> anything unusual there; however, it replies with no work available
>> to all work requests, just as it would with an empty feeder queue.
>> I've checked the feeder already and it seems to work fine: the queue
>> is filled with workunits and so is shmem.
>> Also, various people reported that sched does not accept any work
>> reported back to the server; however, I can't confirm it right now
>> because I haven't seen any logs yet and I don't have tasks left to
>> report myself.
>>
>> I'd appreciate any suggestions on how to debug this further. Right
>> now the only thing I know is that the bug must be somewhere between
>> changesets 22488 and 22566, because 22488 works for sure.
>> Below is a sample from the sched log after I did a request from one
>> of my PCs.
>>
>> 2010-10-23 10:08:05.6313 [PID=7397 ] Request: [USER#1] [HOST#3757] [IP 69.12.216.209] client 6.12.4
>> 2010-10-23 10:08:05.6319 [PID=7397 ] [send] Not using matchmaker scheduling; Not using EDF sim
>> 2010-10-23 10:08:05.6320 [PID=7397 ] [send] CPU: req 97397.08 sec, 2.00 instances; est delay 0.00
>> 2010-10-23 10:08:05.6320 [PID=7397 ] [send] CUDA: req 0.00 sec, 0.00 instances; est delay 0.00
>> 2010-10-23 10:08:05.6320 [PID=7397 ] [send] work_req_seconds: 97397.08 secs
>> 2010-10-23 10:08:05.6320 [PID=7397 ] [send] available disk 54.25 GB, work_buf_min 0
>> 2010-10-23 10:08:05.6320 [PID=7397 ] [send] active_frac 0.999270 on_frac 0.992676
>> 2010-10-23 10:08:05.6320 [PID=7397 ] Anonymous platform app versions:
>> 2010-10-23 10:08:05.6320 [PID=7397 ] app: enigma_m4_2 version 522 cpus 1.00 cudas 0.00 atis 0.00 flops 3.482478G
>> 2010-10-23 10:08:05.6324 [PID=7397 ] [send] [AV#6000002] not reliable; cons valid 0< 10
>> 2010-10-23 10:08:05.6324 [PID=7397 ] [send] set_trust: cons valid 0< 10, don't use single replication
>> 2010-10-23 10:08:05.6525 [PID=7397 ] Sending reply to [HOST#3757]: 0 results, delay req 181.80
>> 2010-10-23 10:08:05.6528 [PID=7397 ] Scheduler ran 0.026 seconds
>>
>> /TJM
>> http://www.enigmaathome.net
>>
>> _______________________________________________
>> boinc_dev mailing list
>> [email protected]
>> http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
>> To unsubscribe, visit the above URL and
>> (near bottom of page) enter your email address.