I found the source of the problem.

When I ran the upgrade, a fresh host_app_version table was created (I upgraded
from a much older revision), and every host had its consecutive_valid set to 0.
This happened just after a high-priority batch of tasks entered the server's
queue, so almost everything in the feeder queue was high priority.
Any task with priority higher than 0 goes only to reliable hosts. Since there
were no reliable hosts (consecutive_valid >= 10 is required), the queue got stuck.
That is also why I thought the slightly older scheduler (22488) worked: when I
tested it, the server still had a small number of low-priority tasks available.
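
The failure mode above can be illustrated with a minimal Python sketch. This is
a simplified model, not the actual BOINC scheduler code: the names Host, Task,
can_send and pick_tasks are hypothetical, and only the two rules stated in this
message (priority > 0 requires a reliable host, reliable means
consecutive_valid >= 10) are modelled.

```python
from dataclasses import dataclass

# Threshold quoted in the message; everything else here is an assumption.
RELIABLE_MIN_CONSECUTIVE_VALID = 10


@dataclass
class Host:
    host_id: int
    consecutive_valid: int


@dataclass
class Task:
    name: str
    priority: int


def is_reliable(host: Host) -> bool:
    """A host counts as reliable once it has enough consecutive valid results."""
    return host.consecutive_valid >= RELIABLE_MIN_CONSECUTIVE_VALID


def can_send(task: Task, host: Host) -> bool:
    """Tasks with priority above 0 are reserved for reliable hosts."""
    return task.priority <= 0 or is_reliable(host)


def pick_tasks(queue: list[Task], host: Host) -> list[Task]:
    """Return the tasks from the queue that this host is allowed to receive."""
    return [task for task in queue if can_send(task, host)]


# Right after the upgrade: every host starts at consecutive_valid == 0.
fresh_host = Host(host_id=3757, consecutive_valid=0)
high_priority_queue = [Task("wu_a", priority=5), Task("wu_b", priority=5)]
mixed_queue = high_priority_queue + [Task("wu_c", priority=0)]

print(pick_tasks(high_priority_queue, fresh_host))  # nothing can be sent
print(pick_tasks(mixed_queue, fresh_host))          # only the priority-0 task
```

With a queue made up entirely of high-priority tasks, a host whose counter was
reset gets nothing. A queue that still holds a few priority-0 tasks hands those
out, which matches what I saw when testing 22488.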

/TJM
http://www.enigmaathome.net


On Sat, 23 Oct 2010 10:39:37 -0700
 David Anderson <[email protected]> wrote:
>I'm not seeing any problem with the scheduler in
>the latest trunk revision (22593).  Try that.
>-- David
>
>On 23-Oct-2010 8:43 AM, Slawomir Rzeznicki wrote:
>> Hello,
>>
>> Recently I upgraded my server to revision 22566 which seems to have
>> seriously bugged scheduler.
>> I enabled most (if not all) of the sched debug logs - I can't find
>> anything unusual there, however it replies with no work available to
>> all work requests, just like it would with empty feeder queue.
>> I've checked the feeder already and it seems to work fine, the queue
>> is filled with workunits and so is shmem.
>> Also, various people reported that sched does not accept any work
>> reported back to the server, however I can't confirm it right now
>> because I haven't seen any logs yet and I don't have tasks left to
>> report myself.
>>
>> I'd appreciate any suggestions on how to debug this further, right now
>> the only thing I know that the bug must be somewhere between
>> changesets 22488 and 22566, because 22488 works for sure.
>> Below is a sample from sched log after I did a request from one of my
>> PCs.
>>
>> 2010-10-23 10:08:05.6313 [PID=7397 ]   Request: [USER#1] [HOST#3757] [IP 69.12.216.209] client 6.12.4
>> 2010-10-23 10:08:05.6319 [PID=7397 ]    [send] Not using matchmaker scheduling; Not using EDF sim
>> 2010-10-23 10:08:05.6320 [PID=7397 ]    [send] CPU: req 97397.08 sec, 2.00 instances; est delay 0.00
>> 2010-10-23 10:08:05.6320 [PID=7397 ]    [send] CUDA: req 0.00 sec, 0.00 instances; est delay 0.00
>> 2010-10-23 10:08:05.6320 [PID=7397 ]    [send] work_req_seconds: 97397.08 secs
>> 2010-10-23 10:08:05.6320 [PID=7397 ]    [send] available disk 54.25 GB, work_buf_min 0
>> 2010-10-23 10:08:05.6320 [PID=7397 ]    [send] active_frac 0.999270 on_frac 0.992676
>> 2010-10-23 10:08:05.6320 [PID=7397 ]    Anonymous platform app versions:
>> 2010-10-23 10:08:05.6320 [PID=7397 ]       app: enigma_m4_2 version 522 cpus 1.00 cudas 0.00 atis 0.00 flops 3.482478G
>> 2010-10-23 10:08:05.6324 [PID=7397 ]    [send] [AV#6000002] not reliable; cons valid 0<  10
>> 2010-10-23 10:08:05.6324 [PID=7397 ]    [send] set_trust: cons valid 0<  10, don't use single replication
>> 2010-10-23 10:08:05.6525 [PID=7397 ]    Sending reply to [HOST#3757]: 0 results, delay req 181.80
>> 2010-10-23 10:08:05.6528 [PID=7397 ]    Scheduler ran 0.026 seconds
>>
>> /TJM
>> http://www.enigmaathome.net
>>
>> _______________________________________________
>> boinc_dev mailing list
>> [email protected]
>> http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
>> To unsubscribe, visit the above URL and
>> (near bottom of page) enter your email address.

