If there is a bug, it may be in how fast the project-wide delay grows.

I've only seen a couple of retry cycles and the delay is already 2.7 hours.

If there are unwarranted bug reports, it is because uploads sit at
"upload pending" and nothing in the GUI indicates that a project-wide
delay is in effect, or when it will finally be lifted.
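
For anyone following along, here is a rough sketch of the behaviour under
discussion: one shared retry timer per upload server, with the
earliest-deadline file tried first when that timer expires. This is not
the BOINC client code. Every name here (PendingUpload, ServerBackoff,
run_uploads, try_upload) is hypothetical, and the doubling delay with a
60-second base and 4-hour cap is only an assumption for illustration; I
don't know the real growth rate.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class PendingUpload:
    name: str
    server: str      # upload server this file goes to
    deadline: float  # report deadline, in seconds on the same clock as `now`


@dataclass
class ServerBackoff:
    """One shared retry timer per upload server (hypothetical model)."""
    failures: int = 0
    next_retry_at: float = 0.0  # a GUI could show this as "retry at ..."

    def on_failure(self, now: float, base: float = 60.0,
                   cap: float = 4 * 3600.0) -> None:
        # Assumed policy: double the delay on each consecutive failure, capped.
        self.failures += 1
        self.next_retry_at = now + min(cap, base * 2 ** (self.failures - 1))

    def on_success(self, now: float) -> None:
        # Any success clears the server-wide delay.
        self.failures = 0
        self.next_retry_at = now


def run_uploads(queue: List[PendingUpload],
                backoffs: Dict[str, ServerBackoff],
                try_upload: Callable[[PendingUpload], bool],
                now: float) -> List[str]:
    """Try uploads for each server whose shared timer has expired.

    Files are attempted in deadline order. The first failure backs off
    every remaining file for that server. Returns names uploaded.
    """
    uploaded = []
    for server in sorted({u.server for u in queue}):
        state = backoffs.setdefault(server, ServerBackoff())
        if now < state.next_retry_at:
            continue  # server-wide delay still in effect
        candidates = sorted((u for u in queue if u.server == server),
                            key=lambda u: u.deadline)
        for upload in candidates:
            if try_upload(upload):
                state.on_success(now)
                queue.remove(upload)
                uploaded.append(upload.name)
            else:
                state.on_failure(now)
                break  # do not hammer the server with the rest
    return uploaded
```

In this model next_retry_at is a single value per server, which is the
sort of thing the GUI could display next to "upload pending".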

-- Lynn

David Anderson wrote:
> I just checked in the change you describe.
> Actually it was added 4 years ago,
> but was backed out because there were reports that it wasn't working right.
> 
> This time I added some <file_xfer_debug>-enabled messages,
> so we should be able to track down any problems.
> 
> -- David
> 
> Lynn W. Taylor wrote:
>> Hi All,
>>
>> I've been watching s...@home during their current challenges, and I 
>> think I see something that can be optimized a bit.
>>
>> The problem:
>>
>> Take a very fast machine, something with lots of RAM, a couple of i7 
>> processors and several high-end CUDA cards -- a machine that can chew 
>> through work units at an amazing rate.
>>
>> It has a big cache.
>>
>> As work is completed, each work unit goes into the transfer queue.
>>
>> BOINC sends each one, and if the upload server is unreachable, each work 
>> unit is retried based on the back-off algorithm.
>>
>> If an upload fails, that information does not affect the other running 
>> upload timers.
>>
>> In other words, this mega-fast machine could have a lot (hundreds) of
>> pending uploads, and retries each one every few hours.
>>
>> I see two issues:
>>
>> 1) The most important work (the one with the earliest deadline) may be
>> among those retried least often (longest interval).
>>
>> 2) Retrying hundreds of units adds load to the servers: 180,000-odd
>> clients are trying to reach one or two machines at SETI.
>>
>> Optimization:
>>
>> On a failed upload, BOINC could basically treat that as if every upload 
>> timed out.  That would reduce the number of attempted uploads from all 
>> clients, reducing the load on the servers.
>>
>> Of course, since the odds of a successful upload are just about zero for
>> a work unit that isn't retried, by itself this is a bad idea.
>>
>> So, when any retry timer runs out, instead of retrying that WU, retry 
>> the one with the earliest deadline -- the one at the highest risk.
>>
>> As the load drops, work would continue to be uploaded in deadline order 
>> until everything is caught up.
>>
>> I know a project can have different upload servers for different 
>> applications, or for load balancing, or whatever, so this would only 
>> apply to work going to the same server.
>>
>> The same idea could apply to downloads as well.  Does the BOINC client
>> get the deadline from the scheduler?
>>
>> Now, if I can figure out how to get a BOINC development environment 
>> going, and unless it's just a stupid idea, I'll be glad to take a shot 
>> at the code.
>>
>> Comments?
>>
>> -- Lynn
>> _______________________________________________
>> boinc_dev mailing list
>> [email protected]
>> http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
>> To unsubscribe, visit the above URL and
>> (near bottom of page) enter your email address.
> 
> 