Sounds reasonable to me. I'm not sure if that's what was intended.
Al Reust wrote:
> Seti Beta - NQueens (running backup project) 6.6.38 Aries
> http://setiathome.berkeley.edu/beta/show_host_detail.php?hostid=22789
> 47 CUDA results stuck in "project backoff":
> [s...@home Beta Test] Backing off 2 hr 6 min 34 sec on upload of
> 09mr09aa.11273.1299.3.13.117_1_0
>
> I could select one result and click Retry Now; it would pop 2 into
> Uploading, which errored out and extended the retry wait.
> I clicked Retry Now again on the one below that; 2 popped into Uploading
> and one got through, which started the next one in the queue.
> I clicked Retry Now again on the one below that; 2 popped into Uploading
> and both timed out, which extended the timer.
>
> Okay, what next?
>
> Do Network Communications.
> The ones that had not been extended immediately went into Upload
> Pending. It so happens this was coincident with Eric opening the pipe.
> The first two got through okay; one of the next pair failed and went
> into extended retry. The next started uploading and got through.
> It continued until about half got through, and what was left were those
> waiting for the extended retry.
>
> Okay, pick the top one and click Retry Now. One got through and the
> other went into extended retry.
> Then the next set of 2 both got through.
>
> Do Network Communications.
> I presume that, as the last 2 were both met with success, the uploads
> all went to Upload Pending and started cycling through the remaining
> results. Of the 27 remaining, 2/3rds were successful and the rest went
> into an extended retry.
>
> As long as there was one success after a failure, it would proceed to
> the next. If there was no success (2 failures), it stopped. So a host
> with a very large number of uploads could be stuck until it gets a
> clear connection, and it would still have the ones that had failed and
> extended their retry time.
>
> At 02:29 PM 7/22/2009 -0700, Lynn W. Taylor wrote:
>> If there is a bug, it may be in how fast the project-wide delay grows.
>>
>> I've only had a couple of cycles and it's already 2.7 hours.
>>
>> If there are unwarranted bug reports, it is because uploads sit at
>> "upload pending" and there is no indication in the GUI that there is a
>> project-wide delay, or when it will finally be lifted.
>>
>> -- Lynn
>>
>> David Anderson wrote:
>> > I just checked in the change you describe.
>> > Actually it was added 4 years ago, but was backed out because there
>> > were reports that it wasn't working right.
>> >
>> > This time I added some <file_xfer_debug>-enabled messages, so we
>> > should be able to track down any problems.
>> >
>> > -- David
>> >
>> > Lynn W. Taylor wrote:
>> >> Hi All,
>> >>
>> >> I've been watching s...@home during their current challenges, and I
>> >> think I see something that can be optimized a bit.
>> >>
>> >> The problem:
>> >>
>> >> Take a very fast machine, something with lots of RAM, a couple of i7
>> >> processors and several high-end CUDA cards -- a machine that can
>> >> chew through work units at an amazing rate.
>> >>
>> >> It has a big cache.
>> >>
>> >> As work is completed, each work unit goes into the transfer queue.
>> >>
>> >> BOINC sends each one, and if the upload server is unreachable, each
>> >> work unit is retried based on the back-off algorithm.
>> >>
>> >> If an upload fails, that information does not affect the other
>> >> running upload timers.
>> >>
>> >> In other words, this mega-fast machine could have a lot (hundreds)
>> >> of pending uploads, and tries every one every few hours.
>> >>
>> >> I see two issues:
>> >>
>> >> 1) The most important work (the one with the earliest deadline) may
>> >> be one of the ones that is tried the least (longest interval).
>> >>
>> >> 2) Retrying hundreds of units adds load to the servers: 180,000-odd
>> >> clients trying to reach one or two machines at SETI.
>> >>
>> >> Optimization:
>> >>
>> >> On a failed upload, BOINC could basically treat that as if every
>> >> upload timed out. That would reduce the number of attempted uploads
>> >> from all clients, reducing the load on the servers.
>> >>
>> >> Of course, since the odds of a successful upload are just about
>> >> zero for a work unit that isn't retried, by itself this is a bad
>> >> idea.
>> >>
>> >> So, when any retry timer runs out, instead of retrying that WU,
>> >> retry the one with the earliest deadline -- the one at the highest
>> >> risk.
>> >>
>> >> As the load drops, work would continue to be uploaded in deadline
>> >> order until everything is caught up.
>> >>
>> >> I know a project can have different upload servers for different
>> >> applications, or for load balancing, or whatever, so this would
>> >> only apply to work going to the same server.
>> >>
>> >> The same idea could apply to downloads as well. Does the BOINC
>> >> client get the deadline from the scheduler?
>> >>
>> >> Now, if I can figure out how to get a BOINC development environment
>> >> going, and unless it's just a stupid idea, I'll be glad to take a
>> >> shot at the code.
>> >>
>> >> Comments?
>> >>
>> >> -- Lynn
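For what it's worth, below is a minimal C++ sketch of the project-wide
backoff behavior being described. It is not the actual client code; the
struct name and constants are assumptions. It just shows the shape of it:
the delay grows exponentially with consecutive failures, and a single
success clears it, which matches Al's observation that one success lets
the rest of the "Upload Pending" queue start moving again.

// Sketch only -- not the actual BOINC client code. Names and constants
// are assumptions for illustration.
#include <algorithm>
#include <cstdio>

struct ProjectUploadBackoff {
    int consecutive_failures = 0;
    double backoff_seconds = 0;

    // Any upload to this project's server failed.
    void on_failure() {
        consecutive_failures++;
        // Roughly double the delay each time, starting around a minute
        // and capped at four hours.
        double d = 60.0 * (1 << std::min(consecutive_failures - 1, 8));
        backoff_seconds = std::min(d, 4 * 3600.0);
    }

    // Any upload succeeded: clear the project-wide delay, which is what
    // lets the other pending transfers start cycling again.
    void on_success() {
        consecutive_failures = 0;
        backoff_seconds = 0;
    }
};

int main() {
    ProjectUploadBackoff b;
    for (int i = 0; i < 6; i++) {
        b.on_failure();
        std::printf("after failure %d: back off %.0f s\n",
                    i + 1, b.backoff_seconds);
    }
    b.on_success();
    std::printf("after a success: back off %.0f s\n", b.backoff_seconds);
    return 0;
}

Whether the real growth rate and cap are appropriate is exactly the
question Lynn raises about reaching 2.7 hours after only a couple of
cycles.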
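And here is a sketch of the deadline-ordered retry Lynn proposes: one
project-wide timer, and when it fires the client retries the pending
upload at highest risk instead of whichever file's own timer happened to
expire. Again this is illustrative only (PendingUpload, report_deadline,
and next_upload_to_retry are made-up names), and it assumes each result's
report deadline is available on the client side, which is the question
about the scheduler above.

// Sketch only -- not existing BOINC code. All names are illustrative.
#include <algorithm>
#include <string>
#include <vector>

struct PendingUpload {
    std::string name;
    double report_deadline;  // Unix time by which the result is due
};

// Pick the upload at highest risk of missing its deadline.
const PendingUpload* next_upload_to_retry(
    const std::vector<PendingUpload>& queue
) {
    if (queue.empty()) return nullptr;
    auto it = std::min_element(
        queue.begin(), queue.end(),
        [](const PendingUpload& a, const PendingUpload& b) {
            return a.report_deadline < b.report_deadline;
        });
    return &*it;
}

On a success the project-wide backoff would be cleared and the queue
drains in deadline order; on a failure only the single project-wide timer
is extended, so clients stop hammering the upload server with hundreds of
independent retries.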
