Sounds reasonable to me. I'm not sure if that's what was intended.
Al Reust wrote:
> Seti Beta - NQueens (running backup project) 6.6.38 Aries
> http://setiathome.berkeley.edu/beta/show_host_detail.php?hostid=22789
> 47 CUDA results stuck in "project backoff":
> [s...@home Beta Test] Backing off 2 hr 6 min 34 sec on upload of
> 09mr09aa.11273.1299.3.13.117_1_0
>
> I could select one result and click Retry Now; it would pop 2 into
> Uploading, which errored out and extended the retry wait.
> I clicked Retry Now again on the one below that; 2 popped into Uploading
> and one got through, which started the next one in the queue.
> I clicked Retry Now again on the one below that; 2 popped into Uploading
> and both timed out, which extended the timer.
>
> Okay, what next?
>
> Do Network Communications.
> The ones that had not been extended immediately went into Upload
> Pending. It so happens this was coincident with Eric opening the pipe.
> The first two got through okay; one of the next pair failed and went
> into extended retry. The next started uploading and got through.
> It continued until about half got through, and what was left were those
> waiting for the extended retry.
>
> Okay, pick the top one and click Retry Now. One got through and the
> other went into extended retry.
> Then the next set of 2 both got through.
>
> Do Network Communications.
> I presume that, as the last 2 were both met with success, the uploads
> all went to Upload Pending and started cycling through the remaining
> results. Of the 27 remaining, 2/3rds were successful and the rest went
> into an extended retry.
>
> As long as there was one success after a failure, it would proceed to
> the next. If there was no success (2 failures), it stopped. So a host
> with a very large number of uploads could be stuck until it gets a
> clear connection, and it would still have the ones that had failed and
> extended their retry time.
>
> At 02:29 PM 7/22/2009 -0700, Lynn W. Taylor wrote:
>> If there is a bug, it may be in how fast the project-wide delay grows.
>>
>> I've only had a couple of cycles and it's already 2.7 hours.
>>
>> If there are unwarranted bug reports, it is because uploads sit at
>> "upload pending" and there is no indication in the GUI that there is a
>> project-wide delay, or when it will finally be lifted.
>>
>> -- Lynn
>>
>> David Anderson wrote:
>> > I just checked in the change you describe.
>> > Actually it was added 4 years ago, but was backed out because there
>> > were reports that it wasn't working right.
>> >
>> > This time I added some <file_xfer_debug>-enabled messages, so we
>> > should be able to track down any problems.
>> >
>> > -- David
>> >
>> > Lynn W. Taylor wrote:
>> >> Hi All,
>> >>
>> >> I've been watching s...@home during their current challenges, and I
>> >> think I see something that can be optimized a bit.
>> >>
>> >> The problem:
>> >>
>> >> Take a very fast machine, something with lots of RAM, a couple of i7
>> >> processors and several high-end CUDA cards -- a machine that can
>> >> chew through work units at an amazing rate.
>> >>
>> >> It has a big cache.
>> >>
>> >> As work is completed, each work unit goes into the transfer queue.
>> >>
>> >> BOINC sends each one, and if the upload server is unreachable, each
>> >> work unit is retried based on the back-off algorithm.
>> >>
>> >> If an upload fails, that information does not affect the other
>> >> running upload timers.
>> >>
>> >> In other words, this mega-fast machine could have a lot (hundreds)
>> >> of pending uploads, and tries every one every few hours.
>> >>
>> >> I see two issues:
>> >>
>> >> 1) The most important work (the one with the earliest deadline) may
>> >> be one of the ones that is tried the least (longest interval).
>> >>
>> >> 2) Retrying hundreds of units adds load to the servers: 180,000-odd
>> >> clients trying to reach one or two machines at SETI.
>> >>
>> >> Optimization:
>> >>
>> >> On a failed upload, BOINC could basically treat that as if every
>> >> upload timed out. That would reduce the number of attempted uploads
>> >> from all clients, reducing the load on the servers.
>> >>
>> >> Of course, since the odds of a successful upload are just about
>> >> zero for a work unit that isn't retried, by itself this is a bad
>> >> idea.
>> >>
>> >> So, when any retry timer runs out, instead of retrying that WU,
>> >> retry the one with the earliest deadline -- the one at the highest
>> >> risk.
>> >>
>> >> As the load drops, work would continue to be uploaded in deadline
>> >> order until everything is caught up.
>> >>
>> >> I know a project can have different upload servers for different
>> >> applications, or for load balancing, or whatever, so this would
>> >> only apply to work going to the same server.
>> >>
>> >> The same idea could apply to downloads as well. Does the BOINC
>> >> client get the deadline from the scheduler?
>> >>
>> >> Now, if I can figure out how to get a BOINC development environment
>> >> going, and unless it's just a stupid idea, I'll be glad to take a
>> >> shot at the code.
>> >>
>> >> Comments?
>> >>
>> >> -- Lynn
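For what it's worth, below is a minimal C++ sketch of the project-wide
backoff behavior being described. It is not the actual client code; the
struct name and constants are assumptions. It just shows the shape of it:
the delay grows exponentially with consecutive failures, and a single
success clears it, which matches Al's observation that one success lets
the rest of the "Upload Pending" queue start moving again.

// Sketch only -- not the actual BOINC client code. Names and constants
// are assumptions for illustration.
#include <algorithm>
#include <cstdio>

struct ProjectUploadBackoff {
    int consecutive_failures = 0;
    double backoff_seconds = 0;

    // Any upload to this project's server failed.
    void on_failure() {
        consecutive_failures++;
        // Roughly double the delay each time, starting around a minute
        // and capped at four hours.
        double d = 60.0 * (1 << std::min(consecutive_failures - 1, 8));
        backoff_seconds = std::min(d, 4 * 3600.0);
    }

    // Any upload succeeded: clear the project-wide delay, which is what
    // lets the other pending transfers start cycling again.
    void on_success() {
        consecutive_failures = 0;
        backoff_seconds = 0;
    }
};

int main() {
    ProjectUploadBackoff b;
    for (int i = 0; i < 6; i++) {
        b.on_failure();
        std::printf("after failure %d: back off %.0f s\n",
                    i + 1, b.backoff_seconds);
    }
    b.on_success();
    std::printf("after a success: back off %.0f s\n", b.backoff_seconds);
    return 0;
}

Whether the real growth rate and cap are appropriate is exactly the
question Lynn raises about reaching 2.7 hours after only a couple of
cycles.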
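And here is a sketch of the deadline-ordered retry Lynn proposes: one
project-wide timer, and when it fires the client retries the pending
upload at highest risk instead of whichever file's own timer happened to
expire. Again this is illustrative only (PendingUpload, report_deadline,
and next_upload_to_retry are made-up names), and it assumes each result's
report deadline is available on the client side, which is the question
about the scheduler above.

// Sketch only -- not existing BOINC code. All names are illustrative.
#include <algorithm>
#include <string>
#include <vector>

struct PendingUpload {
    std::string name;
    double report_deadline;  // Unix time by which the result is due
};

// Pick the upload at highest risk of missing its deadline.
const PendingUpload* next_upload_to_retry(
    const std::vector<PendingUpload>& queue
) {
    if (queue.empty()) return nullptr;
    auto it = std::min_element(
        queue.begin(), queue.end(),
        [](const PendingUpload& a, const PendingUpload& b) {
            return a.report_deadline < b.report_deadline;
        });
    return &*it;
}

On a success the project-wide backoff would be cleared and the queue
drains in deadline order; on a failure only the single project-wide timer
is extended, so clients stop hammering the upload server with hundreds of
independent retries.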
