"rarely" is not "never" -- I don't know, I just read that it was turned off temporarily.
That said, I don't see anything in the log that connects to the rest of the
thread. I see three failures, and Dr. Anderson coded the project-wide backoff
to start at four.

-- Lynn

Al Reust wrote:
> Beta is rarely turned off... LOL
>
> Other than waiting for the Splitter changes for Enhanced new Workunits it
> has flowed almost smoothly when everything else was chaos...
>
> Retry Now caused three to download.
>
> So the Project backoff does work both ways.
>
> At 10:43 AM 7/23/2009 -0700, Lynn W. Taylor wrote:
>> I think I heard that "beta" was turned off temporarily, for bandwidth
>> reasons -- I suspect Eric's mental "bandwidth" as well as the wire and
>> computing resources.
>>
>> Seems like a good time for one less thing to juggle.
>>
>> Al Reust wrote:
>>> The next part of the story...
>>>
>>> This morning as I look I have two Seti Beta downloads in Retry x:xx:xxx
>>> and three downloads pending. Uploads are working fine for both projects.
>>>
>>> Quick excerpt from the log:
>>>
>>> 7/23/2009 9:35:56 AM  s...@home Beta Test  Reporting 8 completed tasks, requesting new tasks for GPU
>>> 7/23/2009 9:35:58 AM  [http_xfer_debug] HTTP: wrote 2742 bytes
>>> 7/23/2009 9:35:58 AM  [http_xfer_debug] HTTP: wrote 2872 bytes
>>> 7/23/2009 9:35:58 AM  [http_xfer_debug] HTTP: wrote 1436 bytes
>>> 7/23/2009 9:35:58 AM  [http_xfer_debug] HTTP: wrote 1436 bytes
>>> 7/23/2009 9:35:58 AM  [http_xfer_debug] HTTP: wrote 1436 bytes
>>> 7/23/2009 9:35:58 AM  [http_xfer_debug] HTTP: wrote 2412 bytes
>>> 7/23/2009 9:36:01 AM  s...@home Beta Test  Scheduler request completed: got 1 new tasks
>>> 7/23/2009 9:48:04 AM  s...@home Beta Test  [error] File 09mr09aa.11273.7434.3.13.192 has wrong size: expected 375335, got 0
>>> 7/23/2009 9:48:04 AM  s...@home Beta Test  Started download of 09mr09aa.11273.7434.3.13.192
>>> 7/23/2009 9:48:04 AM  s...@home Beta Test  [file_xfer_debug] URL: http://boinc2.ssl.berkeley.edu/beta/download/32c/09mr09aa.11273.7434.3.13.192
>>> 7/23/2009 9:48:04 AM  s...@home Beta Test  [error] File 09mr09aa.11273.157778.3.13.66 has wrong size: expected 375335, got 0
>>> 7/23/2009 9:48:04 AM  s...@home Beta Test  Started download of 09mr09aa.11273.157778.3.13.66
>>> 7/23/2009 9:48:04 AM  s...@home Beta Test  [file_xfer_debug] URL: http://boinc2.ssl.berkeley.edu/beta/download/32a/09mr09aa.11273.157778.3.13.66
>>> 7/23/2009 9:48:04 AM  [http_xfer_debug] HTTP: wrote 337 bytes
>>> 7/23/2009 9:48:04 AM  [http_xfer_debug] HTTP: wrote 336 bytes
>>> 7/23/2009 9:48:05 AM  s...@home Beta Test  [file_xfer_debug] FILE_XFER_SET::poll(): http op done; retval -184
>>> 7/23/2009 9:48:05 AM  s...@home Beta Test  [file_xfer_debug] FILE_XFER_SET::poll(): http op done; retval -184
>>> 7/23/2009 9:48:05 AM  s...@home Beta Test  [file_xfer_debug] file transfer status -184
>>> 7/23/2009 9:48:05 AM  s...@home Beta Test  Temporarily failed download of 09mr09aa.11273.7434.3.13.192: HTTP error
>>> 7/23/2009 9:48:05 AM  s...@home Beta Test  [file_xfer_debug] project-wide xfer delay for 667.000336 sec
>>> 7/23/2009 9:48:05 AM  s...@home Beta Test  Backing off 2 hr 57 min 43 sec on download of 09mr09aa.11273.7434.3.13.192
>>> 7/23/2009 9:48:05 AM  s...@home Beta Test  [file_xfer_debug] file transfer status -184
>>> 7/23/2009 9:48:05 AM  s...@home Beta Test  Temporarily failed download of 09mr09aa.11273.157778.3.13.66: HTTP error
>>> 7/23/2009 9:48:05 AM  s...@home Beta Test  [file_xfer_debug] project-wide xfer delay for 3824.977569 sec
>>> 7/23/2009 9:48:05 AM  s...@home Beta Test  Backing off 3 hr 19 min 25 sec on download of 09mr09aa.11273.157778.3.13.66
>>> 7/23/2009 9:52:23 AM  s...@home Beta Test  Computation for task 09mr09aa.11273.157778.3.13.89_1 finished
>>> 7/23/2009 9:52:23 AM  s...@home Beta Test  Starting 09mr09aa.11273.157778.3.13.77_0
>>> 7/23/2009 9:52:23 AM  s...@home Beta Test  Starting task 09mr09aa.11273.157778.3.13.77_0 using setiathome_enhanced version 608
>>> 7/23/2009 9:52:25 AM  s...@home Beta Test  Started upload of 09mr09aa.11273.157778.3.13.89_1_0
>>> 7/23/2009 9:52:25 AM  s...@home Beta Test  [file_xfer_debug] URL: http://setiboincdata.ssl.berkeley.edu/beta_cgi/file_upload_handler
>>> 7/23/2009 9:52:26 AM  [http_xfer_debug] HTTP: wrote 93 bytes
>>> 7/23/2009 9:52:27 AM  s...@home Beta Test  [file_xfer_debug] FILE_XFER_SET::poll(): http op done; retval 0
>>> 7/23/2009 9:52:27 AM  s...@home Beta Test  [file_xfer_debug] parsing upload response: <data_server_reply> <status>0</status> <file_size>0</file_size></data_server_reply>
>>> 7/23/2009 9:52:27 AM  s...@home Beta Test  [file_xfer_debug] parsing status: 0
>>> 7/23/2009 9:52:28 AM  [http_xfer_debug] HTTP: wrote 64 bytes
>>> 7/23/2009 9:52:29 AM  s...@home Beta Test  [file_xfer_debug] FILE_XFER_SET::poll(): http op done; retval 0
>>> 7/23/2009 9:52:29 AM  s...@home Beta Test  [file_xfer_debug] parsing upload response: <data_server_reply> <status>0</status></data_server_reply>
>>> 7/23/2009 9:52:29 AM  s...@home Beta Test  [file_xfer_debug] parsing status: 0
>>> 7/23/2009 9:52:29 AM  s...@home Beta Test  [file_xfer_debug] file transfer status 0
>>> 7/23/2009 9:52:29 AM  s...@home Beta Test  Finished upload of 09mr09aa.11273.157778.3.13.89_1_0
>>> 7/23/2009 9:52:29 AM  s...@home Beta Test  [file_xfer_debug] Throughput 20444 bytes/sec
>>> 7/23/2009 10:09:21 AM  nque...@home Project  Computation for task Nq26_06_20_23_15_09_0 finished
>>> 7/23/2009 10:09:23 AM  nque...@home Project  Started upload of Nq26_06_20_23_15_09_0_0
>>> 7/23/2009 10:09:23 AM  nque...@home Project  [file_xfer_debug] URL: http://nqueens.ing.udec.cl/NQueens_cgi/file_upload_handler
>>> 7/23/2009 10:09:25 AM  [http_xfer_debug] HTTP: wrote 64 bytes
>>> 7/23/2009 10:09:25 AM  nque...@home Project  [file_xfer_debug] FILE_XFER_SET::poll(): http op done; retval 0
>>> 7/23/2009 10:09:25 AM  nque...@home Project  [file_xfer_debug] parsing upload response: <data_server_reply> <status>0</status></data_server_reply>
>>> 7/23/2009 10:09:25 AM  nque...@home Project  [file_xfer_debug] parsing status: 0
>>> 7/23/2009 10:09:25 AM  nque...@home Project  [file_xfer_debug] file transfer status 0
>>> 7/23/2009 10:09:25 AM  nque...@home Project  Finished upload of Nq26_06_20_23_15_09_0_0
>>> 7/23/2009 10:09:25 AM  nque...@home Project  [file_xfer_debug] Throughput 97 bytes/sec
>>> 7/23/2009 10:15:54 AM  s...@home Beta Test  Computation for task 09mr09aa.11273.157778.3.13.77_0 finished
>>> 7/23/2009 10:15:54 AM  s...@home Beta Test  Starting 09mr09aa.11273.157778.3.13.87_0
>>> 7/23/2009 10:15:54 AM  s...@home Beta Test  Starting task 09mr09aa.11273.157778.3.13.87_0 using setiathome_enhanced version 608
>>> 7/23/2009 10:15:56 AM  s...@home Beta Test  Started upload of 09mr09aa.11273.157778.3.13.77_0_0
>>> 7/23/2009 10:15:56 AM  s...@home Beta Test  [file_xfer_debug] URL: http://setiboincdata.ssl.berkeley.edu/beta_cgi/file_upload_handler
>>> 7/23/2009 10:15:57 AM  [http_xfer_debug] HTTP: wrote 93 bytes
>>> 7/23/2009 10:15:57 AM  s...@home Beta Test  [file_xfer_debug] FILE_XFER_SET::poll(): http op done; retval 0
>>> 7/23/2009 10:15:57 AM  s...@home Beta Test  [file_xfer_debug] parsing upload response: <data_server_reply> <status>0</status> <file_size>0</file_size></data_server_reply>
>>> 7/23/2009 10:15:57 AM  s...@home Beta Test  [file_xfer_debug] parsing status: 0
>>> 7/23/2009 10:15:58 AM  [http_xfer_debug] HTTP: wrote 64 bytes
>>> 7/23/2009 10:15:58 AM  s...@home Beta Test  [file_xfer_debug] FILE_XFER_SET::poll(): http op done; retval 0
>>> 7/23/2009 10:15:58 AM  s...@home Beta Test  [file_xfer_debug] parsing upload response: <data_server_reply> <status>0</status></data_server_reply>
>>> 7/23/2009 10:15:58 AM  s...@home Beta Test  [file_xfer_debug] parsing status: 0
>>> 7/23/2009 10:15:58 AM  s...@home Beta Test  [file_xfer_debug] file transfer status 0
>>> 7/23/2009 10:15:58 AM  s...@home Beta Test  Finished upload of 09mr09aa.11273.157778.3.13.77_0_0
>>> 7/23/2009 10:15:58 AM  s...@home Beta Test  [file_xfer_debug] Throughput 78530 bytes/sec
>>>
>>> At 03:44 PM 7/22/2009 -0700, Lynn W. Taylor wrote:
>>>> Sounds reasonable to me. Not sure if that is what was intended.
>>>>
>>>> Al Reust wrote:
>>>>> Seti Beta - NQueens (running backup project) 6.6.38 Aries
>>>>> http://setiathome.berkeley.edu/beta/show_host_detail.php?hostid=22789
>>>>> 47 Cuda stuck in "project backoff"
>>>>>
>>>>> [s...@home Beta Test] Backing off 2 hr 6 min 34 sec on upload of
>>>>> 09mr09aa.11273.1299.3.13.117_1_0
>>>>>
>>>>> I could select one result and click Retry Now; it would pop 2 into
>>>>> Uploading that errored out and extended the retry wait.
>>>>>
>>>>> I clicked Retry Now again on the one below that; 2 popped into
>>>>> Uploading and one got through, which started the next one in the queue.
>>>>>
>>>>> I clicked Retry Now again on the one below that; 2 popped into
>>>>> Uploading and both timed out, which extended the timer.
>>>>>
>>>>> Okay, what next??? Do Network communications.
>>>>>
>>>>> The ones that had not been extended immediately went into Upload
>>>>> Pending. It so happens this was coincident with Eric opening the pipe.
>>>>>
>>>>> The first two got through okay; one of the next pair failed and went
>>>>> into extended retry. The next started uploading and got through.
>>>>>
>>>>> It continued until about half got through, and what was left were
>>>>> those waiting for the extended retry.
>>>>>
>>>>> Okay, pick the top one and click Retry Now. One got through and the
>>>>> other went into extended retry.
>>>>>
>>>>> Then the next set of 2 both got through.
>>>>>
>>>>> Do Network Communications.
>>>>>
>>>>> I presume that, as the last 2 were both met with success, the uploads
>>>>> all went to Upload Pending and started cycling through the remaining
>>>>> results. Of the 27 remaining, 2/3rds were successful and the rest
>>>>> went into an extended retry.
>>>>>
>>>>> As long as there was one success after a failure, it would proceed to
>>>>> the next. If there was no success (2 failures), it stopped. So a host
>>>>> with a very large number could be stuck until it gets a clear
>>>>> connection -- and it would still have those that had failed and
>>>>> extended the retry time.
>>>>>
>>>>> At 02:29 PM 7/22/2009 -0700, Lynn W. Taylor wrote:
>>>>>> If there is a bug, it may be in how fast the project-wide delay
>>>>>> grows. I've only had a couple of cycles and it's already 2.7 hours.
>>>>>>
>>>>>> If there are unwarranted bug reports, it is because uploads sit at
>>>>>> "upload pending" and there is no indication in the GUI that there is
>>>>>> a project-wide delay, or when it will finally be lifted.
>>>>>>
>>>>>> -- Lynn
>>>>>>
>>>>>> David Anderson wrote:
>>>>>>> I just checked in the change you describe.
>>>>>>> Actually it was added 4 years ago, but was backed out because there
>>>>>>> were reports that it wasn't working right.
>>>>>>> This time I added some <file_xfer_debug>-enabled messages, so we
>>>>>>> should be able to track down any problems.
>>>>>>>
>>>>>>> -- David
>>>>>>>
>>>>>>> Lynn W. Taylor wrote:
>>>>>>>> Hi All,
>>>>>>>>
>>>>>>>> I've been watching s...@home during their current challenges, and
>>>>>>>> I think I see something that can be optimized a bit.
>>>>>>>>
>>>>>>>> The problem:
>>>>>>>>
>>>>>>>> Take a very fast machine, something with lots of RAM, a couple of
>>>>>>>> i7 processors and several high-end CUDA cards -- a machine that
>>>>>>>> can chew through work units at an amazing rate.
>>>>>>>>
>>>>>>>> It has a big cache.
>>>>>>>>
>>>>>>>> As work is completed, each work unit goes into the transfer queue.
>>>>>>>> BOINC sends each one, and if the upload server is unreachable,
>>>>>>>> each work unit is retried based on the back-off algorithm.
>>>>>>>>
>>>>>>>> If an upload fails, that information does not affect the other
>>>>>>>> running upload timers.
>>>>>>>>
>>>>>>>> In other words, this mega-fast machine could have a lot (hundreds)
>>>>>>>> of pending uploads, and tries every one every few hours.
>>>>>>>>
>>>>>>>> I see two issues:
>>>>>>>>
>>>>>>>> 1) The most important work (the one with the earliest deadline)
>>>>>>>> may be one of the ones that retries the least (longest interval).
>>>>>>>>
>>>>>>>> 2) Retrying hundreds of units adds load to the servers --
>>>>>>>> 180,000-odd clients trying to reach one or two machines at SETI.
>>>>>>>>
>>>>>>>> Optimization:
>>>>>>>>
>>>>>>>> On a failed upload, BOINC could basically treat that as if every
>>>>>>>> upload timed out. That would reduce the number of attempted
>>>>>>>> uploads from all clients, reducing the load on the servers.
>>>>>>>>
>>>>>>>> Of course, since the odds of a successful upload are just about
>>>>>>>> zero for a work unit that isn't retried, by itself this is a bad
>>>>>>>> idea.
>>>>>>>>
>>>>>>>> So, when any retry timer runs out, instead of retrying that WU,
>>>>>>>> retry the one with the earliest deadline -- the one at the highest
>>>>>>>> risk.
>>>>>>>>
>>>>>>>> As the load drops, work would continue to be uploaded in deadline
>>>>>>>> order until everything is caught up.
>>>>>>>>
>>>>>>>> I know a project can have different upload servers for different
>>>>>>>> applications, or for load balancing, or whatever, so this would
>>>>>>>> only apply to work going to the same server.
>>>>>>>>
>>>>>>>> The same idea could apply to downloads as well. Does the BOINC
>>>>>>>> client get the deadline from the scheduler?
>>>>>>>>
>>>>>>>> Now, if I can figure out how to get a BOINC development
>>>>>>>> environment going, and unless it's just a stupid idea, I'll be
>>>>>>>> glad to take a shot at the code.
>>>>>>>>
>>>>>>>> Comments?
>>>>>>>>
>>>>>>>> -- Lynn
>>>>>>>> _______________________________________________
>>>>>>>> boinc_dev mailing list
>>>>>>>> [email protected]
>>>>>>>> http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
>>>>>>>> To unsubscribe, visit the above URL and
>>>>>>>> (near bottom of page) enter your email address.
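[Editor's note: the retry scheme proposed in the quoted thread -- back off
project-wide on any failure, and when the timer expires retry the pending
upload with the *earliest deadline* rather than the one whose timer fired --
can be sketched in a few lines. This is a toy model, not the actual BOINC
client code: the class name, the 60-second minimum, the doubling rule, and
the 4-hour cap are all illustrative assumptions.]

```python
import heapq

class UploadScheduler:
    """Toy model of the thread's proposal: one project-wide backoff
    timer, and earliest-deadline-first selection of the next retry."""

    def __init__(self, min_backoff=60, max_backoff=4 * 3600):
        self.min_backoff = min_backoff   # assumed floor, seconds
        self.max_backoff = max_backoff   # assumed cap, seconds
        self.backoff = 0                 # current project-wide delay
        self.next_try = 0.0              # earliest time any upload may run
        self.pending = []                # min-heap of (deadline, wu name)

    def add(self, deadline, name):
        heapq.heappush(self.pending, (deadline, name))

    def pick(self, now):
        """Next workunit to try, or None while the project is backed off."""
        if now < self.next_try or not self.pending:
            return None
        # Earliest deadline first, regardless of whose timer expired.
        return self.pending[0][1]

    def report(self, now, name, ok):
        if ok:
            # One success clears the project-wide delay entirely.
            self.backoff = 0
            _, n = heapq.heappop(self.pending)
            assert n == name
        else:
            # One failure postpones *every* pending upload, not just this
            # one: double the project-wide delay, within floor and cap.
            self.backoff = min(max(self.backoff * 2, self.min_backoff),
                               self.max_backoff)
            self.next_try = now + self.backoff

# Usage: a failure freezes the whole queue; the urgent WU retries first.
s = UploadScheduler()
s.add(200, "wu_late")
s.add(100, "wu_urgent")
s.report(0, s.pick(0), ok=False)   # first failure -> 60 s project-wide delay
print(s.pick(30))                  # None: everything waits, not just one WU
print(s.pick(61))                  # wu_urgent: earliest deadline goes first
```

This also mirrors the behavior Al observed: after a success the queue
drains, while back-to-back failures stall everything until a clean
connection comes along.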
