I think I heard that "beta" was turned off temporarily, for bandwidth reasons -- I suspect Eric's mental "bandwidth" as well as the wire and computing resources.
Seems like a good time for one less thing to juggle.

Al Reust wrote:
> The next part of the story...
>
> This morning as I look I have two Seti Beta downloads in Retry x:xx:xxx and
> three downloads pending. Uploads are working fine for both projects.
>
> A quick excerpt from the log:
>
> 7/23/2009 9:35:56 AM  s...@home Beta Test  Reporting 8 completed tasks, requesting new tasks for GPU
> 7/23/2009 9:35:58 AM  [http_xfer_debug] HTTP: wrote 2742 bytes
> 7/23/2009 9:35:58 AM  [http_xfer_debug] HTTP: wrote 2872 bytes
> 7/23/2009 9:35:58 AM  [http_xfer_debug] HTTP: wrote 1436 bytes
> 7/23/2009 9:35:58 AM  [http_xfer_debug] HTTP: wrote 1436 bytes
> 7/23/2009 9:35:58 AM  [http_xfer_debug] HTTP: wrote 1436 bytes
> 7/23/2009 9:35:58 AM  [http_xfer_debug] HTTP: wrote 2412 bytes
> 7/23/2009 9:36:01 AM  s...@home Beta Test  Scheduler request completed: got 1 new tasks
> 7/23/2009 9:48:04 AM  s...@home Beta Test  [error] File 09mr09aa.11273.7434.3.13.192 has wrong size: expected 375335, got 0
> 7/23/2009 9:48:04 AM  s...@home Beta Test  Started download of 09mr09aa.11273.7434.3.13.192
> 7/23/2009 9:48:04 AM  s...@home Beta Test  [file_xfer_debug] URL: http://boinc2.ssl.berkeley.edu/beta/download/32c/09mr09aa.11273.7434.3.13.192
> 7/23/2009 9:48:04 AM  s...@home Beta Test  [error] File 09mr09aa.11273.157778.3.13.66 has wrong size: expected 375335, got 0
> 7/23/2009 9:48:04 AM  s...@home Beta Test  Started download of 09mr09aa.11273.157778.3.13.66
> 7/23/2009 9:48:04 AM  s...@home Beta Test  [file_xfer_debug] URL: http://boinc2.ssl.berkeley.edu/beta/download/32a/09mr09aa.11273.157778.3.13.66
> 7/23/2009 9:48:04 AM  [http_xfer_debug] HTTP: wrote 337 bytes
> 7/23/2009 9:48:04 AM  [http_xfer_debug] HTTP: wrote 336 bytes
> 7/23/2009 9:48:05 AM  s...@home Beta Test  [file_xfer_debug] FILE_XFER_SET::poll(): http op done; retval -184
> 7/23/2009 9:48:05 AM  s...@home Beta Test  [file_xfer_debug] FILE_XFER_SET::poll(): http op done; retval -184
> 7/23/2009 9:48:05 AM  s...@home Beta Test  [file_xfer_debug] file transfer status -184
> 7/23/2009 9:48:05 AM  s...@home Beta Test  Temporarily failed download of 09mr09aa.11273.7434.3.13.192: HTTP error
> 7/23/2009 9:48:05 AM  s...@home Beta Test  [file_xfer_debug] project-wide xfer delay for 667.000336 sec
> 7/23/2009 9:48:05 AM  s...@home Beta Test  Backing off 2 hr 57 min 43 sec on download of 09mr09aa.11273.7434.3.13.192
> 7/23/2009 9:48:05 AM  s...@home Beta Test  [file_xfer_debug] file transfer status -184
> 7/23/2009 9:48:05 AM  s...@home Beta Test  Temporarily failed download of 09mr09aa.11273.157778.3.13.66: HTTP error
> 7/23/2009 9:48:05 AM  s...@home Beta Test  [file_xfer_debug] project-wide xfer delay for 3824.977569 sec
> 7/23/2009 9:48:05 AM  s...@home Beta Test  Backing off 3 hr 19 min 25 sec on download of 09mr09aa.11273.157778.3.13.66
> 7/23/2009 9:52:23 AM  s...@home Beta Test  Computation for task 09mr09aa.11273.157778.3.13.89_1 finished
> 7/23/2009 9:52:23 AM  s...@home Beta Test  Starting 09mr09aa.11273.157778.3.13.77_0
> 7/23/2009 9:52:23 AM  s...@home Beta Test  Starting task 09mr09aa.11273.157778.3.13.77_0 using setiathome_enhanced version 608
> 7/23/2009 9:52:25 AM  s...@home Beta Test  Started upload of 09mr09aa.11273.157778.3.13.89_1_0
> 7/23/2009 9:52:25 AM  s...@home Beta Test  [file_xfer_debug] URL: http://setiboincdata.ssl.berkeley.edu/beta_cgi/file_upload_handler
> 7/23/2009 9:52:26 AM  [http_xfer_debug] HTTP: wrote 93 bytes
> 7/23/2009 9:52:27 AM  s...@home Beta Test  [file_xfer_debug] FILE_XFER_SET::poll(): http op done; retval 0
> 7/23/2009 9:52:27 AM  s...@home Beta Test  [file_xfer_debug] parsing upload response:
> <data_server_reply> <status>0</status> <file_size>0</file_size></data_server_reply>
> 7/23/2009 9:52:27 AM  s...@home Beta Test  [file_xfer_debug] parsing status: 0
> 7/23/2009 9:52:28 AM  [http_xfer_debug] HTTP: wrote 64 bytes
> 7/23/2009 9:52:29 AM  s...@home Beta Test  [file_xfer_debug] FILE_XFER_SET::poll(): http op done; retval 0
> 7/23/2009 9:52:29 AM  s...@home Beta Test  [file_xfer_debug] parsing upload response: <data_server_reply> <status>0</status></data_server_reply>
> 7/23/2009 9:52:29 AM  s...@home Beta Test  [file_xfer_debug] parsing status: 0
> 7/23/2009 9:52:29 AM  s...@home Beta Test  [file_xfer_debug] file transfer status 0
> 7/23/2009 9:52:29 AM  s...@home Beta Test  Finished upload of 09mr09aa.11273.157778.3.13.89_1_0
> 7/23/2009 9:52:29 AM  s...@home Beta Test  [file_xfer_debug] Throughput 20444 bytes/sec
> 7/23/2009 10:09:21 AM  nque...@home Project  Computation for task Nq26_06_20_23_15_09_0 finished
> 7/23/2009 10:09:23 AM  nque...@home Project  Started upload of Nq26_06_20_23_15_09_0_0
> 7/23/2009 10:09:23 AM  nque...@home Project  [file_xfer_debug] URL: http://nqueens.ing.udec.cl/NQueens_cgi/file_upload_handler
> 7/23/2009 10:09:25 AM  [http_xfer_debug] HTTP: wrote 64 bytes
> 7/23/2009 10:09:25 AM  nque...@home Project  [file_xfer_debug] FILE_XFER_SET::poll(): http op done; retval 0
> 7/23/2009 10:09:25 AM  nque...@home Project  [file_xfer_debug] parsing upload response: <data_server_reply> <status>0</status></data_server_reply>
> 7/23/2009 10:09:25 AM  nque...@home Project  [file_xfer_debug] parsing status: 0
> 7/23/2009 10:09:25 AM  nque...@home Project  [file_xfer_debug] file transfer status 0
> 7/23/2009 10:09:25 AM  nque...@home Project  Finished upload of Nq26_06_20_23_15_09_0_0
> 7/23/2009 10:09:25 AM  nque...@home Project  [file_xfer_debug] Throughput 97 bytes/sec
> 7/23/2009 10:15:54 AM  s...@home Beta Test  Computation for task 09mr09aa.11273.157778.3.13.77_0 finished
> 7/23/2009 10:15:54 AM  s...@home Beta Test  Starting 09mr09aa.11273.157778.3.13.87_0
> 7/23/2009 10:15:54 AM  s...@home Beta Test  Starting task 09mr09aa.11273.157778.3.13.87_0 using setiathome_enhanced version 608
> 7/23/2009 10:15:56 AM  s...@home Beta Test  Started upload of 09mr09aa.11273.157778.3.13.77_0_0
> 7/23/2009 10:15:56 AM  s...@home Beta Test  [file_xfer_debug] URL: http://setiboincdata.ssl.berkeley.edu/beta_cgi/file_upload_handler
> 7/23/2009 10:15:57 AM  [http_xfer_debug] HTTP: wrote 93 bytes
> 7/23/2009 10:15:57 AM  s...@home Beta Test  [file_xfer_debug] FILE_XFER_SET::poll(): http op done; retval 0
> 7/23/2009 10:15:57 AM  s...@home Beta Test  [file_xfer_debug] parsing upload response:
> <data_server_reply> <status>0</status> <file_size>0</file_size></data_server_reply>
> 7/23/2009 10:15:57 AM  s...@home Beta Test  [file_xfer_debug] parsing status: 0
> 7/23/2009 10:15:58 AM  [http_xfer_debug] HTTP: wrote 64 bytes
> 7/23/2009 10:15:58 AM  s...@home Beta Test  [file_xfer_debug] FILE_XFER_SET::poll(): http op done; retval 0
> 7/23/2009 10:15:58 AM  s...@home Beta Test  [file_xfer_debug] parsing upload response: <data_server_reply> <status>0</status></data_server_reply>
> 7/23/2009 10:15:58 AM  s...@home Beta Test  [file_xfer_debug] parsing status: 0
> 7/23/2009 10:15:58 AM  s...@home Beta Test  [file_xfer_debug] file transfer status 0
> 7/23/2009 10:15:58 AM  s...@home Beta Test  Finished upload of 09mr09aa.11273.157778.3.13.77_0_0
> 7/23/2009 10:15:58 AM  s...@home Beta Test  [file_xfer_debug] Throughput 78530 bytes/sec
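The interesting lines above are the two "Temporarily failed download ... HTTP error" entries and the growing "project-wide xfer delay" that follows them. That delay is essentially an exponential backoff shared by all transfers for the project. A rough sketch of how such a delay could be computed (illustrative constants and names only, not necessarily what the client actually does):

    // Sketch of a project-wide transfer backoff: roughly double the delay
    // after each consecutive transient failure, cap it, and randomize it so
    // thousands of clients don't all retry at the same moment.
    // The constants and the function name here are assumptions.
    #include <algorithm>
    #include <cstdio>
    #include <cstdlib>

    const double MIN_DELAY = 60;        // assumed floor: 1 minute
    const double MAX_DELAY = 4 * 3600;  // assumed cap: 4 hours

    double next_xfer_delay(int consecutive_failures) {
        double d = MIN_DELAY * (1 << std::min(consecutive_failures, 12));
        d = std::min(d, MAX_DELAY);
        // Jitter of 0.5x to 1.5x spreads retries out across clients.
        double jitter = 0.5 + (double)rand() / RAND_MAX;
        return d * jitter;
    }

    int main() {
        for (int n = 1; n <= 6; n++) {
            printf("after %d consecutive failures: back off ~%.0f sec\n",
                   n, next_xfer_delay(n));
        }
        return 0;
    }

Whatever the client's actual constants are, the shape is the same: the delay grows quickly with each failure and is randomized, which would explain why the two downloads above back off by different amounts (2 hr 57 min vs. 3 hr 19 min).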
>
> At 03:44 PM 7/22/2009 -0700, Lynn W. Taylor wrote:
>> Sounds reasonable to me. Not sure if that is what was intended.
>>
>> Al Reust wrote:
>>> Seti Beta - NQueens (running backup project) 6.6.38 Aries
>>> http://setiathome.berkeley.edu/beta/show_host_detail.php?hostid=22789
>>> 47 Cuda stuck in "project backoff"
>>> [s...@home Beta Test] Backing off 2 hr 6 min 34 sec on upload of 09mr09aa.11273.1299.3.13.117_1_0
>>> I could select one result and click Retry Now; it would pop 2 into Uploading that errored out and extended the retry wait.
>>> I clicked Retry Now again on the one below that; 2 popped into Uploading and one got through, which started the next one in the queue.
>>> I clicked Retry Now again on the one below that; 2 popped into Uploading and both timed out, which extended the timer.
>>> Okay, what next??? Do Network Communications.
>>> The ones that had not been extended immediately went into Upload Pending.
>>> It so happens it was coincident with Eric opening the pipe. The first two got through okay; one of the next pair failed and went into extended retry. The next started uploading and got through.
>>> It continued until about half got through and what was left were those waiting for the extended retry.
>>> Okay, pick the top one and click Retry Now. One got through and the other went into extended retry.
>>> Then the next set of 2 both got through.
>>> Do Network Communications.
>>> I presume that, as the last 2 were both successful, the uploads all went to Upload Pending and started cycling through the remaining results.
>>> Of the 27 remaining, 2/3rds were successful and the rest went into an extended retry.
>>> As long as there was one success after a failure, it would proceed to the next. If there was no success (2 failures), it stopped. So a host with a very large number could be stuck until they get a clear connection. Then they would still have those that had failed and extended the retry time.
>>>
>>> At 02:29 PM 7/22/2009 -0700, Lynn W. Taylor wrote:
>>>> If there is a bug, it may be in how fast the project-wide delay grows.
>>>>
>>>> I've only had a couple of cycles and it's already 2.7 hours.
>>>>
>>>> If there are unwarranted bug reports, it is because uploads sit at "Upload Pending" and there is no indication in the GUI that there is a project-wide delay, or when it will finally be lifted.
>>>>
>>>> -- Lynn
>>>>
>>>> David Anderson wrote:
>>>>> I just checked in the change you describe.
>>>>> Actually it was added 4 years ago, but was backed out because there were reports that it wasn't working right.
>>>>> This time I added some <file_xfer_debug>-enabled messages, so we should be able to track down any problems.
>>>>>
>>>>> -- David
>>>>>
>>>>> Lynn W. Taylor wrote:
>>>>>> Hi All,
>>>>>>
>>>>>> I've been watching s...@home during their current challenges, and I think I see something that can be optimized a bit.
>>>>>>
>>>>>> The problem:
>>>>>>
>>>>>> Take a very fast machine, something with lots of RAM, a couple of i7 processors and several high-end CUDA cards -- a machine that can chew through work units at an amazing rate.
>>>>>>
>>>>>> It has a big cache.
>>>>>>
>>>>>> As work is completed, each work unit goes into the transfer queue.
>>>>>>
>>>>>> BOINC sends each one, and if the upload server is unreachable, each work unit is retried based on the back-off algorithm.
>>>>>>
>>>>>> If an upload fails, that information does not affect the other running upload timers.
>>>>>>
>>>>>> In other words, this mega-fast machine could have a lot (hundreds) of pending uploads, and tries every one every few hours.
>>>>>>
>>>>>> I see two issues:
>>>>>>
>>>>>> 1) The most important work (the one with the earliest deadline) may be one of the ones that tries the least (longest interval).
>>>>>>
>>>>>> 2) Retrying hundreds of units adds load to the servers: 180,000-odd clients trying to reach one or two machines at SETI.
>>>>>>
>>>>>> Optimization:
>>>>>>
>>>>>> On a failed upload, BOINC could basically treat that as if every upload timed out. That would reduce the number of attempted uploads from all clients, reducing the load on the servers.
>>>>>>
>>>>>> Of course, since the odds of a successful upload are just about zero for a work unit that isn't retried, by itself this is a bad idea.
>>>>>>
>>>>>> So, when any retry timer runs out, instead of retrying that WU, retry the one with the earliest deadline -- the one at the highest risk.
>>>>>>
>>>>>> As the load drops, work would continue to be uploaded in deadline order until everything is caught up.
>>>>>>
>>>>>> I know a project can have different upload servers for different applications, or for load balancing, or whatever, so this would only apply to work going to the same server.
>>>>>>
>>>>>> The same idea could apply to downloads as well. Does the BOINC client get the deadline from the scheduler??
>>>>>>
>>>>>> Now, if I can figure out how to get a BOINC development environment going, and unless it's just a stupid idea, I'll be glad to take a shot at the code.
>>>>>>
>>>>>> Comments?
>>>>>>
>>>>>> -- Lynn
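To make the proposal concrete, here is a rough sketch in the same spirit (made-up types and names, not the client's real FILE_XFER_SET machinery): a transient failure charges a single per-server backoff timer, and when that timer expires, the pending upload with the earliest report deadline is the one that gets retried.

    #include <algorithm>
    #include <string>
    #include <vector>

    struct PendingUpload {
        std::string name;
        double report_deadline;   // seconds since epoch
        bool in_progress;
    };

    struct UploadServer {
        double next_retry_time = 0;       // one shared (project-wide) backoff
        std::vector<PendingUpload> queue;

        // Any transient failure pushes the shared timer out, instead of
        // backing off only the file that happened to fail.
        void on_transient_failure(double now, double backoff_seconds) {
            next_retry_time = std::max(next_retry_time, now + backoff_seconds);
        }

        // When the shared timer expires, pick the most urgent upload:
        // the one whose result has the earliest report deadline.
        PendingUpload* pick_next(double now) {
            if (now < next_retry_time) return nullptr;
            PendingUpload* best = nullptr;
            for (auto& u : queue) {
                if (u.in_progress) continue;
                if (!best || u.report_deadline < best->report_deadline) {
                    best = &u;
                }
            }
            return best;   // on success, keep draining in deadline order
        }
    };

The effect is that a failure costs one shared timer per server rather than one timer per file, so a host with hundreds of pending uploads sends at most one probe per backoff cycle, and that probe is always spent on the upload at the highest deadline risk.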
