Beta is rarely turned off... LOL

Other than waiting for the Splitter changes for the new Enhanced Workunits, it
has flowed almost smoothly while everything else was chaos...

Clicking Retry Now caused three to download.

So the Project backoff does work both ways.
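
A rough Python sketch of what a project-wide transfer backoff could look like, with one gate for uploads and one for downloads. This is not the BOINC client code: the class name, the 60 second minimum, the 4 hour cap and the doubling rule are all assumptions made up for illustration.

```python
import random


class ProjectXferBackoff:
    """Hypothetical project-wide gate for one transfer direction."""

    def __init__(self, min_delay=60.0, max_delay=4 * 3600.0, rng=None):
        self.min_delay = min_delay      # assumed floor, in seconds
        self.max_delay = max_delay      # assumed cap, in seconds
        self.failures = 0               # consecutive failed transfers
        self.next_ok_time = 0.0         # no transfer may start before this
        self.rng = rng or random.Random()

    def on_failure(self, now):
        """One failed transfer delays every transfer in this direction."""
        self.failures += 1
        ceiling = min(self.max_delay, self.min_delay * 2 ** self.failures)
        delay = self.rng.uniform(self.min_delay, ceiling)
        self.next_ok_time = now + delay
        return delay

    def on_success(self):
        """One successful transfer lifts the project-wide delay."""
        self.failures = 0
        self.next_ok_time = 0.0

    def may_start(self, now):
        return now >= self.next_ok_time


# One gate per direction, so downloads are held back the same way uploads are.
upload_gate = ProjectXferBackoff()
download_gate = ProjectXferBackoff()
```

In this sketch a manual Retry Now would simply bypass may_start() for the selected file, and a success would then call on_success() and release the rest. That matches what I saw, but it is a guess at the mechanism.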


At 10:43 AM 7/23/2009 -0700, Lynn W. Taylor wrote:
>I think I heard that "beta" was turned off temporarily, for bandwidth 
>reasons -- I suspect Eric's mental "bandwidth" as well as the wire and 
>computing resources.
>
>Seems like a good time for one less thing to juggle.
>
>Al Reust wrote:
>>The next part of the story...
>>This morning as I look I have two Seti Beta downloads in Retry x:xx:xxx 
>>and three downloads pending. Uploads are working fine for both projects.
>>Quick excerpt from the log.
>>7/23/2009 9:35:56 AM    s...@home Beta Test     Reporting 8 completed 
>>tasks, requesting new tasks for GPU
>>7/23/2009 9:35:58 AM            [http_xfer_debug] HTTP: wrote 2742 bytes
>>7/23/2009 9:35:58 AM            [http_xfer_debug] HTTP: wrote 2872 bytes
>>7/23/2009 9:35:58 AM            [http_xfer_debug] HTTP: wrote 1436 bytes
>>7/23/2009 9:35:58 AM            [http_xfer_debug] HTTP: wrote 1436 bytes
>>7/23/2009 9:35:58 AM            [http_xfer_debug] HTTP: wrote 1436 bytes
>>7/23/2009 9:35:58 AM            [http_xfer_debug] HTTP: wrote 2412 bytes
>>7/23/2009 9:36:01 AM    s...@home Beta Test     Scheduler request 
>>completed: got 1 new tasks
>>7/23/2009 9:48:04 AM    s...@home Beta Test     [error] File 
>>09mr09aa.11273.7434.3.13.192 has wrong size: expected 375335, got 0
>>7/23/2009 9:48:04 AM    s...@home Beta Test     Started download of 
>>09mr09aa.11273.7434.3.13.192
>>7/23/2009 9:48:04 AM    s...@home Beta Test     [file_xfer_debug] URL: 
>>http://boinc2.ssl.berkeley.edu/beta/download/32c/09mr09aa.11273.7434.3.13.192
>>7/23/2009 9:48:04 AM    s...@home Beta Test     [error] File 
>>09mr09aa.11273.157778.3.13.66 has wrong size: expected 375335, got 0
>>7/23/2009 9:48:04 AM    s...@home Beta Test     Started download of 
>>09mr09aa.11273.157778.3.13.66
>>7/23/2009 9:48:04 AM    s...@home Beta Test     [file_xfer_debug] URL: 
>>http://boinc2.ssl.berkeley.edu/beta/download/32a/09mr09aa.11273.157778.3.13.66
>>7/23/2009 9:48:04 AM            [http_xfer_debug] HTTP: wrote 337 bytes
>>7/23/2009 9:48:04 AM            [http_xfer_debug] HTTP: wrote 336 bytes
>>7/23/2009 9:48:05 AM    s...@home Beta Test     [file_xfer_debug] 
>>FILE_XFER_SET::poll(): http op done; retval -184
>>7/23/2009 9:48:05 AM    s...@home Beta Test     [file_xfer_debug] 
>>FILE_XFER_SET::poll(): http op done; retval -184
>>7/23/2009 9:48:05 AM    s...@home Beta Test     [file_xfer_debug] file 
>>transfer status -184
>>7/23/2009 9:48:05 AM    s...@home Beta Test     Temporarily failed 
>>download of 09mr09aa.11273.7434.3.13.192: HTTP error
>>7/23/2009 9:48:05 AM    s...@home Beta Test     [file_xfer_debug] 
>>project-wide xfer delay for 667.000336 sec
>>7/23/2009 9:48:05 AM    s...@home Beta Test     Backing off 2 hr 57 min 
>>43 sec on download of 09mr09aa.11273.7434.3.13.192
>>7/23/2009 9:48:05 AM    s...@home Beta Test     [file_xfer_debug] file 
>>transfer status -184
>>7/23/2009 9:48:05 AM    s...@home Beta Test     Temporarily failed 
>>download of 09mr09aa.11273.157778.3.13.66: HTTP error
>>7/23/2009 9:48:05 AM    s...@home Beta Test     [file_xfer_debug] 
>>project-wide xfer delay for 3824.977569 sec
>>7/23/2009 9:48:05 AM    s...@home Beta Test     Backing off 3 hr 19 min 
>>25 sec on download of 09mr09aa.11273.157778.3.13.66
>>7/23/2009 9:52:23 AM    s...@home Beta Test     Computation for task 
>>09mr09aa.11273.157778.3.13.89_1 finished
>>7/23/2009 9:52:23 AM    s...@home Beta Test     Starting 
>>09mr09aa.11273.157778.3.13.77_0
>>7/23/2009 9:52:23 AM    s...@home Beta Test     Starting task 
>>09mr09aa.11273.157778.3.13.77_0 using setiathome_enhanced version 608
>>7/23/2009 9:52:25 AM    s...@home Beta Test     Started upload of 
>>09mr09aa.11273.157778.3.13.89_1_0
>>7/23/2009 9:52:25 AM    s...@home Beta Test     [file_xfer_debug] URL: 
>>http://setiboincdata.ssl.berkeley.edu/beta_cgi/file_upload_handler
>>7/23/2009 9:52:26 AM            [http_xfer_debug] HTTP: wrote 93 bytes
>>7/23/2009 9:52:27 AM    s...@home Beta Test     [file_xfer_debug] 
>>FILE_XFER_SET::poll(): http op done; retval 0
>>7/23/2009 9:52:27 AM    s...@home Beta Test     [file_xfer_debug] parsing 
>>upload response: <data_server_reply>    <status>0</status> 
>><file_size>0</file_size></data_server_reply>
>>7/23/2009 9:52:27 AM    s...@home Beta Test     [file_xfer_debug] parsing 
>>status: 0
>>7/23/2009 9:52:28 AM            [http_xfer_debug] HTTP: wrote 64 bytes
>>7/23/2009 9:52:29 AM    s...@home Beta Test     [file_xfer_debug] 
>>FILE_XFER_SET::poll(): http op done; retval 0
>>7/23/2009 9:52:29 AM    s...@home Beta Test     [file_xfer_debug] parsing 
>>upload response: <data_server_reply>    <status>0</status></data_server_reply>
>>7/23/2009 9:52:29 AM    s...@home Beta Test     [file_xfer_debug] parsing 
>>status: 0
>>7/23/2009 9:52:29 AM    s...@home Beta Test     [file_xfer_debug] file 
>>transfer status 0
>>7/23/2009 9:52:29 AM    s...@home Beta Test     Finished upload of 
>>09mr09aa.11273.157778.3.13.89_1_0
>>7/23/2009 9:52:29 AM    s...@home Beta Test     [file_xfer_debug] 
>>Throughput 20444 bytes/sec
>>7/23/2009 10:09:21 AM   nque...@home Project    Computation for task 
>>Nq26_06_20_23_15_09_0 finished
>>7/23/2009 10:09:23 AM   nque...@home Project    Started upload of 
>>Nq26_06_20_23_15_09_0_0
>>7/23/2009 10:09:23 AM   nque...@home Project    [file_xfer_debug] URL: 
>>http://nqueens.ing.udec.cl/NQueens_cgi/file_upload_handler
>>7/23/2009 10:09:25 AM           [http_xfer_debug] HTTP: wrote 64 bytes
>>7/23/2009 10:09:25 AM   nque...@home Project    [file_xfer_debug] 
>>FILE_XFER_SET::poll(): http op done; retval 0
>>7/23/2009 10:09:25 AM   nque...@home Project    [file_xfer_debug] parsing 
>>upload response: <data_server_reply>    <status>0</status></data_server_reply>
>>7/23/2009 10:09:25 AM   nque...@home Project    [file_xfer_debug] parsing 
>>status: 0
>>7/23/2009 10:09:25 AM   nque...@home Project    [file_xfer_debug] file 
>>transfer status 0
>>7/23/2009 10:09:25 AM   nque...@home Project    Finished upload of 
>>Nq26_06_20_23_15_09_0_0
>>7/23/2009 10:09:25 AM   nque...@home Project    [file_xfer_debug] 
>>Throughput 97 bytes/sec
>>7/23/2009 10:15:54 AM   s...@home Beta Test     Computation for task 
>>09mr09aa.11273.157778.3.13.77_0 finished
>>7/23/2009 10:15:54 AM   s...@home Beta Test     Starting 
>>09mr09aa.11273.157778.3.13.87_0
>>7/23/2009 10:15:54 AM   s...@home Beta Test     Starting task 
>>09mr09aa.11273.157778.3.13.87_0 using setiathome_enhanced version 608
>>7/23/2009 10:15:56 AM   s...@home Beta Test     Started upload of 
>>09mr09aa.11273.157778.3.13.77_0_0
>>7/23/2009 10:15:56 AM   s...@home Beta Test     [file_xfer_debug] URL: 
>>http://setiboincdata.ssl.berkeley.edu/beta_cgi/file_upload_handler
>>7/23/2009 10:15:57 AM           [http_xfer_debug] HTTP: wrote 93 bytes
>>7/23/2009 10:15:57 AM   s...@home Beta Test     [file_xfer_debug] 
>>FILE_XFER_SET::poll(): http op done; retval 0
>>7/23/2009 10:15:57 AM   s...@home Beta Test     [file_xfer_debug] parsing 
>>upload response: <data_server_reply>    <status>0</status> 
>><file_size>0</file_size></data_server_reply>
>>7/23/2009 10:15:57 AM   s...@home Beta Test     [file_xfer_debug] parsing 
>>status: 0
>>7/23/2009 10:15:58 AM           [http_xfer_debug] HTTP: wrote 64 bytes
>>7/23/2009 10:15:58 AM   s...@home Beta Test     [file_xfer_debug] 
>>FILE_XFER_SET::poll(): http op done; retval 0
>>7/23/2009 10:15:58 AM   s...@home Beta Test     [file_xfer_debug] parsing 
>>upload response: <data_server_reply>    <status>0</status></data_server_reply>
>>7/23/2009 10:15:58 AM   s...@home Beta Test     [file_xfer_debug] parsing 
>>status: 0
>>7/23/2009 10:15:58 AM   s...@home Beta Test     [file_xfer_debug] file 
>>transfer status 0
>>7/23/2009 10:15:58 AM   s...@home Beta Test     Finished upload of 
>>09mr09aa.11273.157778.3.13.77_0_0
>>7/23/2009 10:15:58 AM   s...@home Beta Test     [file_xfer_debug] 
>>Throughput 78530 bytes/sec
>>
>>
>>At 03:44 PM 7/22/2009 -0700, Lynn W. Taylor wrote:
>>>Sounds reasonable to me.  Not sure if that is what was intended.
>>>
>>>Al Reust wrote:
>>>>Seti Beta - NQueens (running backup project) 6.6.38 Aries
>>>>http://setiathome.berkeley.edu/beta/show_host_detail.php?hostid=22789
>>>>47 Cuda stuck in "project backoff"
>>>>[s...@home Beta Test] Backing off 2 hr 6 min 34 sec on upload of 
>>>>09mr09aa.11273.1299.3.13.117_1_0
>>>>I could select one result and click Retry Now, it would pop 2 into 
>>>>Uploading that errored out and extended the retry wait.
>>>>I clicked Retry Now again on the one below that, 2 popped into 
>>>>uploading and one got through which started the next one in the Queue.
>>>>I clicked Retry Now again on the one below that, 2 popped into 
>>>>uploading both timed out which extended the timer.
>>>>Okay what next???
>>>>Do Network communications
>>>>The ones that had not been extended immediately went into Upload 
>>>>Pending. It so happens it was coincident with Eric opening the pipe. 
>>>>The first two got through okay, one of the next pair failed and went 
>>>>into extended retry. The Next started uploading and got through.
>>>>It continued until about half got through and what was left were those 
>>>>waiting for the extended retry.
>>>>Okay pick the top one and click Retry Now. One got through and the 
>>>>other went into extended retry.
>>>>Then the next set of 2 both got through.
>>>>Do Network Communications
>>>>I presume that as the last 2 both were met with success, the uploads 
>>>>all went to Upload Pending and started cycling through the remaining 
>>>>results.
>>>>Of the 27 remaining, two thirds were successful and the rest went into 
>>>>an extended retry.
>>>>As long as there was one success after a failure, it would proceed to 
>>>>the next. If there was no success (2 failures), it stopped. So a host 
>>>>with a very large number could be stuck until it gets a clear 
>>>>connection. Then it would still have those that had failed and 
>>>>extended the retry time.
>>>>
>>>>At 02:29 PM 7/22/2009 -0700, Lynn W. Taylor wrote:
>>>>>If there is a bug, it may be in how fast the project-wide delay grows.
>>>>>
>>>>>I've only had a couple of cycles and it's already 2.7 hours.
>>>>>
>>>>>If there are unwarranted bug reports, it is because uploads sit at
>>>>>"upload pending" and there is no indication in the GUI that there is a
>>>>>project wide delay, or when it will finally be lifted.
>>>>>
>>>>>-- Lynn
>>>>>
>>>>>David Anderson wrote:
>>>>>>I just checked in the change you describe.
>>>>>>Actually it was added 4 years ago,
>>>>>>but was backed out because there were reports that it wasn't working right.
>>>>>>This time I added some <file_xfer_debug>-enabled messages,
>>>>>>so we should be able to track down any problems.
>>>>>>
>>>>>>-- David
>>>>>>
>>>>>>Lynn W. Taylor wrote:
>>>>>>>Hi All,
>>>>>>>
>>>>>>>I've been watching s...@home during their current challenges, and I
>>>>>>>think I see something that can be optimized a bit.
>>>>>>>
>>>>>>>The problem:
>>>>>>>
>>>>>>>Take a very fast machine, something with lots of RAM, a couple of i7
>>>>>>>processors and several high-end CUDA cards -- a machine that can chew
>>>>>>>through work units at an amazing rate.
>>>>>>>
>>>>>>>It has a big cache.
>>>>>>>
>>>>>>>As work is completed, each work unit goes into the transfer queue.
>>>>>>>
>>>>>>>BOINC sends each one, and if the upload server is unreachable, each work
>>>>>>>unit is retried based on the back-off algorithm.
>>>>>>>
>>>>>>>If an upload fails, that information does not affect the other running
>>>>>>>upload timers.
>>>>>>>
>>>>>>>In other words, this mega-fast machine could have a lot (hundreds) of
>>>>>>>pending uploads, and tries every one every few hours.
>>>>>>>
>>>>>>>I see two issues:
>>>>>>>
>>>>>>>1) The most important work (the one with the earliest deadline) may be
>>>>>>>one of the ones that tries the least (longest interval).
>>>>>>>
>>>>>>>2) Retrying hundreds of units adds load to the servers, with 180,000-odd
>>>>>>>clients all trying to reach one or two machines at SETI.
>>>>>>>
>>>>>>>Optimization:
>>>>>>>
>>>>>>>On a failed upload, BOINC could basically treat that as if every upload
>>>>>>>timed out.  That would reduce the number of attempted uploads from all
>>>>>>>clients, reducing the load on the servers.
>>>>>>>
>>>>>>>Of course, since the odds of a successful upload are just about zero for
>>>>>>>a work unit that isn't retried, by itself this is a bad idea.
>>>>>>>
>>>>>>>So, when any retry timer runs out, instead of retrying that WU, retry
>>>>>>>the one with the earliest deadline -- the one at the highest risk.
>>>>>>>
>>>>>>>As the load drops, work would continue to be uploaded in deadline order
>>>>>>>until everything is caught up.
>>>>>>>
>>>>>>>I know a project can have different upload servers for different
>>>>>>>applications, or for load balancing, or whatever, so this would only
>>>>>>>apply to work going to the same server.
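
The proposal above can be sketched in a few lines of Python. This is only an illustration of the idea as described, not anything from the BOINC source: PendingUpload, its fields and pick_upload are hypothetical names, and deadlines and retry times are plain seconds.

```python
from dataclasses import dataclass


@dataclass
class PendingUpload:
    name: str
    server: str        # upload server this file goes to
    deadline: float    # report deadline of the task (seconds)
    next_retry: float  # when this file's own retry timer expires (seconds)


def pick_upload(pending, server, now):
    """When any retry timer for `server` has expired, return the pending
    upload with the earliest deadline; otherwise return None."""
    same_server = [u for u in pending if u.server == server]
    if not any(u.next_retry <= now for u in same_server):
        return None  # no timer has run out yet, so nothing is retried
    # The expired timer only grants a retry slot; the slot goes to the
    # upload at highest risk of missing its deadline.
    return min(same_server, key=lambda u: u.deadline)
```

Uploads bound for a different server are ignored, so a failure on one server would not hold back work going elsewhere.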
>>>>>>>
>>>>>>>The same idea could apply to downloads as well.  Does the BOINC client
>>>>>>>get the deadline from the scheduler??
>>>>>>>
>>>>>>>Now, if I can figure out how to get a BOINC development environment
>>>>>>>going, and unless it's just a stupid idea, I'll be glad to take a shot
>>>>>>>at the code.
>>>>>>>
>>>>>>>Comments?
>>>>>>>
>>>>>>>-- Lynn
>>>>>>>_______________________________________________
>>>>>>>boinc_dev mailing list
>>>>>>>[email protected]
>>>>>>>http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
>>>>>>>To unsubscribe, visit the above URL and
>>>>>>>(near bottom of page) enter your email address.
