"rarely" is not "never" -- I don't know, I just read that it was 
temporarily.

That said, I don't see anything in the log that seems connected to the 
rest of the thread.  I see three failures, and Dr. Anderson coded the 
project-wide backoff to start at four.
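For illustration, here is a rough sketch of a transfer backoff that only 
kicks in after a few free retries, which is how I read the behavior above. 
The constants and function name are mine, purely illustrative -- this is 
not BOINC's actual code:

```python
import random

# Illustrative constants -- not BOINC's real values.
MIN_DELAY = 60          # base retry delay, seconds
MAX_DELAY = 4 * 3600    # cap on the delay, seconds
FREE_RETRIES = 3        # failures tolerated before backoff kicks in

def backoff_delay(failures, rng=random.random):
    """Return the retry delay after `failures` consecutive failures.

    The first FREE_RETRIES failures retry immediately; after that the
    delay grows exponentially, with jitter, up to MAX_DELAY.
    """
    if failures <= FREE_RETRIES:
        return 0
    exponent = failures - FREE_RETRIES
    delay = min(MIN_DELAY * (2 ** exponent), MAX_DELAY)
    return delay * (0.5 + 0.5 * rng())   # jitter: 50-100% of nominal
```

With three "free" failures, a host in the situation above would retry 
immediately up to the fourth failure and only then start backing off.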

-- Lynn
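P.S.  The deadline-first retry I proposed at the bottom of this thread 
could be sketched roughly as below.  Names and structure are illustrative, 
not BOINC's actual data structures:

```python
import heapq

class UploadQueue:
    """Toy sketch: when any retry timer fires, upload the pending file
    with the earliest deadline rather than the file whose timer fired."""

    def __init__(self):
        self._heap = []  # min-heap of (deadline, filename)

    def add(self, deadline, filename):
        heapq.heappush(self._heap, (deadline, filename))

    def next_to_retry(self):
        """Pop the pending upload at highest risk (earliest deadline)."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[1]
```

So a timer expiring on any work unit would call next_to_retry() and send 
whatever is closest to its deadline, draining the queue in deadline order.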

Al Reust wrote:
> Beta is rarely turned off... LOL
> 
> Other than waiting for the Splitter changes for the Enhanced new Workunits, it 
> has flowed almost smoothly while everything else was chaos...
> 
> Retry Now caused three to download.
> 
> So the Project backoff does work both ways.
> 
> 
> At 10:43 AM 7/23/2009 -0700, Lynn W. Taylor wrote:
>> I think I heard that "beta" was turned off temporarily, for bandwidth 
>> reasons -- I suspect Eric's mental "bandwidth" as well as the wire and 
>> computing resources.
>>
>> Seems like a good time for one less thing to juggle.
>>
>> Al Reust wrote:
>>> The next part of the story...
>>> This morning, as I look, I have two Seti Beta downloads in Retry x:xx:xxx 
>>> and three downloads pending. Uploads are working fine for both projects.
>>> Quick excerpt from the log.
>>> 7/23/2009 9:35:56 AM    s...@home Beta Test     Reporting 8 completed 
>>> tasks, requesting new tasks for GPU
>>> 7/23/2009 9:35:58 AM            [http_xfer_debug] HTTP: wrote 2742 bytes
>>> 7/23/2009 9:35:58 AM            [http_xfer_debug] HTTP: wrote 2872 bytes
>>> 7/23/2009 9:35:58 AM            [http_xfer_debug] HTTP: wrote 1436 bytes
>>> 7/23/2009 9:35:58 AM            [http_xfer_debug] HTTP: wrote 1436 bytes
>>> 7/23/2009 9:35:58 AM            [http_xfer_debug] HTTP: wrote 1436 bytes
>>> 7/23/2009 9:35:58 AM            [http_xfer_debug] HTTP: wrote 2412 bytes
>>> 7/23/2009 9:36:01 AM    s...@home Beta Test     Scheduler request 
>>> completed: got 1 new tasks
>>> 7/23/2009 9:48:04 AM    s...@home Beta Test     [error] File 
>>> 09mr09aa.11273.7434.3.13.192 has wrong size: expected 375335, got 0
>>> 7/23/2009 9:48:04 AM    s...@home Beta Test     Started download of 
>>> 09mr09aa.11273.7434.3.13.192
>>> 7/23/2009 9:48:04 AM    s...@home Beta Test     [file_xfer_debug] URL: 
>>> http://boinc2.ssl.berkeley.edu/beta/download/32c/09mr09aa.11273.7434.3.13.192
>>> 7/23/2009 9:48:04 AM    s...@home Beta Test     [error] File 
>>> 09mr09aa.11273.157778.3.13.66 has wrong size: expected 375335, got 0
>>> 7/23/2009 9:48:04 AM    s...@home Beta Test     Started download of 
>>> 09mr09aa.11273.157778.3.13.66
>>> 7/23/2009 9:48:04 AM    s...@home Beta Test     [file_xfer_debug] URL: 
>>> http://boinc2.ssl.berkeley.edu/beta/download/32a/09mr09aa.11273.157778.3.13.66
>>> 7/23/2009 9:48:04 AM            [http_xfer_debug] HTTP: wrote 337 bytes
>>> 7/23/2009 9:48:04 AM            [http_xfer_debug] HTTP: wrote 336 bytes
>>> 7/23/2009 9:48:05 AM    s...@home Beta Test     [file_xfer_debug] 
>>> FILE_XFER_SET::poll(): http op done; retval -184
>>> 7/23/2009 9:48:05 AM    s...@home Beta Test     [file_xfer_debug] 
>>> FILE_XFER_SET::poll(): http op done; retval -184
>>> 7/23/2009 9:48:05 AM    s...@home Beta Test     [file_xfer_debug] file 
>>> transfer status -184
>>> 7/23/2009 9:48:05 AM    s...@home Beta Test     Temporarily failed 
>>> download of 09mr09aa.11273.7434.3.13.192: HTTP error
>>> 7/23/2009 9:48:05 AM    s...@home Beta Test     [file_xfer_debug] 
>>> project-wide xfer delay for 667.000336 sec
>>> 7/23/2009 9:48:05 AM    s...@home Beta Test     Backing off 2 hr 57 min 
>>> 43 sec on download of 09mr09aa.11273.7434.3.13.192
>>> 7/23/2009 9:48:05 AM    s...@home Beta Test     [file_xfer_debug] file 
>>> transfer status -184
>>> 7/23/2009 9:48:05 AM    s...@home Beta Test     Temporarily failed 
>>> download of 09mr09aa.11273.157778.3.13.66: HTTP error
>>> 7/23/2009 9:48:05 AM    s...@home Beta Test     [file_xfer_debug] 
>>> project-wide xfer delay for 3824.977569 sec
>>> 7/23/2009 9:48:05 AM    s...@home Beta Test     Backing off 3 hr 19 min 
>>> 25 sec on download of 09mr09aa.11273.157778.3.13.66
>>> 7/23/2009 9:52:23 AM    s...@home Beta Test     Computation for task 
>>> 09mr09aa.11273.157778.3.13.89_1 finished
>>> 7/23/2009 9:52:23 AM    s...@home Beta Test     Starting 
>>> 09mr09aa.11273.157778.3.13.77_0
>>> 7/23/2009 9:52:23 AM    s...@home Beta Test     Starting task 
>>> 09mr09aa.11273.157778.3.13.77_0 using setiathome_enhanced version 608
>>> 7/23/2009 9:52:25 AM    s...@home Beta Test     Started upload of 
>>> 09mr09aa.11273.157778.3.13.89_1_0
>>> 7/23/2009 9:52:25 AM    s...@home Beta Test     [file_xfer_debug] URL: 
>>> http://setiboincdata.ssl.berkeley.edu/beta_cgi/file_upload_handler
>>> 7/23/2009 9:52:26 AM            [http_xfer_debug] HTTP: wrote 93 bytes
>>> 7/23/2009 9:52:27 AM    s...@home Beta Test     [file_xfer_debug] 
>>> FILE_XFER_SET::poll(): http op done; retval 0
>>> 7/23/2009 9:52:27 AM    s...@home Beta Test     [file_xfer_debug] parsing 
>>> upload response: <data_server_reply>    <status>0</status> 
>>> <file_size>0</file_size></data_server_reply>
>>> 7/23/2009 9:52:27 AM    s...@home Beta Test     [file_xfer_debug] parsing 
>>> status: 0
>>> 7/23/2009 9:52:28 AM            [http_xfer_debug] HTTP: wrote 64 bytes
>>> 7/23/2009 9:52:29 AM    s...@home Beta Test     [file_xfer_debug] 
>>> FILE_XFER_SET::poll(): http op done; retval 0
>>> 7/23/2009 9:52:29 AM    s...@home Beta Test     [file_xfer_debug] parsing 
>>> upload response: <data_server_reply>    
>>> <status>0</status></data_server_reply>
>>> 7/23/2009 9:52:29 AM    s...@home Beta Test     [file_xfer_debug] parsing 
>>> status: 0
>>> 7/23/2009 9:52:29 AM    s...@home Beta Test     [file_xfer_debug] file 
>>> transfer status 0
>>> 7/23/2009 9:52:29 AM    s...@home Beta Test     Finished upload of 
>>> 09mr09aa.11273.157778.3.13.89_1_0
>>> 7/23/2009 9:52:29 AM    s...@home Beta Test     [file_xfer_debug] 
>>> Throughput 20444 bytes/sec
>>> 7/23/2009 10:09:21 AM   nque...@home Project    Computation for task 
>>> Nq26_06_20_23_15_09_0 finished
>>> 7/23/2009 10:09:23 AM   nque...@home Project    Started upload of 
>>> Nq26_06_20_23_15_09_0_0
>>> 7/23/2009 10:09:23 AM   nque...@home Project    [file_xfer_debug] URL: 
>>> http://nqueens.ing.udec.cl/NQueens_cgi/file_upload_handler
>>> 7/23/2009 10:09:25 AM           [http_xfer_debug] HTTP: wrote 64 bytes
>>> 7/23/2009 10:09:25 AM   nque...@home Project    [file_xfer_debug] 
>>> FILE_XFER_SET::poll(): http op done; retval 0
>>> 7/23/2009 10:09:25 AM   nque...@home Project    [file_xfer_debug] parsing 
>>> upload response: <data_server_reply>    
>>> <status>0</status></data_server_reply>
>>> 7/23/2009 10:09:25 AM   nque...@home Project    [file_xfer_debug] parsing 
>>> status: 0
>>> 7/23/2009 10:09:25 AM   nque...@home Project    [file_xfer_debug] file 
>>> transfer status 0
>>> 7/23/2009 10:09:25 AM   nque...@home Project    Finished upload of 
>>> Nq26_06_20_23_15_09_0_0
>>> 7/23/2009 10:09:25 AM   nque...@home Project    [file_xfer_debug] 
>>> Throughput 97 bytes/sec
>>> 7/23/2009 10:15:54 AM   s...@home Beta Test     Computation for task 
>>> 09mr09aa.11273.157778.3.13.77_0 finished
>>> 7/23/2009 10:15:54 AM   s...@home Beta Test     Starting 
>>> 09mr09aa.11273.157778.3.13.87_0
>>> 7/23/2009 10:15:54 AM   s...@home Beta Test     Starting task 
>>> 09mr09aa.11273.157778.3.13.87_0 using setiathome_enhanced version 608
>>> 7/23/2009 10:15:56 AM   s...@home Beta Test     Started upload of 
>>> 09mr09aa.11273.157778.3.13.77_0_0
>>> 7/23/2009 10:15:56 AM   s...@home Beta Test     [file_xfer_debug] URL: 
>>> http://setiboincdata.ssl.berkeley.edu/beta_cgi/file_upload_handler
>>> 7/23/2009 10:15:57 AM           [http_xfer_debug] HTTP: wrote 93 bytes
>>> 7/23/2009 10:15:57 AM   s...@home Beta Test     [file_xfer_debug] 
>>> FILE_XFER_SET::poll(): http op done; retval 0
>>> 7/23/2009 10:15:57 AM   s...@home Beta Test     [file_xfer_debug] parsing 
>>> upload response: <data_server_reply>    <status>0</status> 
>>> <file_size>0</file_size></data_server_reply>
>>> 7/23/2009 10:15:57 AM   s...@home Beta Test     [file_xfer_debug] parsing 
>>> status: 0
>>> 7/23/2009 10:15:58 AM           [http_xfer_debug] HTTP: wrote 64 bytes
>>> 7/23/2009 10:15:58 AM   s...@home Beta Test     [file_xfer_debug] 
>>> FILE_XFER_SET::poll(): http op done; retval 0
>>> 7/23/2009 10:15:58 AM   s...@home Beta Test     [file_xfer_debug] parsing 
>>> upload response: <data_server_reply>    
>>> <status>0</status></data_server_reply>
>>> 7/23/2009 10:15:58 AM   s...@home Beta Test     [file_xfer_debug] parsing 
>>> status: 0
>>> 7/23/2009 10:15:58 AM   s...@home Beta Test     [file_xfer_debug] file 
>>> transfer status 0
>>> 7/23/2009 10:15:58 AM   s...@home Beta Test     Finished upload of 
>>> 09mr09aa.11273.157778.3.13.77_0_0
>>> 7/23/2009 10:15:58 AM   s...@home Beta Test     [file_xfer_debug] 
>>> Throughput 78530 bytes/sec
>>>
>>>
>>> At 03:44 PM 7/22/2009 -0700, Lynn W. Taylor wrote:
>>>> Sounds reasonable to me.  Not sure if that is what was intended.
>>>>
>>>> Al Reust wrote:
>>>>> Seti Beta - NQueens (running backup project) 6.6.38 Aries
>>>>> http://setiathome.berkeley.edu/beta/show_host_detail.php?hostid=22789
>>>>> 47 Cuda stuck in "project backoff"
>>>>> [s...@home Beta Test] Backing off 2 hr 6 min 34 sec on upload of 
>>>>> 09mr09aa.11273.1299.3.13.117_1_0
>>>>> I could select one result and click Retry Now; two would pop into 
>>>>> Uploading, error out, and extend the retry wait.
>>>>> I clicked Retry Now again on the one below that; two popped into 
>>>>> Uploading and one got through, which started the next one in the queue.
>>>>> I clicked Retry Now again on the one below that; two popped into 
>>>>> Uploading and both timed out, which extended the timer.
>>>>> Okay what next???
>>>>> Do Network communications
>>>>> The ones that had not been extended immediately went into Upload 
>>>>> Pending. It so happens this was coincident with Eric opening the pipe. 
>>>>> The first two got through okay; one of the next pair failed and went 
>>>>> into extended retry. The next started uploading and got through.
>>>>> It continued until about half got through, and what was left were those 
>>>>> waiting for the extended retry.
>>>>> Okay pick the top one and click Retry Now. One got through and the 
>>>>> other went into extended retry.
>>>>> Then the next set of 2 both got through.
>>>>> Do Network Communications
>>>>> I presume that as the last 2 both were met with success, the uploads 
>>>>> all went to Upload Pending and started cycling through the remaining 
>>>>> results.
>>>>> Of the 27 remaining, two-thirds were successful and the rest went into 
>>>>> an extended retry.
>>>>> As long as there was one success after a failure, it would proceed to 
>>>>> the next. If there was no success (two failures), it stopped. So a host 
>>>>> with a very large number could be stuck until it gets a clear 
>>>>> connection. Then it would still have those that had failed and 
>>>>> extended the retry time.
>>>>>
>>>>> At 02:29 PM 7/22/2009 -0700, Lynn W. Taylor wrote:
>>>>>> If there is a bug, it may be in how fast the project-wide delay grows.
>>>>>>
>>>>>> I've only had a couple of cycles and it's already 2.7 hours.
>>>>>>
>>>>>> If there are unwarranted bug reports, it is because uploads sit at
>>>>>> "upload pending" and there is no indication in the GUI that there is a
>>>>>> project-wide delay, or of when it will finally be lifted.
>>>>>>
>>>>>> -- Lynn
>>>>>>
>>>>>> David Anderson wrote:
>>>>>>> I just checked in the change you describe.
>>>>>>> Actually it was added 4 years ago,
>>>>>>> but was backed out because there were reports that it wasn't working
>>>>>>> right.
>>>>>>> This time I added some <file_xfer_debug>-enabled messages,
>>>>>>> so we should be able to track down any problems.
>>>>>>>
>>>>>>> -- David
>>>>>>>
>>>>>>> Lynn W. Taylor wrote:
>>>>>>>> Hi All,
>>>>>>>>
>>>>>>>> I've been watching s...@home during their current challenges, and I
>>>>>>>> think I see something that can be optimized a bit.
>>>>>>>>
>>>>>>>> The problem:
>>>>>>>>
>>>>>>>> Take a very fast machine, something with lots of RAM, a couple of i7
>>>>>>>> processors and several high-end CUDA cards -- a machine that can chew
>>>>>>>> through work units at an amazing rate.
>>>>>>>>
>>>>>>>> It has a big cache.
>>>>>>>>
>>>>>>>> As work is completed, each work unit goes into the transfer queue.
>>>>>>>>
>>>>>>>> BOINC sends each one, and if the upload server is unreachable, each 
>>>>>>>> work
>>>>>>>> unit is retried based on the back-off algorithm.
>>>>>>>>
>>>>>>>> If an upload fails, that information does not affect the other running
>>>>>>>> upload timers.
>>>>>>>>
>>>>>>>> In other words, this mega-fast machine could have a lot (hundreds) of
>>>>>>>> pending uploads, and retries every one every few hours.
>>>>>>>>
>>>>>>>> I see two issues:
>>>>>>>>
>>>>>>>> 1) The most important work (the one with the earliest deadline) may be
>>>>>>>> one of the ones that tries the least (longest interval).
>>>>>>>>
>>>>>>>> 2) Retrying hundreds of units adds load to the servers.  180,000-odd
>>>>>>>> clients trying to reach one or two machines at SETI.
>>>>>>>>
>>>>>>>> Optimization:
>>>>>>>>
>>>>>>>> On a failed upload, BOINC could basically treat that as if every upload
>>>>>>>> timed out.  That would reduce the number of attempted uploads from all
>>>>>>>> clients, reducing the load on the servers.
>>>>>>>>
>>>>>>>> Of course, since the odds of a successful upload are just about zero for
>>>>>>>> a work unit that isn't retried, by itself this is a bad idea.
>>>>>>>>
>>>>>>>> So, when any retry timer runs out, instead of retrying that WU, retry
>>>>>>>> the one with the earliest deadline -- the one at the highest risk.
>>>>>>>>
>>>>>>>> As the load drops, work would continue to be uploaded in deadline order
>>>>>>>> until everything is caught up.
>>>>>>>>
>>>>>>>> I know a project can have different upload servers for different
>>>>>>>> applications, or for load balancing, or whatever, so this would only
>>>>>>>> apply to work going to the same server.
>>>>>>>>
>>>>>>>> The same idea could apply to downloads as well.  Does the BOINC client
>>>>>>>> get the deadline from the scheduler?
>>>>>>>>
>>>>>>>> Now, if I can figure out how to get a BOINC development environment
>>>>>>>> going, and unless it's just a stupid idea, I'll be glad to take a shot
>>>>>>>> at the code.
>>>>>>>>
>>>>>>>> Comments?
>>>>>>>>
>>>>>>>> -- Lynn
>>>>>>>> _______________________________________________
>>>>>>>> boinc_dev mailing list
>>>>>>>> [email protected]
>>>>>>>> http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
>>>>>>>> To unsubscribe, visit the above URL and
>>>>>>>> (near bottom of page) enter your email address.
> 
> 
