I think I heard that "beta" was turned off temporarily, for bandwidth 
reasons -- I suspect Eric's mental "bandwidth" as well as the wire and 
computing resources.

Seems like a good time for one less thing to juggle.

Al Reust wrote:
> The next part of the story...
> 
> This morning I see two Seti Beta downloads in Retry x:xx:xxx and 
> three downloads pending. Uploads are working fine for both projects.
> 
> Quick excerpt from the log.
> 
> 7/23/2009 9:35:56 AM    s...@home Beta Test     Reporting 8 completed 
> tasks, requesting new tasks for GPU
> 7/23/2009 9:35:58 AM            [http_xfer_debug] HTTP: wrote 2742 bytes
> 7/23/2009 9:35:58 AM            [http_xfer_debug] HTTP: wrote 2872 bytes
> 7/23/2009 9:35:58 AM            [http_xfer_debug] HTTP: wrote 1436 bytes
> 7/23/2009 9:35:58 AM            [http_xfer_debug] HTTP: wrote 1436 bytes
> 7/23/2009 9:35:58 AM            [http_xfer_debug] HTTP: wrote 1436 bytes
> 7/23/2009 9:35:58 AM            [http_xfer_debug] HTTP: wrote 2412 bytes
> 7/23/2009 9:36:01 AM    s...@home Beta Test     Scheduler request 
> completed: got 1 new tasks
> 7/23/2009 9:48:04 AM    s...@home Beta Test     [error] File 
> 09mr09aa.11273.7434.3.13.192 has wrong size: expected 375335, got 0
> 7/23/2009 9:48:04 AM    s...@home Beta Test     Started download of 
> 09mr09aa.11273.7434.3.13.192
> 7/23/2009 9:48:04 AM    s...@home Beta Test     [file_xfer_debug] URL: 
> http://boinc2.ssl.berkeley.edu/beta/download/32c/09mr09aa.11273.7434.3.13.192
> 7/23/2009 9:48:04 AM    s...@home Beta Test     [error] File 
> 09mr09aa.11273.157778.3.13.66 has wrong size: expected 375335, got 0
> 7/23/2009 9:48:04 AM    s...@home Beta Test     Started download of 
> 09mr09aa.11273.157778.3.13.66
> 7/23/2009 9:48:04 AM    s...@home Beta Test     [file_xfer_debug] URL: 
> http://boinc2.ssl.berkeley.edu/beta/download/32a/09mr09aa.11273.157778.3.13.66
> 7/23/2009 9:48:04 AM            [http_xfer_debug] HTTP: wrote 337 bytes
> 7/23/2009 9:48:04 AM            [http_xfer_debug] HTTP: wrote 336 bytes
> 7/23/2009 9:48:05 AM    s...@home Beta Test     [file_xfer_debug] 
> FILE_XFER_SET::poll(): http op done; retval -184
> 7/23/2009 9:48:05 AM    s...@home Beta Test     [file_xfer_debug] 
> FILE_XFER_SET::poll(): http op done; retval -184
> 7/23/2009 9:48:05 AM    s...@home Beta Test     [file_xfer_debug] file 
> transfer status -184
> 7/23/2009 9:48:05 AM    s...@home Beta Test     Temporarily failed download 
> of 09mr09aa.11273.7434.3.13.192: HTTP error
> 7/23/2009 9:48:05 AM    s...@home Beta Test     [file_xfer_debug] 
> project-wide xfer delay for 667.000336 sec
> 7/23/2009 9:48:05 AM    s...@home Beta Test     Backing off 2 hr 57 min 43 
> sec on download of 09mr09aa.11273.7434.3.13.192
> 7/23/2009 9:48:05 AM    s...@home Beta Test     [file_xfer_debug] file 
> transfer status -184
> 7/23/2009 9:48:05 AM    s...@home Beta Test     Temporarily failed download 
> of 09mr09aa.11273.157778.3.13.66: HTTP error
> 7/23/2009 9:48:05 AM    s...@home Beta Test     [file_xfer_debug] 
> project-wide xfer delay for 3824.977569 sec
> 7/23/2009 9:48:05 AM    s...@home Beta Test     Backing off 3 hr 19 min 25 
> sec on download of 09mr09aa.11273.157778.3.13.66
> 7/23/2009 9:52:23 AM    s...@home Beta Test     Computation for task 
> 09mr09aa.11273.157778.3.13.89_1 finished
> 7/23/2009 9:52:23 AM    s...@home Beta Test     Starting 
> 09mr09aa.11273.157778.3.13.77_0
> 7/23/2009 9:52:23 AM    s...@home Beta Test     Starting task 
> 09mr09aa.11273.157778.3.13.77_0 using setiathome_enhanced version 608
> 7/23/2009 9:52:25 AM    s...@home Beta Test     Started upload of 
> 09mr09aa.11273.157778.3.13.89_1_0
> 7/23/2009 9:52:25 AM    s...@home Beta Test     [file_xfer_debug] URL: 
> http://setiboincdata.ssl.berkeley.edu/beta_cgi/file_upload_handler
> 7/23/2009 9:52:26 AM            [http_xfer_debug] HTTP: wrote 93 bytes
> 7/23/2009 9:52:27 AM    s...@home Beta Test     [file_xfer_debug] 
> FILE_XFER_SET::poll(): http op done; retval 0
> 7/23/2009 9:52:27 AM    s...@home Beta Test     [file_xfer_debug] parsing 
> upload response: 
> <data_server_reply>    <status>0</status> 
> <file_size>0</file_size></data_server_reply>
> 7/23/2009 9:52:27 AM    s...@home Beta Test     [file_xfer_debug] parsing 
> status: 0
> 7/23/2009 9:52:28 AM            [http_xfer_debug] HTTP: wrote 64 bytes
> 7/23/2009 9:52:29 AM    s...@home Beta Test     [file_xfer_debug] 
> FILE_XFER_SET::poll(): http op done; retval 0
> 7/23/2009 9:52:29 AM    s...@home Beta Test     [file_xfer_debug] parsing 
> upload response: <data_server_reply>    <status>0</status></data_server_reply>
> 7/23/2009 9:52:29 AM    s...@home Beta Test     [file_xfer_debug] parsing 
> status: 0
> 7/23/2009 9:52:29 AM    s...@home Beta Test     [file_xfer_debug] file 
> transfer status 0
> 7/23/2009 9:52:29 AM    s...@home Beta Test     Finished upload of 
> 09mr09aa.11273.157778.3.13.89_1_0
> 7/23/2009 9:52:29 AM    s...@home Beta Test     [file_xfer_debug] 
> Throughput 20444 bytes/sec
> 7/23/2009 10:09:21 AM   nque...@home Project    Computation for task 
> Nq26_06_20_23_15_09_0 finished
> 7/23/2009 10:09:23 AM   nque...@home Project    Started upload of 
> Nq26_06_20_23_15_09_0_0
> 7/23/2009 10:09:23 AM   nque...@home Project    [file_xfer_debug] URL: 
> http://nqueens.ing.udec.cl/NQueens_cgi/file_upload_handler
> 7/23/2009 10:09:25 AM           [http_xfer_debug] HTTP: wrote 64 bytes
> 7/23/2009 10:09:25 AM   nque...@home Project    [file_xfer_debug] 
> FILE_XFER_SET::poll(): http op done; retval 0
> 7/23/2009 10:09:25 AM   nque...@home Project    [file_xfer_debug] parsing 
> upload response: <data_server_reply>    <status>0</status></data_server_reply>
> 7/23/2009 10:09:25 AM   nque...@home Project    [file_xfer_debug] parsing 
> status: 0
> 7/23/2009 10:09:25 AM   nque...@home Project    [file_xfer_debug] file 
> transfer status 0
> 7/23/2009 10:09:25 AM   nque...@home Project    Finished upload of 
> Nq26_06_20_23_15_09_0_0
> 7/23/2009 10:09:25 AM   nque...@home Project    [file_xfer_debug] 
> Throughput 97 bytes/sec
> 7/23/2009 10:15:54 AM   s...@home Beta Test     Computation for task 
> 09mr09aa.11273.157778.3.13.77_0 finished
> 7/23/2009 10:15:54 AM   s...@home Beta Test     Starting 
> 09mr09aa.11273.157778.3.13.87_0
> 7/23/2009 10:15:54 AM   s...@home Beta Test     Starting task 
> 09mr09aa.11273.157778.3.13.87_0 using setiathome_enhanced version 608
> 7/23/2009 10:15:56 AM   s...@home Beta Test     Started upload of 
> 09mr09aa.11273.157778.3.13.77_0_0
> 7/23/2009 10:15:56 AM   s...@home Beta Test     [file_xfer_debug] URL: 
> http://setiboincdata.ssl.berkeley.edu/beta_cgi/file_upload_handler
> 7/23/2009 10:15:57 AM           [http_xfer_debug] HTTP: wrote 93 bytes
> 7/23/2009 10:15:57 AM   s...@home Beta Test     [file_xfer_debug] 
> FILE_XFER_SET::poll(): http op done; retval 0
> 7/23/2009 10:15:57 AM   s...@home Beta Test     [file_xfer_debug] parsing 
> upload response: 
> <data_server_reply>    <status>0</status> 
> <file_size>0</file_size></data_server_reply>
> 7/23/2009 10:15:57 AM   s...@home Beta Test     [file_xfer_debug] parsing 
> status: 0
> 7/23/2009 10:15:58 AM           [http_xfer_debug] HTTP: wrote 64 bytes
> 7/23/2009 10:15:58 AM   s...@home Beta Test     [file_xfer_debug] 
> FILE_XFER_SET::poll(): http op done; retval 0
> 7/23/2009 10:15:58 AM   s...@home Beta Test     [file_xfer_debug] parsing 
> upload response: <data_server_reply>    <status>0</status></data_server_reply>
> 7/23/2009 10:15:58 AM   s...@home Beta Test     [file_xfer_debug] parsing 
> status: 0
> 7/23/2009 10:15:58 AM   s...@home Beta Test     [file_xfer_debug] file 
> transfer status 0
> 7/23/2009 10:15:58 AM   s...@home Beta Test     Finished upload of 
> 09mr09aa.11273.157778.3.13.77_0_0
> 7/23/2009 10:15:58 AM   s...@home Beta Test     [file_xfer_debug] 
> Throughput 78530 bytes/sec
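For anyone who wants to pull the back-off times out of a quoted log like the one above, here is a minimal sketch. It assumes the excerpt was wrapped by the mail client, so it first strips the ">" quote markers and rejoins continuation lines. The helper names (rejoin_records, backoff_seconds) are made up for this example, and the "hr/min/sec" pattern is inferred only from the messages shown here, not from the client source.

```python
import re

# A new log record starts with a timestamp such as "7/23/2009 9:48:05 AM".
RECORD_START = re.compile(r"^\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{2}:\d{2} [AP]M\b")

# Assumed message shape, based only on the lines quoted in this thread.
BACKOFF = re.compile(
    r"Backing off (?:(\d+) hr )?(?:(\d+) min )?(\d+) sec on (upload|download) of (\S+)"
)


def rejoin_records(quoted_lines):
    """Strip mail quote markers and glue wrapped continuation lines together."""
    records = []
    for raw in quoted_lines:
        line = re.sub(r"^(?:>\s?)+", "", raw).rstrip()
        if not line:
            continue
        if RECORD_START.match(line) or not records:
            records.append(line)
        else:
            # No timestamp: treat it as the wrapped tail of the previous record.
            records[-1] += " " + line
    return records


def backoff_seconds(record):
    """Return (file name, direction, delay in seconds), or None if no back-off."""
    m = BACKOFF.search(record)
    if m is None:
        return None
    hours, minutes, seconds = (int(g) if g else 0 for g in m.group(1, 2, 3))
    return m.group(5), m.group(4), hours * 3600 + minutes * 60 + seconds
```

Fed the two "Backing off" messages above, this gives 10663 and 11965 seconds for the two downloads.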
> 
> 
> 
> 
> At 03:44 PM 7/22/2009 -0700, Lynn W. Taylor wrote:
>> Sounds reasonable to me.  Not sure if that is what was intended.
>>
>> Al Reust wrote:
>>> Seti Beta - NQueens (running backup project) 6.6.38 Aries
>>> http://setiathome.berkeley.edu/beta/show_host_detail.php?hostid=22789
>>> 47 Cuda stuck in "project backoff"
>>> [s...@home Beta Test] Backing off 2 hr 6 min 34 sec on upload of 
>>> 09mr09aa.11273.1299.3.13.117_1_0
>>> I could select one result and click Retry Now; it would pop 2 into 
>>> Uploading, which errored out and extended the retry wait.
>>> I clicked Retry Now again on the one below that; 2 popped into uploading 
>>> and one got through, which started the next one in the queue.
>>> I clicked Retry Now again on the one below that; 2 popped into uploading 
>>> and both timed out, which extended the timer.
>>> Okay what next???
>>> Do Network communications
>>> The ones that had not been extended immediately went into Upload Pending. 
>>> It so happens it was coincident with Eric opening the pipe. The first two 
>>> got through okay, one of the next pair failed and went into extended 
>>> retry. The Next started uploading and got through.
>>> It continued until about half got through and what was left were those 
>>> waiting for the extended retry.
>>> Okay pick the top one and click Retry Now. One got through and the other 
>>> went into extended retry.
>>> Then the next set of 2 both got through.
>>> Do Network Communications
>>> I presume that as the last 2 both were met with success, the uploads all 
>>> went to Upload Pending and started cycling through the remaining results.
>>> Of the 27 remaining, two thirds were successful and the rest went into an 
>>> extended retry.
>>> As long as there was one success after a failure, it would proceed to the 
>>> next. If there was no success (2 failures), it stopped. So a host with a 
>>> very large number of pending uploads could be stuck until it gets a clear 
>>> connection. Even then it would still have those that had failed and 
>>> extended the retry time.
>>>
>>> At 02:29 PM 7/22/2009 -0700, Lynn W. Taylor wrote:
>>>> If there is a bug, it may be in how fast the project-wide delay grows.
>>>>
>>>> I've only had a couple of cycles and it's already 2.7 hours.
>>>>
>>>> If there are unwarranted bug reports, it is because uploads sit at
>>>> "upload pending" and there is no indication in the GUI that there is a
>>>> project wide delay, or when it will finally be lifted.
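On the GUI point, a minimal sketch of the kind of status text that would make a project-wide delay visible. The function names and the status wording are invented for illustration and are not taken from the BOINC Manager; only the "hr/min/sec" style is borrowed from the client's log messages.

```python
def format_delay(seconds: float) -> str:
    """Render a delay in the "2 hr 57 min 43 sec" style used in the client log."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    parts = []
    if hours:
        parts.append(f"{hours} hr")
    if hours or minutes:
        parts.append(f"{minutes} min")
    parts.append(f"{secs} sec")
    return " ".join(parts)


def transfer_status(now: float, project_retry_at: float) -> str:
    """Hypothetical status string: show why an upload is waiting, and for how long."""
    remaining = project_retry_at - now
    if remaining <= 0:
        return "Upload pending"
    return f"Upload pending (project backoff: {format_delay(remaining)})"
```

With the 2.7 hour delay mentioned above (9720 seconds), transfer_status(0.0, 9720.0) would read "Upload pending (project backoff: 2 hr 42 min 0 sec)", which at least tells the user when the delay will lift.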
>>>>
>>>> -- Lynn
>>>>
>>>> David Anderson wrote:
>>>>> I just checked in the change you describe.
>>>>> Actually it was added 4 years ago,
>>>>> but was backed out because there were reports that it wasn't working right.
>>>>> This time I added some <file_xfer_debug>-enabled messages,
>>>>> so we should be able to track down any problems.
>>>>>
>>>>> -- David
>>>>>
>>>>> Lynn W. Taylor wrote:
>>>>>> Hi All,
>>>>>>
>>>>>> I've been watching SETI@home during their current challenges, and I
>>>>>> think I see something that can be optimized a bit.
>>>>>>
>>>>>> The problem:
>>>>>>
>>>>>> Take a very fast machine, something with lots of RAM, a couple of i7
>>>>>> processors and several high-end CUDA cards -- a machine that can chew
>>>>>> through work units at an amazing rate.
>>>>>>
>>>>>> It has a big cache.
>>>>>>
>>>>>> As work is completed, each work unit goes into the transfer queue.
>>>>>>
>>>>>> BOINC sends each one, and if the upload server is unreachable, each work
>>>>>> unit is retried based on the back-off algorithm.
>>>>>>
>>>>>> If an upload fails, that information does not affect the other running
>>>>>> upload timers.
>>>>>>
>>>>>> In other words, this mega-fast machine could have a lot (hundreds) of
>>>>>> pending uploads, and tries every one every few hours.
>>>>>>
>>>>>> I see two issues:
>>>>>>
>>>>>> 1) The most important work (the one with the earliest deadline) may be
>>>>>> one of the ones that tries the least (longest interval).
>>>>>>
>>>>>> 2) Retrying hundreds of units adds load to the servers: 180,000-odd
>>>>>> clients are trying to reach one or two machines at SETI.
>>>>>>
>>>>>> Optimization:
>>>>>>
>>>>>> On a failed upload, BOINC could basically treat that as if every upload
>>>>>> timed out.  That would reduce the number of attempted uploads from all
>>>>>> clients, reducing the load on the servers.
>>>>>>
>>>>>> Of course, since the odds of a successful upload are just about zero for
>>>>>> a work unit that isn't retried, by itself this is a bad idea.
>>>>>>
>>>>>> So, when any retry timer runs out, instead of retrying that WU, retry
>>>>>> the one with the earliest deadline -- the one at the highest risk.
>>>>>>
>>>>>> As the load drops, work would continue to be uploaded in deadline order
>>>>>> until everything is caught up.
>>>>>>
>>>>>> I know a project can have different upload servers for different
>>>>>> applications, or for load balancing, or whatever, so this would only
>>>>>> apply to work going to the same server.
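The optimization described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the change that was checked into the client: the class names, the 60 second base delay, the 4 hour cap and the plain doubling rule are all made up for the example, and there is no random jitter, to keep the behaviour easy to follow.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class PendingUpload:
    name: str
    server: str      # upload server this result must go to
    deadline: float  # report deadline, in seconds on any consistent clock


@dataclass
class ServerQueue:
    uploads: List[PendingUpload] = field(default_factory=list)
    next_retry: float = 0.0  # no upload to this server before this time
    failures: int = 0        # consecutive failures, drives the back-off


class UploadScheduler:
    def __init__(self, base_delay: float = 60.0, max_delay: float = 4 * 3600.0):
        self.base_delay = base_delay
        self.max_delay = max_delay
        self.queues: Dict[str, ServerQueue] = {}

    def add(self, upload: PendingUpload) -> None:
        # One shared queue and timer per upload server.
        self.queues.setdefault(upload.server, ServerQueue()).uploads.append(upload)

    def poll(self, now: float, try_upload: Callable[[PendingUpload], bool]) -> None:
        for queue in self.queues.values():
            while queue.uploads and now >= queue.next_retry:
                # Always attempt the result at highest risk of missing its deadline.
                upload = min(queue.uploads, key=lambda u: u.deadline)
                if try_upload(upload):
                    queue.uploads.remove(upload)
                    queue.failures = 0
                else:
                    # One failure delays every upload bound for this server.
                    queue.failures += 1
                    delay = self.base_delay * 2 ** (queue.failures - 1)
                    queue.next_retry = now + min(self.max_delay, delay)
```

Here try_upload stands in for the real transfer and returns True on success. A single failure pushes back the shared timer for that server, and once an attempt succeeds the loop keeps draining the queue in deadline order until something fails again or everything is caught up.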
>>>>>>
>>>>>> The same idea could apply to downloads as well.  Does the BOINC client
>>>>>> get the deadline from the scheduler??
>>>>>>
>>>>>> Now, if I can figure out how to get a BOINC development environment
>>>>>> going, and unless it's just a stupid idea, I'll be glad to take a shot
>>>>>> at the code.
>>>>>>
>>>>>> Comments?
>>>>>>
>>>>>> -- Lynn
>>>>>> _______________________________________________
>>>>>> boinc_dev mailing list
>>>>>> [email protected]
>>>>>> http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
>>>>>> To unsubscribe, visit the above URL and
>>>>>> (near bottom of page) enter your email address.
> 