Stopping reissues of work that expires during the outage is one way to
keep the volunteers happy, since they are then not racing a reissue.
It also tends to conserve processing power. It has very little effect at
the beginning of the current outage, and may have no effect on the current
problem at all if the tasks take more than a day to process.
jm7

"Lynn W. Taylor" <[email protected]>
07/14/2009 12:39 PM
To: [email protected]
cc: BOINC dev <[email protected]>, [email protected]
Subject: Re: [boinc_dev] Optimizing uploads.....

Sure, but how long would it take?
If you do nothing, the problem will ease by itself, but while you're
waiting the natives get restless.
[email protected] wrote:
> It doesn't help immediately, but it will help reduce the length of the
> problem.
>
> jm7
>
> [email protected] wrote on 07/14/2009 12:24:19 PM:
>
>> "Lynn W. Taylor" <[email protected]>
>> Sent by: [email protected]
>> 07/14/2009 12:24 PM
>> To: [email protected], BOINC dev <[email protected]>
>> Subject: Re: [boinc_dev] Optimizing uploads.....
>>
>> Slowing work assignment is going to help, but if we're talking about too
>> many simultaneous downloads, it's already too late: the downloads are
>> already assigned.
>>
>> If we're talking about too many uploads, those were assigned possibly
>> days ago -- slowing work assignment today will take a while to show a
>> result.
>>
>> At the same time, just battering away at the servers will also show some
>> success, and as uploads or downloads complete, by itself that'll lower
>> the load.
>>
>> To be effective, you want something that can produce a result fast. We
>> may not be able to get that, but should be able to get a result in an
>> hour.
>> -- Lynn
>>
>> [email protected] wrote:
>>> Unfortunately, the upload server cannot know anything about the host that
>>> is sending the file up as that would require database access per upload.
>>> The upload server could possibly inform the rest of the server system about
>>> failed uploads due to overloads. This could be used to limit task creation
>>> and delay deadline enforcement.
>>>
>>> jm7
>>>
>>> "Josef W. Segur" <jse...@westelcom.com>
>>> Sent by: boinc_dev-bounces@ssl.berkeley.edu
>>> 07/14/2009 10:28 AM
>>> To: [email protected]
>>> cc: "Lynn W. Taylor" <[email protected]>
>>> Subject: Re: [boinc_dev] Optimizing uploads.....
>>>
>>> You're correct that skipped opportunities to try uploads do not count as
>>> "failed", they simply don't count at all. Typical hosts with continuous
>>> connection try again every two hours on average because the actual backoffs
>>> are randomized, so get 12 retries a day until the upload problem is
>>> cleared. Hosts with only one hour a day connectivity will retry at the
>>> beginning of that one hour and have a 25% chance of another retry in that
>>> one hour.
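
The arithmetic behind those figures can be checked with a short Python
sketch. It assumes each backoff is drawn uniformly between zero and a
four-hour maximum; the thread only says that backoffs are randomized and
that the timer runs out at four hours, so the uniform distribution is an
assumption, and the function names are hypothetical.

```python
import random

# Assumption: the backoff cap is four hours, per the "timer runs out at
# four hours" remark in this thread.
MAX_BACKOFF_HOURS = 4.0


def mean_backoff_hours(max_backoff=MAX_BACKOFF_HOURS):
    """Mean of a backoff drawn uniformly from [0, max_backoff]."""
    return max_backoff / 2.0


def retries_per_day(max_backoff=MAX_BACKOFF_HOURS):
    """Average retries in 24 hours for a continuously connected host."""
    return 24.0 / mean_backoff_hours(max_backoff)


def chance_of_second_retry(window_hours=1.0, max_backoff=MAX_BACKOFF_HOURS):
    """Probability that a uniform backoff expires inside the window."""
    return min(window_hours / max_backoff, 1.0)


def simulated_chance(trials=100_000, window_hours=1.0,
                     max_backoff=MAX_BACKOFF_HOURS, seed=0):
    """Monte Carlo estimate of chance_of_second_retry()."""
    rng = random.Random(seed)
    hits = sum(rng.uniform(0.0, max_backoff) < window_hours
               for _ in range(trials))
    return hits / trials
```

Under these assumptions retries_per_day() gives 12 and
chance_of_second_retry() gives 0.25, matching the numbers quoted above.
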
>>>
>>> With failed uploads disabling downloads, it seems to me that hosts with
>>> limited connect times due to IT department restrictions ought to be
>>> given the best possible chance of getting new tasks when they are allowed
>>> to connect.
>>> --
>>> Joe
>>>
>>>
>>> On 13 Jul 2009 at 21:38, Lynn wrote:
>>>
>>>> I'm actually limiting connectivity to 1 hour out of 24.
>>>>
>>>> With the current backoff algorithm, the timer runs out at four hours,
>>>> the state goes to "suspended" and the moment the appointed hour rolls
>>>> around the uploads start, two by two.
>>>>
>>>> If those first two succeed, then the rest would continue, with two
>>>> connections at all times -- no matter how many you had.
>>>>
>>>> I don't think attempts while suspended would count as "failed."
>>>>
>>>> -- Lynn
>>>>
>>>> Josef W. Segur wrote:
>>>>
>>>>> My suggestion is to reduce the retry count for each retry attempt skipped
>>>>> because a host has network activity turned off. A rough approximation of
>>>>> that could be done by saving the time a host disables network activity,
>>>>> get the interval when it is reenabled, divide that by half the maximum
>>>>> backoff and subtract the integer portion of the result from the retry
>>>>> count (with a floor of 0 retries, of course). A host with network activity
>>>>> disabled for 23 hours out of each 24 would automatically be back at minimal
>>>>> retry count and therefore at minimal backoff at the beginning of each
>>>>> active period, but a short period of manually disabling network activity
>>>>> would have no effect.
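
Joe's rough approximation can be sketched in a few lines of Python. This
is only an illustration, not BOINC client code: the function name
adjusted_retry_count, its parameters, and the four-hour
MAX_BACKOFF_SECONDS (taken from the "timer runs out at four hours" remark
in this thread) are all assumptions.

```python
# Assumption: maximum backoff of four hours, expressed in seconds.
MAX_BACKOFF_SECONDS = 4 * 3600


def adjusted_retry_count(retry_count, disabled_at, enabled_at,
                         max_backoff=MAX_BACKOFF_SECONDS):
    """Return the retry count after network activity is re-enabled.

    disabled_at and enabled_at are timestamps in seconds. One retry is
    forgiven for every half-maximum-backoff interval that elapsed while
    the network was off, and the result never drops below zero.
    """
    interval = max(enabled_at - disabled_at, 0)
    skipped = int(interval // (max_backoff / 2))  # integer portion only
    return max(retry_count - skipped, 0)
```

With these values, 23 hours offline forgives 11 retries, so a host that
had accumulated 10 retries starts its active hour at 0, while a
ten-minute manual suspension forgives none.
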
>>>
>>> _______________________________________________
>>> boinc_dev mailing list
>>> [email protected]
>>> http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
>>> To unsubscribe, visit the above URL and
>>> (near bottom of page) enter your email address.