Richard,

You're agreeing with me.

There is a widespread perception that a lost upload causes "irreparable 
harm" to the project, but as you point out, the work unit will just be 
reassigned.

It's a waste of resources, and where possible waste *should* be avoided, 
but the error is handled safely and the work gets done.

It is more of a perceived problem than an actual problem, but that 
doesn't mean it should not be addressed in some way.

It is my personal opinion that buttons and checkboxes are wonderful 
things for probably 5% of the community.  For the other 95%, BOINC needs 
to handle this for them without a checkbox or button.
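
As a rough illustration of handling this automatically, here is a minimal 
sketch of a per-project exponential backoff with jitter. The class name, 
method names and delay values are all hypothetical; this is not the BOINC 
client's actual code, only the general shape of the idea: a project whose 
transfers keep failing is left alone for longer and longer, with no user 
action required.

```python
import random


class ProjectBackoff:
    """Hypothetical per-project backoff tracker (not BOINC's real code)."""

    def __init__(self, min_delay=60.0, max_delay=4 * 3600.0):
        self.min_delay = min_delay    # seconds to wait after the first failure
        self.max_delay = max_delay    # upper bound on any single wait
        self.failures = 0             # consecutive failures for this project
        self.next_attempt = 0.0       # earliest time another try is allowed

    def record_failure(self, now, rng=random.random):
        """Note a failed RPC or transfer; return the delay that was chosen."""
        self.failures += 1
        # Double the ceiling on each consecutive failure, capped at max_delay.
        # The exponent is clamped so the arithmetic cannot overflow.
        exponent = min(self.failures - 1, 30)
        ceiling = min(self.max_delay, self.min_delay * 2 ** exponent)
        # Jitter: pick a delay between half the ceiling and the full ceiling
        # so that many clients do not all retry at the same moment.
        delay = ceiling * (0.5 + 0.5 * rng())
        self.next_attempt = now + delay
        return delay

    def record_success(self):
        """A successful contact clears the backoff for this project."""
        self.failures = 0
        self.next_attempt = 0.0

    def may_attempt(self, now):
        """True once the backoff period for this project has elapsed."""
        return now >= self.next_attempt
```

Each project would get its own tracker, so one project with a struggling 
file server backs off without holding up network traffic for the others.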

As for the fact that the full upload has to complete before the error 
comes back, that's as much an artifact of HTTP as anything else, and 
can't be fixed without using something other than HTTP.

-- Lynn

Richard Haselgrove wrote:
> Lynn W. Taylor wrote
> 
> 
>> It seems to me that the big fear is the two-week timer: if a work unit
>> can't be uploaded in two weeks, it's going to be thrown away, causing
>> "irreparable harm to the project" and a tragic hit to the cruncher's RAC.
> 
> CPDN CM3-160 models can run for 4 months or longer. Losing the final upload 
> data does indeed cause harm to the project - not irreparable, since a resend 
> can replace it, another four months later - but that seems like a waste. 
> Note that there is NO tragic hit to the cruncher's RAC: credit is awarded by 
> trickle (RPC to scheduler - no upload needed), so the ONLY damage done is to 
> the project science.
> 
> There are other costs. Until recently, many uploads sent the entire file 
> before being told there was an error and backing off for a resend. I believe 
> these cases have now been fixed (has there been a thorough audit of all 
> circumstances where this could have happened?), so - provided all projects 
> have updated their server software - this problem should not recur. But if 
> not, giving users control of uploads covers those situations where upload 
> bandwidth incurs a real financial cost (mobile users, rural locations with 
> satellite upload, etc.). Remember that individual file uploads can be up to 
> 25MB in size.
> 
> In the early days of CPDN, the advice given when upload problems occur was 
> to 'suspend networking'. This is an incredibly blunt instrument: it suspends 
> both scheduler RPCs and file transfers in both directions, for every 
> project. And EVERY network operation - RPC, download and upload, for every 
> project - is allowed for 5 minutes after any one of them is retried. This 
> may have been appropriate in the days of single-core CPUs attached to a 
> small number of BOINC projects, but it needs re-thinking in the era of 
> multi-core CPUs, multiple GPUs, and non-CPU-intensive tasks. You can't 
> suspend networking if you're running QuakeCatcher. It makes far more sense 
> to selectively ex-communicate projects, like Einstein, which are known to be 
> having fileserver problems just at the moment.
> 
>> David Anderson wrote:
>>> I don't see why this is needed.
>>> If communication (RPC or file transfer) with a project is failing,
>>> the client's backoff mechanisms should kick in and it should
>>> stop trying to connect to that project.
>>> If these mechanisms aren't working right,
>>> let's fix them instead of adding a workaround.
>>>
>>> If a manual control is needed, I think it should be a checkbox
>>> on the project properties page rather than a new button.
>>> There are too many buttons already.
>>>
>>> -- David
>>>
>>> -------- Original Message --------
>>> Subject: Re: [BOINC] #139: Project-by-project network disable (similar to
>>> communications deferred)
>>> Date: Thu, 13 Aug 2009 22:50:41 -0000
>>> From: BOINC <[email protected]>
>>> Reply-To: [email protected]
>>> References: <[email protected]>
>>>
>>> #139: Project-by-project network disable (similar to communications deferred)
>>> --------------------------+-------------------------------------------------
>>>    Reporter:  MikeMarsUK   |       Owner:  davea
>>>        Type:  Enhancement  |      Status:  reopened
>>>    Priority:  Major        |   Milestone:  Undetermined
>>>   Component:  Manager      |     Version:
>>> Resolution:               |    Keywords:
>>> --------------------------+-------------------------------------------------
>>> Comment (by Thyme Lawn):
>>>
>>>   I have implemented the requested functionality, tested by a number of
>>>   users over the past 2 months.
>>>
>>>   The changes allow networking to be suspended and resumed for selected
>>>   projects, adding a new "Suspend network"/"Resume network" button to
>>>   BOINC Manager's Projects tab.
>>>
>>>   When project networking is suspended any in progress uploads will have
>>>   their timers reset and upload will not be restarted until project
>>>   networking is resumed.  No scheduler requests will be made but any
>>>   pending downloads for the project will be completed.  The project's
>>>   status will be displayed as "Network activity suspended by user".
>>>
>>>   If a network suspended project generates a trickle-up this will be
>>>   shown in the project's status message as "Network activity suspended
>>>   by user, Trickle upload pending".
>>>
>>>   A scheduler request can be forced at any time by clicking the Update
>>>   button.  That will send any pending trickle-up messages and (if
>>>   required) request new work for the project.  If new tasks are allocated
>>>   any required downloads will be made automatically without the need to
>>>   enable project networking.
>>>
>>>   The status message on the Tasks tab for completed tasks which haven't
>>>   been uploaded will be "Uploading, project networking suspended".
>>>
>>>   The status message on the Transfers tab for blocked uploads will be
>>>   "Upload pending, project networking suspended".
>>>
>>>   When project networking is resumed any blocked uploads will be started.
>>>
>>>   I have patches (at revision 18840) available for
>>>   boinc_core_release_6_6a, boinc_core_release_6_8 and boinc_trunk but the
>>>   attachment option seems to be disabled at the moment.
>>>
>> _______________________________________________
>> boinc_dev mailing list
>> [email protected]
>> http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
>> To unsubscribe, visit the above URL and
>> (near bottom of page) enter your email address.
>>