Lynn W. Taylor wrote:

> It seems to me that the big fear is the two-week timer: if a work unit
> can't be uploaded in two weeks, it's going to be thrown away, causing
> "irreparable harm to the project" and a tragic hit to the cruncher's RAC.

CPDN CM3-160 models can run for four months or longer. Losing the final 
upload data does indeed cause harm to the project - not irreparable, since a 
resend can replace it another four months later - but that seems like a waste. 
Note that there is NO tragic hit to the cruncher's RAC: credit is awarded by 
trickle (RPC to scheduler - no upload needed), so the ONLY damage done is to 
the project science.

There are other costs. Until recently, many uploads sent the entire file 
before being told there was an error and backing off for a resend. I believe 
these cases have now been fixed (has there been a thorough audit of all 
circumstances where this could have happened?), so - provided all projects 
have updated their server software - this problem should not recur. But if 
it does, giving users control of uploads covers those situations where upload 
bandwidth incurs a real financial cost (mobile users, rural locations with 
satellite upload, etc.). Remember that individual file uploads can be up to 
25 MB in size.
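
To put a rough number on that cost, here is a minimal sketch. It assumes a 
generic doubling backoff (the 60-second base and 4-hour cap are illustrative 
assumptions, not BOINC's actual schedule) and the worst case described above, 
where every failed attempt sends the whole file before the error is reported. 
The function names are hypothetical.

```python
def backoff_delays(failures, base_s=60, cap_s=4 * 3600):
    """Seconds to wait before each retry, doubling up to a cap.

    base_s and cap_s are illustrative assumptions, not BOINC's real values.
    """
    return [min(cap_s, base_s * 2 ** n) for n in range(failures)]


def wasted_upload_mb(failed_attempts, file_mb=25):
    """Worst case: each failed attempt sent the entire file first."""
    return failed_attempts * file_mb


delays = backoff_delays(5)
print(delays)               # [60, 120, 240, 480, 960]
print(sum(delays) / 60)     # 31.0 (minutes spent waiting between attempts)
print(wasted_upload_mb(5))  # 125 (MB sent for nothing)
```

Under these assumptions, five failed attempts on a single 25 MB file fit into 
about half an hour and send 125 MB to no effect, which is the kind of cost a 
user on metered bandwidth would want to be able to stop by hand.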

In the early days of CPDN, the advice given when upload problems occurred was 
to 'suspend networking'. This is an incredibly blunt instrument: it suspends 
both scheduler RPCs and file transfers in both directions, for every 
project. And once any one of them is retried, EVERY network operation - RPC, 
download and upload, for every project - is allowed for 5 minutes. This 
may have been appropriate in the days of single-core CPUs attached to a 
small number of BOINC projects, but it needs re-thinking in the era of 
multi-core CPUs, multiple GPUs, and non-CPU-intensive tasks. You can't 
suspend networking if you're running QuakeCatcher. It makes far more sense 
to selectively ex-communicate projects, like Einstein, which are known to be 
having fileserver problems just at the moment.
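
To make the difference concrete, here is a minimal sketch of the two kinds of 
switch. It follows the rules given in the ticket comment quoted below 
(uploads held, pending downloads completed, scheduler requests only when 
forced with Update) and leaves out the 5-minute retry window. The names 
`Op`, `Project`, `network_suspended` and `allowed` are hypothetical and are 
not taken from the BOINC client or from the patch.

```python
from dataclasses import dataclass
from enum import Enum


class Op(Enum):
    SCHEDULER_RPC = "scheduler RPC"
    DOWNLOAD = "download"
    UPLOAD = "upload"


@dataclass
class Project:
    name: str
    network_suspended: bool = False  # hypothetical per-project flag


def allowed(project, op, global_suspended=False, user_update=False):
    """Return True if this network operation may start now."""
    if global_suspended:
        return False       # blunt instrument: nothing moves, for any project
    if not project.network_suspended:
        return True        # other projects carry on as normal
    if op is Op.UPLOAD:
        return False       # uploads wait until the project is resumed
    if op is Op.DOWNLOAD:
        return True        # pending downloads still complete
    return user_update     # scheduler RPC only when forced via Update


einstein = Project("Einstein", network_suspended=True)
quake = Project("QuakeCatcher")
print(allowed(einstein, Op.UPLOAD))  # False
print(allowed(quake, Op.UPLOAD))     # True
```

With the per-project flag, only the project with the sick fileserver stops 
uploading; with the global switch, QuakeCatcher is cut off as well.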

> David Anderson wrote:
>> I don't see why this is needed.
>> If communication (RPC or file transfer) with a project is failing,
>> the client's backoff mechanisms should kick in and it should
>> stop trying to connect to that project.
>> If these mechanisms aren't working right,
>> let's fix them instead of adding a workaround.
>>
>> If a manual control is needed, I think it should be a checkbox
>> on the project properties page rather than a new button.
>> There are too many buttons already.
>>
>> -- David
>>
>> -------- Original Message --------
>> Subject: Re: [BOINC] #139: Project-by-project network disable (similar to
>> communications deferred)
>> Date: Thu, 13 Aug 2009 22:50:41 -0000
>> From: BOINC <[email protected]>
>> Reply-To: [email protected]
>> References: <[email protected]>
>>
>> #139: Project-by-project network disable (similar to communications 
>> deferred)
>> --------------------------+-------------------------------------------------
>>    Reporter:  MikeMarsUK   |       Owner:  davea
>>        Type:  Enhancement  |      Status:  reopened
>>    Priority:  Major        |   Milestone:  Undetermined
>>   Component:  Manager      |     Version:
>> Resolution:               |    Keywords:
>> --------------------------+-------------------------------------------------
>> Comment (by Thyme Lawn):
>>
>>   I have implemented the requested functionality, tested by a number of
>>   users over the past 2 months.
>>
>>   The changes allow networking to be suspended and resumed for selected
>>   projects, adding a new "Suspend network"/"Resume network" button to
>>   BOINC Manager's Projects tab.
>>
>>   When project networking is suspended any in progress uploads will have
>>   their timers reset and upload will not be restarted until project
>>   networking is resumed.  No scheduler requests will be made but any
>>   pending downloads for the project will be completed.  The project's
>>   status will be displayed as "Network activity suspended by user".
>>
>>   If a network suspended project generates a trickle-up this will be
>>   shown in the project's status message as "Network activity suspended
>>   by user, Trickle upload pending".
>>
>>   A scheduler request can be forced at any time by clicking the Update
>>   button.  That will send any pending trickle-up messages and (if
>>   required) request new work for the project.  If new tasks are
>>   allocated any required downloads will be made automatically without
>>   the need to enable project networking.
>>
>>   The status message on the Tasks tab for completed tasks which haven't
>>   been uploaded will be "Uploading, project networking suspended".
>>
>>   The status message on the Transfers tab for blocked uploads will be
>>   "Upload pending, project networking suspended".
>>
>>   When project networking is resumed any blocked uploads will be started.
>>
>>   I have patches (at revision 18840) available for
>>   boinc_core_release_6_6a, boinc_core_release_6_8 and boinc_trunk but the
>>   attachment option seems to be disabled at the moment.
>>
> _______________________________________________
> boinc_dev mailing list
> [email protected]
> http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
> To unsubscribe, visit the above URL and
> (near bottom of page) enter your email address.
> 

