Lynn W. Taylor wrote:
> It seems to me that the big fear is the two-week timer: if a work unit
> can't be uploaded in two weeks, it's going to be thrown away, causing
> "irreparable harm to the project" and a tragic hit to the cruncher's RAC.

CPDN CM3-160 models can run for 4 months or longer. Losing the final
upload data does indeed cause harm to the project - not irreparable,
since a resend can replace it another four months later - but that
seems like a waste.

Note that there is NO tragic hit to the cruncher's RAC: credit is
awarded by trickle (an RPC to the scheduler - no upload needed), so the
ONLY damage done is to the project science.

There are other costs. Until recently, many uploads sent the entire file
before being told there was an error and backing off for a resend. I
believe these cases have now been fixed (has there been a thorough audit
of all circumstances where this could have happened?), so - provided all
projects have updated their server software - this problem should not
recur. But if it does, giving users control of uploads covers those
situations where upload bandwidth incurs a real financial cost (mobile
users, rural locations with satellite upload, etc.). Remember that
individual file uploads can be up to 25MB in size.

In the early days of CPDN, the advice given when upload problems
occurred was to 'suspend networking'. This is an incredibly blunt
instrument: it suspends both scheduler RPCs and file transfers in both
directions, for every project. And EVERY network operation - RPC,
download and upload, for every project - is allowed for 5 minutes after
any one of them is retried. This may have been appropriate in the days
of single-core CPUs attached to a small number of BOINC projects, but it
needs re-thinking in the era of multi-core CPUs, multiple GPUs, and
non-CPU-intensive tasks. You can't suspend networking if you're running
QuakeCatcher.
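
To make the contrast concrete, here is a rough sketch (not BOINC code)
of a single global switch next to a per-project one. The names Client,
Op, network_suspended, suspended_projects and allowed are all
hypothetical. The per-project rules follow the behaviour described in
the ticket comment quoted further down: uploads are held, downloads
still complete, and a scheduler request happens only when the user
clicks Update. How the global switch treats a manual retry is left out.

```python
from dataclasses import dataclass, field
from enum import Enum


class Op(Enum):
    SCHEDULER_RPC = "scheduler_rpc"
    DOWNLOAD = "download"
    UPLOAD = "upload"


@dataclass
class Client:
    # Global switch: models the existing all-or-nothing "suspend networking".
    network_suspended: bool = False
    # Hypothetical per-project switch, keyed by project name.
    suspended_projects: set = field(default_factory=set)

    def allowed(self, project: str, op: Op, user_update: bool = False) -> bool:
        """Return True if this network operation may start now."""
        if self.network_suspended:
            # Blunt instrument: nothing moves, for any project.
            return False
        if project not in self.suspended_projects:
            return True
        if op is Op.DOWNLOAD:
            # Pending downloads still complete for a suspended project.
            return True
        if op is Op.SCHEDULER_RPC:
            # Only a user-requested Update reaches the scheduler.
            return user_update
        # Uploads wait until project networking is resumed.
        return False


client = Client(suspended_projects={"Einstein"})
print(client.allowed("Einstein", Op.UPLOAD))             # False
print(client.allowed("QuakeCatcher", Op.SCHEDULER_RPC))  # True
```

With the global switch, the second call would also be refused, which is
exactly the QuakeCatcher problem.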
It makes far more sense to selectively ex-communicate projects, like
Einstein, which are known to be having fileserver problems just at the
moment.

> David Anderson wrote:
>> I don't see why this is needed.
>> If communication (RPC or file transfer) with a project is failing,
>> the client's backoff mechanisms should kick in and it should
>> stop trying to connect to that project.
>> If these mechanisms aren't working right,
>> let's fix them instead of adding a workaround.
>>
>> If a manual control is needed, I think it should be a checkbox
>> on the project properties page rather than a new button.
>> There are too many buttons already.
>>
>> -- David
>>
>> -------- Original Message --------
>> Subject: Re: [BOINC] #139: Project-by-project network disable (similar to
>> communications deferred)
>> Date: Thu, 13 Aug 2009 22:50:41 -0000
>> From: BOINC <[email protected]>
>> Reply-To: [email protected]
>> References: <[email protected]>
>>
>> #139: Project-by-project network disable (similar to communications
>> deferred)
>> --------------------------+-------------------------------------------------
>>   Reporter:  MikeMarsUK   |       Owner:  davea
>>       Type:  Enhancement  |      Status:  reopened
>>   Priority:  Major        |   Milestone:  Undetermined
>>  Component:  Manager      |     Version:
>> Resolution:               |    Keywords:
>> --------------------------+-------------------------------------------------
>> Comment (by Thyme Lawn):
>>
>> I have implemented the requested functionality, tested by a number of
>> users over the past 2 months.
>>
>> The changes allow networking to be suspended and resumed for selected
>> projects, adding a new "Suspend network"/"Resume network" button to
>> BOINC Manager's Projects tab.
>>
>> When project networking is suspended any in progress uploads will have
>> their timers reset and upload will not be restarted until project
>> networking is resumed. No scheduler requests will be made but any
>> pending downloads for the project will be completed. The project's
>> status will be displayed as "Network activity suspended by user".
>>
>> If a network suspended project generates a trickle-up this will be
>> shown in the project's status message as "Network activity suspended
>> by user, Trickle upload pending".
>>
>> A scheduler request can be forced at any time by clicking the Update
>> button. That will send any pending trickle-up messages and (if
>> required) request new work for the project. If new tasks are allocated
>> any required downloads will be made automatically without the need to
>> enable project networking.
>>
>> The status message on the Tasks tab for completed tasks which haven't
>> been uploaded will be "Uploading, project networking suspended".
>>
>> The status message on the Transfers tab for blocked uploads will be
>> "Upload pending, project networking suspended".
>>
>> When project networking is resumed any blocked uploads will be started.
>>
>> I have patches (at revision 18840) available for
>> boinc_core_release_6_6a, boinc_core_release_6_8 and boinc_trunk but the
>> attachment option seems to be disabled at the moment.
>>
> _______________________________________________
> boinc_dev mailing list
> [email protected]
> http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
> To unsubscribe, visit the above URL and
> (near bottom of page) enter your email address.
