The BOINC core client attempts to get work completed, uploaded, and reported
somewhat before the actual report deadline. If the project is down, that
attempt will fail. For the 3-day outages at SETI@home, the only way of making
the computation deadline early enough is to set at least 3 days for the
"Connect about every" preference, and that may not be practical if the
computer is also doing work for other projects.

A missed deadline incurs the inefficiency of sending out another replication.
Thinking about ways to avoid that, I believe a method of extending result
deadlines until completed work has a chance to be reported could be useful
not only for the SETI@home scheduled outages, but also for any outage of
significant length at most projects.

Here's one method which seems practical, implemented with a config boolean
"defer_res_report_deadline", which the project would set when it is not able
to accept reports and clear once the outage has finished and enough time has
passed for completed work to be reported. This is a pseudo-diff starting
at about line 225 of transitioner.cpp:
======================================================================
  // Scan this WU's results, and
  // 1) count those in various server states;
  // 2) identify time-out results and update their server state and outcome
  // 3) find the max result suffix (in case need to generate new ones)
  // 4) see if we have a new result to validate
  //    (outcome SUCCESS and validate_state INIT)
  //
+ int x = (int)now + 4*3600 + rand()%(4*3600); // 4 to 8 hour deferral if needed
  for (i=0; i<items.size(); i++) {
      TRANSITIONER_ITEM& res_item = items[i];
      if (!res_item.res_id) continue;
      ntotal++;
      rs = result_suffix(res_item.res_name);
      if (rs > max_result_suffix) max_result_suffix = rs;
      switch (res_item.res_server_state) {
      case RESULT_SERVER_STATE_UNSENT:
          nunsent++;
          break;
      case RESULT_SERVER_STATE_IN_PROGRESS:
          if (res_item.res_report_deadline < now) {
+             if (config.defer_res_report_deadline) {
+                 // outage in effect: push the deadline out, keep the result in progress
+                 res_item.res_report_deadline = x;
+                 retval = transitioner.update_result(res_item);
+                 ninprogress++;
+             } else {
              log_messages.printf(MSG_NORMAL,
                  "[WU#%d %s] [RESULT#%d %s] result timed out (%d < %d) "
                  "server_state:IN_PROGRESS=>OVER; outcome:NO_REPLY\n",
                  wu_item.id, wu_item.name, res_item.res_id,
                  res_item.res_name,
                  res_item.res_report_deadline, (int)now
              );
              ...
              ...
+             }
======================================================================
I've not attempted to add log messages and such; that's just a sketch. There
are many obvious variations possible, but this one seems most generally
useful to me.
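
For illustration, the deferral step can also be sketched in isolation as
self-contained C++. Everything named here (Result, ServerState,
deferred_deadline, scan_results) is a hypothetical simplification and not
the real transitioner code; database updates and logging are left out.

```cpp
#include <cstdlib>
#include <vector>

// Hypothetical, simplified stand-ins for the transitioner's types.
enum ServerState { IN_PROGRESS, OVER };

struct Result {
    int report_deadline;   // seconds since the epoch
    ServerState state;
};

// Returns a deadline 4 to 8 hours after `now`. The random part keeps
// deferred results from all expiring again at the same moment.
int deferred_deadline(int now) {
    const int four_hours = 4 * 3600;
    return now + four_hours + std::rand() % four_hours;
}

// Scans results whose report deadline has passed. While `defer` is set
// (the role of defer_res_report_deadline), an expired result gets a new
// deadline and stays in progress; otherwise it is marked OVER, which is
// the point where a replacement result would normally be generated.
// Returns the number of results still in progress.
int scan_results(std::vector<Result>& results, int now, bool defer) {
    int ninprogress = 0;
    // As in the pseudo-diff, one deferred deadline is computed per scan.
    const int x = deferred_deadline(now);
    for (Result& r : results) {
        if (r.state != IN_PROGRESS) continue;
        if (r.report_deadline < now) {
            if (defer) {
                r.report_deadline = x;
                ninprogress++;
            } else {
                r.state = OVER;
            }
        } else {
            ninprogress++;
        }
    }
    return ninprogress;
}
```

With the flag set, an expired result is simply looked at again 4 to 8 hours
later, so nothing times out for as long as the project leaves the flag on.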
--
Joe
_______________________________________________
boinc_dev mailing list
http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev