The SETI@home servers are undertaking one of their periodic explorations of BOINC's boundary conditions, and appear to have discovered a previously-unknown positive feedback zone.

Taking my host 5828732 as an example:

http://setiathome.berkeley.edu/results.php?hostid=5828732

At the time of writing, that link shows 228 tasks in progress: but the computer beside me shows no SETI tasks at all. Every one of the 228 has been lost in transmission, with all recent RPCs (except 'report only') having ended in a timeout. Whether this is due to network congestion or slow server assembly of the reply message, I'll leave to the forensic analysts to discover.

I'm more worried about the positive feedback loop - or vicious circle, as it is otherwise known.

Looking at the list of 228 tasks notionally 'in progress', the final 20 are timestamped - out of sequence - 4 Nov 2012 | 8:30:48 UTC. That's what I would expect to see after a 'resent lost results' event, and I would expect that datestamp to increment every time the host attempts a work fetch, with the resending of lost tasks taking precedence over the allocation of new work.

But since 08:30, the host has been allocated

08:59:01 UTC - 40 tasks
09:36:57 UTC - 19 tasks
09:44:17 UTC - 44 tasks
10:10:39 UTC - 48 tasks

The vast majority of these tasks appear to have been created by the workunit generator just seconds before being allocated to my host.

SETI's workunit generators ('splitters') are normally inhibited at a high water mark of around 300,000 'Results ready to send'. But with extra results being allocated to hosts, we are way below inhibition levels. Work continues to be generated at ~30 tasks per second.

With the results being allocated to hosts, the nominal 'Results out in the field' has grown above 10,500,000 - 50% higher than any normal 'steady state' level. Yet volunteers report that their hosts, like mine, are receiving few or zero new task allocations.
Unless some way can be found to inhibit work generation when task allocation messages fail to reach their intended recipients - which the 'lost task' mechanism seems to be failing to do, just at the moment - the database is going to grow unboundedly, server RPC response times will increase (causing even more host requests to time out), and the whole system will eventually fall over.
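
To make the loop concrete, here is a rough toy model of it. This is only a sketch under stated assumptions, not the real BOINC scheduler or splitter logic: the function name, the `demand` figure and the "received tasks are returned at once" shortcut are all invented for illustration. Only the ~300,000 high water mark and the ~30 tasks per second generation rate come from the figures above.

```python
def simulate(steps, loss_rate, resend_first,
             high_water=300_000, split_rate=30, demand=30):
    """Toy model of the splitter / scheduler feedback loop (hypothetical).

    steps        -- number of one-second ticks to simulate
    loss_rate    -- fraction of allocated tasks whose reply never arrives
    resend_first -- True if lost tasks are resent before new work is allocated
    demand       -- tasks requested by hosts per tick (assumed value)
    """
    ready = high_water   # 'Results ready to send'
    in_field = 0         # 'Results out in the field', as the database sees it
    lost = 0             # allocated by the server, never received by a host
    generated = 0        # total tasks created by the splitters

    for _ in range(steps):
        # Splitters only run while the ready queue is below the high water mark.
        if ready < high_water:
            ready += split_rate
            generated += split_rate

        # Expected behaviour: resend lost tasks before touching new work.
        resent = min(lost, demand) if resend_first else 0
        fresh = min(ready, demand - resent)
        ready -= fresh
        in_field += fresh
        lost -= resent

        # Only part of what was sent actually reaches the hosts.
        sent = resent + fresh
        arrived = int(sent * (1 - loss_rate))
        lost += sent - arrived

        # Simplification: tasks that arrive are completed and reported at once.
        in_field -= arrived

    return {"ready": ready, "in_field": in_field, "generated": generated}


if __name__ == "__main__":
    # Every reply lost, with and without resends taking precedence.
    print(simulate(1000, loss_rate=1.0, resend_first=False))
    print(simulate(1000, loss_rate=1.0, resend_first=True))
```

In this model, when every reply is lost and new work is allocated anyway, the ready queue never reaches the high water mark, so the splitters are never inhibited and 'in the field' grows on every tick. When resends take precedence, the same lost tasks are simply retried, the ready queue refills, and generation stops.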
