Just trying to do some forward thinking here in conceptual terms. Let's assume both processing speed and the number of CPUs double every time period X.
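As a rough sketch of that assumption (not part of the original message; the function name and the 600-second baseline are illustrative choices), the average time between events can be computed by dividing the baseline interval by the product of relative processor speed and processor count:

```python
def event_interval_seconds(base_interval_s: float, generation: int) -> float:
    """Average time between events after `generation` doublings.

    Assumes per-processor speed and processor count both double each
    generation, so the overall event rate grows 4x per generation.
    """
    speed = 2 ** generation  # relative per-processor speed
    cpus = 2 ** generation   # number of processors
    return base_interval_s / (speed * cpus)


# Baseline: one event every ten minutes (600 seconds) on a single processor.
for gen in range(3):
    print(gen, event_interval_seconds(600.0, gen))
```

Under these assumptions the interval shrinks by a factor of four per generation: 600 seconds, then 150 seconds, then 37.5 seconds.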
Suppose we start with an average machine from the past: a single processor that hit a 'testable event' (defined any way you want) every ten minutes (600 seconds).

A current machine with dual processors would now hit an event on average every 2.5 minutes (150 seconds) (a processor 2x as fast = 5 min; split across 2 processors = 2.5 min).

A future system with quad processors would hit an event on average about every 37 seconds (a processor 4x as fast = 2.5 min; split across 4 processors = 0.625 min).

And on into the future ... (and that doesn't even bring in those who like to be on the bleeding edge of the latest & greatest ... who BTW would be the first to see what us average Joes will be seeing in the near future).

So if my concepts are right, there is an exponential shrinkage in the time between events, and at some point the law of diminishing returns begins to kick in. Even if the check takes only a fraction of a second, at some point there will be machines where at least one of the processors will be hitting it all the time.

How best to deal with it? I don't know, I'm no programmer or systems designer, but fixing the problem with the checking routine will certainly help. But I think that will only delay getting to the point of having to skip checks and 'assume' that nothing has changed in the last X minutes/seconds/nanoseconds (pick your own time frame).

John

----- Original Message ----
From: "[email protected]" <[email protected]>
To: Paul D. Buck <[email protected]>
Cc: TarotApprentice <[email protected]>; BOINC dev <[email protected]>; [email protected]
Sent: Monday, April 27, 2009 12:33:51 PM
Subject: Re: [boinc_dev] 6.6.20 and work scheduling

There is a history of the messages kept in stdoutdae.txt. And you can increase or decrease the size of the history by using flags in cc_config.xml.

You have yet to come up with a single good reason why the frequency of calls to the check is a problem.
You keep complaining about the task switches that happen too frequently, and you keep stating that if the test were slowed down that would fix the problem. This is a complete non sequitur as far as everyone else can determine. You may be able to see this as a solution; nobody else can with the information you have provided.

I believe that part of the confusion is that there are two distinct tests that are run at different times.

1) What should we be working on if we did a task switch now? This detects the cases where we need to trigger a task switch immediately because of a potential missed deadline. This has many triggers that are typically spaced well apart: File Download Complete, Server RPC Complete, Detach, Project Suspend, Project Resume, Task Suspend, Task Resume, Task Complete, Task Abort, X time after the last previous event... None of these happen that often.

2) Should we do a task switch now? This one gets run extremely frequently, as one of the triggers is a checkpoint. The enforcement routine is then supposed to check to see if there are any tasks that have gone over their time segment and can be swapped out normally. The checkpoint trigger was put in place so that tasks would not lose a large amount of processing time due to being swapped out just before a checkpoint was to occur. Another trigger is the "what should we be working on now" test detecting a deadline problem.

What we need to do is to figure out which of these two is causing the problem. I am open to hearing about problems with the algorithm; however, you keep hammering on about one place that makes no sense at all.

jm7

"Paul D.
Buck" <p.d.b...@comcast.net>
Sent by: boinc_dev-bounces@ssl.berkeley.edu
04/27/2009 02:30 PM
To: [email protected]
cc: TarotApprentice <[email protected]>, BOINC dev <[email protected]>, [email protected]
Subject: Re: [boinc_dev] 6.6.20 and work scheduling

On Apr 27, 2009, at 10:14 AM, [email protected] wrote:

> As long as you insist on talking about the frequency of the test, people
> are going to be ignoring you. Please start talking about what is wrong
> with the test itself. Fixing the test will fix the problem. No amount of
> tinkering with the frequency of the test is going to fix the problem.
>
> The project came and went already. It was doing document indexing. The
> runtimes were very short, but the transfer times were killing it.

Which proves my point. The deadlines were unrealistic.

Yes, most of the real issue is the rules that make the test bad. But that is not the sole problem here. The frequency means that I cannot help you troubleshoot the rules, because I have hundreds to thousands of calls to the routines that are just so much wasted time. This buries the bad calls in so much garbage that I cannot find the needle that is needed to fix the tests.

And just because you, or even Dr. Anderson, do not think the frequency of the test is a problem does not mean that it is not a problem. Which *IS* also one of the problems in the BOINC world. We ignore people that ask questions we don't want asked. We avoid opinions that don't comport with ours ...

So, we had a project that had an unrealistic deadline, we put this rule into place for it, and because that one project had a mythical need we now cannot change BOINC for the better?

What is wrong with the test is that we do it too often. We also use the wrong driving parameters.
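To make the "too often" complaint concrete, here is a minimal sketch of what limiting the frequency of a check could look like. It is purely illustrative and is not BOINC's code: the class `ThrottledCheck`, its parameters, and the 60-second figure are hypothetical names and values chosen for the example.

```python
import time


class ThrottledCheck:
    """Run a check at most once per `min_interval_s`, however often it is requested.

    Hypothetical illustration only; it does not reflect the BOINC client.
    """

    def __init__(self, check, min_interval_s, clock=time.monotonic):
        self._check = check                  # callable taking the trigger reason
        self._min_interval_s = min_interval_s
        self._clock = clock                  # injectable so the sketch can be tested
        self._last_run = None                # time of the last check actually run
        self.skipped = 0                     # requests dropped by the throttle

    def request(self, reason):
        """Ask for a check; return True if it ran, False if it was throttled."""
        now = self._clock()
        if self._last_run is not None and now - self._last_run < self._min_interval_s:
            self.skipped += 1
            return False
        self._last_run = now
        self._check(reason)
        return True


# Example with a fake clock: three requests, 60-second minimum spacing.
fake_now = [0.0]
ran = []
throttle = ThrottledCheck(ran.append, 60.0, clock=lambda: fake_now[0])
throttle.request("checkpoint")      # runs
fake_now[0] = 10.0
throttle.request("checkpoint")      # throttled, only 10 seconds later
fake_now[0] = 60.0
throttle.request("task complete")   # runs again
```

A throttle like this only reduces how often the check runs and how much it logs. It does nothing to fix the rules inside the check, which is the other half of the disagreement in this thread.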
Because we do the test so often, and keep no history, we have instability in the scheduling system, and pretending that the frequency does not matter is not going to make it more stable. Even if you fix the rules, the fact that the client is recalculating the deadlines every 10 seconds (or less) means that BOINC is going to change its mind as to what to run. Because we also don't enforce TSI ...

This is not a case of fixing one minor buglet and we are done ... You cannot, or will not, see the frequency-caused instability unless you have a system that is both fast and wide. As best as I can tell you have neither, nor does UCB, though they are welcome to drive over anytime to look at mine (2 hours or so from UCB, and I will buy lunch and pay for the gas).

OK, say we fix all the other problems but still check every 10 seconds on which tasks to run. If we do not enforce TSI, meaning you cannot switch a task out until it has completed its TSI or ended (a rule you also say should not be enforced), and I have a batch of tasks from one project that all have roughly the same deadlines, well, are we not going to enforce "keep work mix interesting"? Then I am going to run that as a big batch, which will cause task abandonment ... oh, and because we are keeping the event-driven basis, the task I started because a task ended is still going to be superseded by another task when the upload ends ... leading to more tasks abandoned partly done ...

Essentially you want to fix the problem without changing any of the drivers of the problem ... one of which is the event-based triggers ... which happen far too often ... and pretending that they don't won't make it less of a problem.

So, ignore me some more if you want, why not, everybody else does ... still does not mean that I am wrong ... I first reported this problem in 2005 or thereabouts ... it is still a problem ...
and it will continue to be a problem unless you stop clinging to the "I don't think doing it once a second is a problem, so it cannot be a problem" mindset. Even if we change the rules to better ones, fast systems that run the tests so often are still going to be unstable. BOINC ignores history ... that, plus fast repeats of any test, is a recipe for instability ...

I agree that changing the frequency is not going to solve this, but maybe it will allow me to help provide the data so we can solve the rest of the problems. And changing the frequency of the tests will make the system a little less unstable. Oh, and save compute time.

Oh, one more thing: running the test every 60 seconds with a 2 minute task means I would still make the deadlines ... no need to run the test RIGHT NOW ... 30 seconds later would not be a killer ... even for a mythical need ...

_______________________________________________
boinc_dev mailing list
[email protected]
http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
To unsubscribe, visit the above URL and (near bottom of page) enter your email address.
