On Apr 25, 2009, at 1:48 AM, Richard Haselgrove wrote:

> Paul D. Buck wrote
>
>> Again, the point being that we still have the issue where BOINC
>> continually changes its mind about what should be running.
>>
>> One of the causes is that we call the scheduling routine so
>> often.  But that is not all of it.  Even if we do throttle the
>> number of calls to this routine, it will not stop the switching,
>> because the deadline rules are still skewed; but it will help
>> reduce the onset slightly.
>
> Again, I disagree.
>
> 'Request CPU reschedule' doesn't (shouldn't) automatically mean
> that a change will happen - I snipped many occurrences from my logs
> where the result was a NOP. 'Request CPU reschedule' calls a TEST to
> see IF a reschedule is necessary/appropriate. Jiggling with the
> number of times the test is run won't make the slightest difference
> if the test itself is flawed.

You are correct: if the test is wrong, or the governing conditions are
wrong, the timing of the test will not significantly alter things.
However, I have watched tasks go in and out of work in seconds to
minutes, and the only thing that really changed was that some of these
events happened.  If we were not triggering the test so often, the bad
tests would not be run so often.  That won't stop the bad decisions,
but ... I am still trying to find someone who can explain why running
these tests every 60 seconds makes so much sense.  And if that makes
sense, why does running it every 10-20 seconds make more sense?
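To make the request/test distinction concrete, here is a minimal Python sketch. It assumes a hypothetical client in which a "request" only sets a flag, and the test later compares the schedule it would choose against what is already running. The names (`Scheduler`, `request_cpu_reschedule`, `poll`, `choose_schedule`) are illustrative only, not BOINC's actual code, and the placeholder policy is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Scheduler:
    """Hypothetical model: a reschedule request only marks that a test is due."""
    running: list[str] = field(default_factory=list)
    pending_request: bool = False

    def request_cpu_reschedule(self) -> None:
        # Cheap: just record that the test should run on the next poll.
        self.pending_request = True

    def choose_schedule(self, runnable: list[str], ncpus: int) -> list[str]:
        # Placeholder policy (assumption): first ncpus tasks in priority order.
        return runnable[:ncpus]

    def poll(self, runnable: list[str], ncpus: int) -> bool:
        """Run the test if requested; return True only if the schedule changed."""
        if not self.pending_request:
            return False
        self.pending_request = False
        desired = self.choose_schedule(runnable, ncpus)
        if desired == self.running:
            return False  # NOP: the test ran, but nothing changes
        self.running = desired
        return True
```

In this model both arguments hold at once: how often `poll` runs cannot fix a flawed `choose_schedule`, but it does control how often a flawed answer gets acted upon.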

The point I am stressing is that there is really only one time that  
reschedule MUST in all cases be run ... and that is when a resource  
goes idle.  All other tests / cycles are mostly a waste of time.
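The "only when a resource goes idle" rule can be written as a simple filter over client events. This is a hypothetical sketch: `ClientEvent`, its fields, and the event names are assumptions loosely modelled on the log messages discussed in this thread, not BOINC data structures.

```python
from typing import NamedTuple


class ClientEvent(NamedTuple):
    # Hypothetical snapshot taken when an event fires.
    name: str              # e.g. "task exited", "file downloaded"
    busy_cpus: int         # CPUs running a task after the event
    runnable_waiting: int  # runnable tasks not currently running


def reschedule_required(ev: ClientEvent, ncpus: int) -> bool:
    # Mandatory only when a CPU is idle and something could fill it;
    # under this policy the event name itself does not matter.
    return ev.busy_cpus < ncpus and ev.runnable_waiting > 0


def triggered(events: list[ClientEvent], ncpus: int) -> list[str]:
    """Names of the events that would force an immediate reschedule."""
    return [ev.name for ev in events if reschedule_required(ev, ncpus)]
```

For example, on a 4-CPU host, a file download, a checkpoint, and a work fetch that leave all 4 CPUs busy would not force a reschedule; only a task exit that leaves a CPU idle while work is waiting would.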

> I was intrigued by the "Idle state changed" case, because I see the
> effect on my P4/v5.10.13. Not every time, but enough to be
> noticeable. On this system, I've never noticed a timeslice run short
> - it's always been at least an hour, and then some.
>
> That suggests that on a slower, single-core machine, 'Request CPU
> reschedule' happens, if anything, too rarely. If there are no task
> exits, work fetches, or file downloads, then the TEST doesn't run.
> And it seems as if "Checkpoint reached" only triggers "Request
> enforce CPU schedule" - for the current schedule, presumably.

I could wish ...

Even on my slow Xeons I see tasks that never get their full slice ...  
if they did they would never have gone into suspension.

There are two issues here, and they create a "perfect storm" giving  
rise to the effects we see...

a) The tests are run far too often, and there is no cap on the
number of times per minute this reschedule is run ... and for the life
of me I can't see why we are running it more than once every 5
minutes.
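For scale: a test every 60 seconds is 60 runs per hour, every 10-20 seconds is 180-360 runs per hour, and a 5-minute minimum interval caps optional runs at 12 per hour. A minimal sketch of such a throttle, assuming (hypothetically) that an idle CPU always bypasses the limit; `RescheduleThrottle` and `should_run` are illustrative names, not BOINC functions.

```python
from typing import Optional


class RescheduleThrottle:
    """Hypothetical throttle: optional reschedule tests run at most once per
    min_interval seconds; a mandatory one (idle CPU) always runs."""

    def __init__(self, min_interval: float = 300.0) -> None:
        self.min_interval = min_interval
        self.last_run: Optional[float] = None  # time of the last test, if any

    def should_run(self, now: float, cpu_idle: bool = False) -> bool:
        due = self.last_run is None or now - self.last_run >= self.min_interval
        if cpu_idle or due:
            self.last_run = now
            return True
        return False  # too soon since the last test; skip it
```

With requests arriving every 15 seconds for an hour, this throttle lets 12 tests through (at 0, 300, ..., 3300 seconds), while a CPU going idle still gets an immediate test.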

and,

b) the deadline calculations are off.  For one thing, if you have
a long TSI (task switch interval) as I do, one which if honored would
never leave a task partially done unless it took more than 12 hours to
complete, this biases the system so that it considers far too many
tasks to be in deadline trouble.  There may be other issues with these
calculations.  The problem is that I can see them on the screen but
cannot find the "proof", if you will, in the logs.
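One way a long TSI could inflate the "deadline trouble" count is a pessimistic check that assumes a queued task may wait a full TSI before it gets a CPU. The sketch below is a deliberately simplified, hypothetical rule for illustration, not BOINC's actual deadline calculation; `projected_finish` and `count_at_risk` are made-up names.

```python
def projected_finish(estimate_h: float, dcf: float, tsi_h: float) -> float:
    # Hypothetical pessimistic rule: the task waits one full TSI for a CPU,
    # then runs for its DCF-corrected estimate.
    return tsi_h + estimate_h * dcf


def count_at_risk(estimate_h: float, deadlines_h: list[float],
                  tsi_hours: float, dcf: float = 1.0) -> int:
    """How many tasks of the given size would be flagged as deadline risks."""
    finish = projected_finish(estimate_h, dcf, tsi_hours)
    return sum(1 for d in deadlines_h if finish > d)
```

Under this rule, 30-minute tasks due in 4, 8, 12, 20 and 36 hours are all safe with a 1-hour TSI, but three of the five are flagged with a 12-hour TSI, even though the work itself has not changed.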

Fundamentally I don't think the calculations properly work for "fast"  
and "wide" systems.

If I have a 4-core system and all 4 of the currently running tasks are
going to complete in the next 6 hours, the tasks that are due in 24
hours and have 30-minute run times have zero need to run RIGHT NOW ...
but BOINC, running the schedule tests, will run them right now anyway.
What is the point of calculating DCF (the duration correction factor)
if we are not going to use it to correct the run-time estimates?
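The 4-core scenario can be checked with a little arithmetic. This is a minimal sketch, assuming (hypothetically) that the running tasks are left alone, queued tasks are started earliest-deadline-first on whichever core frees up first, and every estimate is multiplied by DCF. `misses_without_preemption` is an illustrative name, and the greedy order is a simplification, not BOINC's policy.

```python
import heapq


def misses_without_preemption(running_left_h: list[float],
                              queued: list[tuple[float, float]],
                              ncpus: int, dcf: float = 1.0) -> int:
    """queued: (estimated_hours, hours_to_deadline) pairs.
    Returns how many queued tasks would miss their deadline if nothing is
    preempted and the queue runs earliest-deadline-first."""
    # Each core becomes free when its current task finishes (DCF-corrected).
    free_at = [h * dcf for h in running_left_h]
    free_at += [0.0] * (ncpus - len(free_at))  # any cores idle right now
    heapq.heapify(free_at)
    misses = 0
    for est, deadline in sorted(queued, key=lambda t: t[1]):
        start = heapq.heappop(free_at)  # earliest available core
        finish = start + est * dcf
        if finish > deadline:
            misses += 1
        heapq.heappush(free_at, finish)
    return misses
```

With running tasks finishing at 2, 3.5, 5 and 6 hours and eight queued 30-minute tasks due in 24 hours, the last queued task finishes at 5 hours; even with DCF = 3 it finishes at 15 hours. Nothing would need to preempt the running work.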
_______________________________________________
boinc_dev mailing list
[email protected]
http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
To unsubscribe, visit the above URL and
(near bottom of page) enter your email address.
