I have read the code in great detail. The first loop is an attempt to
initialize a variable to a known state. The state is changed later as
needed.
jm7
"Paul D. Buck"
<p.d.b...@comcast
.net> To
BOINC Developers Mailing List
04/30/2009 12:05 <[email protected]>
PM cc
David Anderson
<[email protected]>,
[email protected]
Subject
Re: [boinc_dev] 6.6.20 and work
scheduling
On Apr 30, 2009, at 5:48 AM, [email protected] wrote:
1) We do it too often (event driven)

Exactly what we are not listening to. The rate of tests is NOT the reason for incorrect switches.
No, but it is the reason we have difficulty finding them. And it is a
source of instability. John, you can stick your head in the sand all you
want, it will not make the problems go away because you refuse to see them.
Also, because you schedule "globally", and because you recalculate based on the situation as it is NOW, the universe is different every time you recalculate; the situation evolves if for no other reason than that work is done in the meantime. If you are doing that 10 times a minute, you are going to get 10 different answers. Those answers MAY be close enough that no change is needed based on the rules as they are, but, coupled with the other limitations, this is an issue.
I did show a very specific example of this effect where task A completes, B starts, A uploads and the upload completes, B is suspended, C is started; another task D completes, E is started, D's upload completes, E is suspended, and F is started.
The last item I will remind you of once again. I may not be able to walk straight anymore, and I sometimes have trouble talking, but I am a trained and skilled systems engineer. This is what I used to do. I know I cannot put my finger on a line in a log to convince you or anyone else, but this is a problem. It is a problem because it loads up the logs with unneeded entries, and it is also a cause of some of the instability we see.
Anyone who works with unstable systems knows that bumping an unstable system causes problems; the more you bump it, the faster those problems arise.
2) All currently running tasks are eligible for preemption

Not completely true, and not the problem. Tasks that are not in the list of want to run are preemptable; tasks that are in the list of want to run are preemptable. They should only be preempted if either that task is past its TSI, or there is a task with deadline trouble (please work with me on the definition of deadline trouble).
Which means you have not looked at the code. The first loop in the code
marks the next state of ALL running tasks as preempted. Dr. Anderson made
a change that was supposed to cure that, but it does not.
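For readers without the source at hand, the disputed pattern looks roughly like the following. This is a simplified sketch with hypothetical names, not the actual cpu_sched code: the first loop sets every running task's next state to preempted, and a later pass is expected to flip back the ones that should keep running.

    // Simplified sketch of the pattern being argued about; hypothetical names,
    // not the actual BOINC cpu_sched code.
    #include <vector>

    enum SchedulerState { PREEMPTED, SCHEDULED };

    struct ActiveTask {
        SchedulerState scheduler_state = SCHEDULED;        // what the task is doing now
        SchedulerState next_scheduler_state = SCHEDULED;   // what it will do after enforcement
        bool in_run_list = false;                          // chosen to run by this pass?
    };

    void enforce_schedule(std::vector<ActiveTask>& running_tasks) {
        // First loop: mark every running task's next state as preempted
        // (John reads this as initialization to a known state).
        for (ActiveTask& t : running_tasks) {
            t.next_scheduler_state = PREEMPTED;
        }
        // Second loop: promote the tasks the scheduler chose this pass.
        // Paul's point is that if this pass chooses differently from the last
        // one, a running task is switched out regardless of its TSI.
        for (ActiveTask& t : running_tasks) {
            if (t.in_run_list) {
                t.next_scheduler_state = SCHEDULED;
            }
        }
    }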
3) TSI is not respected as a limiting factor

It cannot be in all cases. There may be more cases where the TSI could be honored.
For the reason above, it is not honored at all. I have pointed to the block of code where all tasks are marked for preemption, and that, my friend, means that TSI is not considered at all ...
Again, you are thinking in terms of single-stream systems, and on those I agree that this is the case. On multi-core systems it is much less of an issue, to the point where it might never be an issue at all.

8-core system
All tasks running are 8 hours in length
Average time between task completions: 1 hour (8-hour tasks spread across 8 cores)

Assuming that the system has been running for a while, that is what statistics tells me. With the mix of task lengths I see on my systems the situation is usually much better than that. See the numbers below. In one of my first posts I actually listed the numbers of tasks and the run times ... but the numbers below are illustrative enough.
4) TSI is used in calculating deadline peril

And it has to be. Since tasks may (or may not) be re-scheduled at all during a TSI, and the TSI may line up badly with a connection, the TSI is an important part of the calculation.
Example:
12 hour TSI.
1 hour CPU time left on the task.
12 hours and 1 second left before deadline.
No events for the next 12 hours.

Without TSI in the calculation, there is the distinct possibility that there is no deadline trouble recorded. Wait 12 hours. You now have 1 second of wall time left and 1 hour of CPU time left. Your task is now late.

With TSI in the calculation, deadline trouble is noted at the point 12 hours and 1 second before the deadline (if not somewhat earlier depending on other load). The task gets started and completes before the deadline.
Proving once again that you are thinking of systems that are running a single processing stream. I suppose that you forgot my last test, where you did not want to read the numbers. Or the test before that. In the first test the average time between task completions was 6 minutes (measured over 24 hours); in the other test the counts of "Request CPU reschedule: handle_finished_apps" log entries were 3, 11, 14, 22, and 19 over a three-hour period. Those numbers were for a 4, 4, 8, 4, and 8 CPU system respectively. Meaning that the time between a completed task and the next was at worst 60 minutes and at best about 8 minutes apart (6 minutes for the first test). Your theory falls apart because when the next task completes the pending task can be picked up and scheduled next.
We are not talking about scheduling problems on single-core systems. It would be nice if you would keep that in mind. We are talking about scheduling parameters that were developed on single-thread systems being inappropriate on multi-core systems.
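To make the disputed calculation concrete, here is a minimal sketch (hypothetical names, not the actual client code) of a deadline check with and without the TSI padding John describes. On John's numbers only the padded form flags trouble; Paul's argument is that on a wide system, where a core frees up every few minutes, that padding is far too pessimistic.

    // Minimal sketch of a deadline check with and without TSI padding.
    // Hypothetical names; not the actual BOINC client code.
    #include <iostream>

    bool deadline_trouble(double cpu_time_left,           // seconds of work remaining
                          double wall_time_to_deadline,   // seconds until the deadline
                          double tsi,                     // task switch interval, seconds
                          bool pad_with_tsi) {
        // Without padding: trouble only if the remaining work no longer fits.
        // With padding: assume the task may not be looked at again for one TSI.
        double window = pad_with_tsi ? wall_time_to_deadline - tsi
                                     : wall_time_to_deadline;
        return cpu_time_left > window;
    }

    int main() {
        const double hour = 3600.0;
        double cpu_left    = 1 * hour;        // 1 hour of CPU time left
        double to_deadline = 12 * hour + 1;   // 12 hours and 1 second to the deadline
        double tsi         = 12 * hour;       // 12 hour task switch interval

        std::cout << deadline_trouble(cpu_left, to_deadline, tsi, false) << "\n"   // 0: no trouble seen
                  << deadline_trouble(cpu_left, to_deadline, tsi, true)  << "\n";  // 1: trouble flagged
    }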
5) Work mix is not kept "interesting"

6) Resource Share is used in calculating run time allocations

A simulation that tracks what the machine is actually likely to do has to track what happens based on resource share. It may not want to be the trigger for instant preemption though.
Sadly it does do that right now, trigger preemption at the slightest
breeze. Last night I had 5 uFluids tasks all running in parallel because
the scheduler decided that the deadline of 5/13 could not be met. It ran
those tasks for several hours before I suspended most of them. Later it
suspended the one it was still running and late last night I unsuspended
all of them again. They are STILL waiting to be restarted. Because they have deadlines that are close together, the mechanisms used to "globally" calculate will always select these tasks in batches and screw up the work mix, which means that my i7 is run in a mode that is significantly less efficient. This is also why I have proposed other metrics and rules for making these decisions, to reduce how much Resource Share drives the selection process.
7) Work "batches" (tasks with roughly similar deadlines) are
not "bank
teller queued"
I really don't understand this one. A bank teller queue means that
tasks
come from one queue and are spread across the available resources as
they
become available. Are they always run in FIFO? No. However, that
does
not mean that they are not coming from the same queue.
Probably because you keep refusing to read what I write carefully. See the
example above. If you schedule "globally" as you so love, then tasks with
close deadlines and relatively low Resource Shares will always cause these
panics. I get them for IBERCIVIS, VTU, and just recently uFluids.
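For clarity, here is what the "bank teller" queuing John describes amounts to, as a minimal sketch with hypothetical names (not BOINC code): one shared queue, ordered by whatever policy you like, and each core that frees up takes the next task from that single queue.

    // Minimal sketch of "bank teller" queuing: one shared queue feeding
    // whichever core becomes free next. Hypothetical names, not BOINC code.
    #include <iostream>
    #include <queue>
    #include <string>
    #include <vector>

    struct Task {
        std::string name;
        double deadline;   // seconds from now
    };

    // One possible ordering for the shared queue: earliest deadline first.
    struct ByDeadline {
        bool operator()(const Task& a, const Task& b) const {
            return a.deadline > b.deadline;   // priority_queue keeps the smallest on top
        }
    };

    int main() {
        std::priority_queue<Task, std::vector<Task>, ByDeadline> shared_queue;
        shared_queue.push({"uFluids_1", 8 * 3600.0});
        shared_queue.push({"NQueens_7", 2 * 3600.0});
        shared_queue.push({"Rosetta_3", 24 * 3600.0});
        shared_queue.push({"GPUGRID_2", 12 * 3600.0});

        const int free_cores = 2;
        // Each core that frees up simply takes the next task from the one queue;
        // the rest wait in the same queue for the next core to finish.
        for (int core = 0; core < free_cores && !shared_queue.empty(); ++core) {
            Task t = shared_queue.top();
            shared_queue.pop();
            std::cout << "core " << core << " starts " << t.name << "\n";
        }
    }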
8) History of work scheduling is not preserved and all deadlines are calculated fresh each invocation.

Please explain why this is a problem? The history of work scheduling may have no bearing on what has to happen in the future.
See above. It also leads to other instabilities that you don't want to recognize. When I re-enabled the uFluids tasks that were such a cause for panic yesterday, it sure would seem that they should be a cause for panic today. I have an NQueens task that was suspended yesterday with 12 minutes to run and it still has not restarted. If it was so important to start yesterday and run up to that point, why, 24 hours later, has BOINC been running off tasks from projects it has just downloaded work from, tasks with later deadlines?
9) True deadline peril is rare, but "false positives" are common

Methods that defer leaving RR for a long time will increase true deadline peril. What is needed is something in between.
Again, the systems of which we speak tend to be completing tasks fast
enough that this argument makes no sense. With resources coming free in
minutes, on average, there is no chance that this is going to be as common
as you posit. Again and again, you are thinking of the old slow systems, and when you refuse to consider the evidence that people like Richard and I supply, well ...
I know it is harder to see on a 4-core system, though I did notice these issues in 2005 after I had gotten my first 4-CPU system (the first two in the test above), but you can see it if you watch the patterns of operation.
10) Some of the sources of work peril may be caused by a defective work fetch allocation

Please give examples from logs.
I don't have to. You have described over and over again why every
suggested change cannot work because of these very issues. Go back and
look at your examples. Virtually all your examples involve BOINC
downloading work that all of a sudden causes this magical situation where I
have to madly start processing the new work because BOINC fetched something
that causes the world to change. Ergo, if BOINC had not fetched that work,
the problem would not have occurred and the universe would not be ending.
Even so, many of those examples of panics are still modeled on only having
a single stream of work processing.
11) Other factors either obscured by the above, I forgot them, or maybe nothing else ...

work-fetch decisions

Seems to be related to:
1) Bad debt calculations
2) Asking for inappropriate work loads
3) Asking for inappropriate amounts
Please give examples.
I have, any number of times.
I could send you another long log showing that the CUDA debt is slowly building and in another 24 hours or so is going to be so out of whack that the client will stop asking for work from GPU Grid, the only project from which GPU work can be fetched, while BOINC happily ignores all evidence to the contrary, tries to get CUDA work from every other project in the universe, and pouts because it cannot get it. There is the Rosetta guy who cannot get a queue full of Rosetta work because of the opposite problem (he is only attached to GPU Grid and Rosetta), and there are Richard's logs where he needs one class of work and the work fetch asks for the wrong kind.
Others have mentioned this before, but the next item is where I ask for 1 second of work and instead of getting one task I get 10 or more, or at any rate more than one. This is a long-standing problem, and the issue is on the server end, but it is still a problem.
4) Design of client / server interactions

There are design constraints that limit the transaction to one round trip.
Actually they are design choices. And they may or may not be the best
choices. One of the recent examples and questions was why we feed up the
list of tasks to the server each time. Another design choice. The server
is supposed to use that information to make a good choice to feed work
down. If I understand the other proposal made recently, changes could be made to this exchange that might be beneficial. Changes which you have
also rejected out of hand.
bad debt calculation

Seems to be related to:
1) Assuming that all projects have CUDA work and asking for it
2) Assuming that a CUDA-only project has CPU work and asking for it
3) Not necessarily taking into account system width correctly
I don't understand what you mean by system width.
More modern systems are faster; they are also "wider", with more processing units. My i7 has 12: 8 virtual CPUs and 4 GPU engines. I am actively considering a system with 16 CPUs and room for as many as 6 or 8 GPU cores, which could bring that number up to 24 elements. As I have been struggling to get across that this changes the way work can be processed, I have been using this term a lot. Which tells me yet again that you have not actually been carefully reading what I have been writing.
I know it is a PITA to read things carefully, but I am not wordy out of spite; I am wordy to be as clear as possible. Skimming proposals looking only for reasons to reject them is not actually that helpful.
4) Not taking into account CUDA capability correctly

efficiency of the scheduling calculations (if it's an issue)

It is, but you and other nay-sayers don't have systems that experience the issues, so you and others denigrate or ignore the reports.

Fix the algorithm FIRST, optimize SECOND.
Reducing the hit rate is not intended to be done to optimize anything.
Sadly this is a point that I know I will never be able to prove to your
satisfaction, and it is apparent that I cannot explain it well though I
have tried very hard to do so. But, even with a perfect rule set, the
system will retain the characteristic of instability if we keep calling the
scheduler at times when there is no specific need. I get why some of those
calls are made, but, the way we proceed from there is the secondary cause.
And when I suggest that there may not be specific needs, you have given examples time and again where work is downloaded, and you cannot quite grasp the fact that in most cases we could wait 30 seconds before checking to see how the schedule might be affected by the new work; you insist that the world is magically better if it is checked instantaneously. With no evidence, I might add. Even your defunct project with 5 minute deadlines would only be affected if the tasks took 4 minutes and 59 seconds ... which means they would also blow the deadlines because of the latency in uploads and downloads. If the task were a reasonable 1 minute in length, then the only effect of waiting 30 seconds to schedule the task would be to trim the margin slightly.
But the more cogent point is that you are offering a straw-man argument using a
project that essentially collapsed because they had unreasonable
requirements. So, why are we coding BOINC to handle unreasonable
requirements from a project that does not exist anymore? That is a poser I
cannot fathom.
The fact that reducing the call rate increases efficiency is a nice side effect. But it is not the reason I have proposed it, and I wish you would stop pretending that it is.
In either case, the two main reasons to reduce the call rate are:
a) to lower the log clutter
b) to reduce the rate of false changes so they are easier to identify
Your intransigence on this matter is nothing short of amazing. You
complain about the large logs that obscure the very problems we are hunting
and yet denigrate the one way we can start to get a handle on that very
issue.
The worse point is that identifying some of the problems requires logging; because we do resource scheduling, for example, so often, the logs get so big from the sheer size that they are not usable, because we are performing actions that ARE NOT NECESSARY ... because the assumption is that there is no cost. But here is a cost right here. If we do resource scheduling 10 times more often than needed, then there is 10 times more data to sift. Which is the main reason I have harped on SLOWING THIS DOWN.
It is also why in my pseudo-code proposal I suggested that we do two things: one, make it switchable so that we can start with a bare-bones "bank teller" style queuing system and only add refinements as we see where it does not work adequately. Let us not add more rules than needed. Start with the simplest rule set possible, run it, find exceptions, figure out why, fix those, move on ...
In other words, step back 5 years. We were there, and we had to add refinements to get it to work.
See, that is the way we fixed it then; why are you so resistant to this approach now? Back then the most common system was single core, with some duals. And, as I pointed out, that was the time I started to notice these issues on my 4-core system. Those issues were not handled back then and they are worse now ...
So let's try a new mechanism for the wide systems, with as few rules as possible, and see if it works. If we can create situations where it starts to fail, well, then we add complexity.
I suspect that many of the rules we have now will not be needed at all. In
fact, I think that much of the complexity can go away because now we can
make choices that are not at all possible on single processing thread
machines.
Let us not throw the baby out with the bath water.
If the baby is dead, why not?
The problem is fundamentally that we developed elaborate rules to handle scheduling on single-processing-thread machines. Duals made some of those rules passé, but the effects were almost unnoticeable. The effects started to become visible on 4-core systems and are now quite obvious on wider systems.
This is one reason why, in my pseudo-code, I suggested that at least for the time being we keep the current scheduler for systems of fewer than 4 cores and try something new on 4-core and wider systems.
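A minimal sketch of what that split might look like (hypothetical names; a sketch of the proposal, not a patch against the actual client):

    // Sketch of the proposed split: keep the existing scheduler on narrow
    // hosts, try the simpler queue-based one on wide (4+ core) hosts.
    // Hypothetical names; not a patch against the real client.
    #include <cstdio>

    struct HostInfo {
        int cpu_count;   // logical CPUs available to BOINC
    };

    void schedule_cpus_classic(const HostInfo&) {
        std::puts("using the existing rule-heavy scheduler");
    }

    void schedule_cpus_bank_teller(const HostInfo&) {
        std::puts("using the bare-bones shared-queue scheduler");
    }

    void schedule_cpus(const HostInfo& host) {
        if (host.cpu_count < 4) {
            schedule_cpus_classic(host);       // current behavior on narrow systems
        } else {
            schedule_cpus_bank_teller(host);   // new mechanism only where width makes it viable
        }
    }

    int main() {
        schedule_cpus({2});   // dual core -> classic path
        schedule_cpus({8});   // i7-class host -> new path
    }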