Apparently not. I am willing to do some work, but Dave hasn't replied to
any emails discussing this or the other CPU scheduler changes I am willing
to work on.
jm7
From: Richard Haselgrove <r.haselgr...@btinternet.com>
Sent by: <[email protected]>
To: "David Anderson" <[email protected]>
Cc: [email protected]
Subject: Re: [boinc_dev] Preemption of very short tasks.
Date: 03/18/2010 08:27 PM
Hopefully we're getting close to a push on per-app-version issues.
Another one came up at AQUA this evening.
They have apps with wildly differing runtimes: I switched a host tonight
from Fokker-Planck (under 50 minutes) to Adiabatic (preliminary prediction
8 days) - both CPU tasks, no coprocessor apps active at the moment.
For research purposes, they like to get their results back quickly, so they
make extensive use of <max_wus_in_progress>. But to cover their planned 12-hour
server maintenance window this weekend, I would have had to cache 15 or so
FP tasks. They can't allow that for the 8-day tasks.
So <max_wus_in_progress> is another candidate for migration to
per-app-version, please.
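A per-app-version cap could be enforced along these lines (a minimal Python sketch; the function names, data shapes, and per-app limits are illustrative assumptions, not the actual BOINC scheduler code - only the app names come from the AQUA example above):

```python
# Sketch of an in-progress limit keyed by app version rather than by
# project, so short and long apps can have different caps on one host.

from collections import Counter

def can_send(task_app_version, in_progress, limits):
    """Allow a new task only if the host's in-progress count for that
    app version is below its configured <max_wus_in_progress>."""
    counts = Counter(in_progress)
    limit = limits.get(task_app_version)
    return limit is None or counts[task_app_version] < limit

# Example: cache up to 15 short Fokker-Planck tasks but at most 2 of
# the 8-day Adiabatic tasks.
limits = {"fokker_planck": 15, "adiabatic": 2}
queue = ["adiabatic", "adiabatic"]
can_send("adiabatic", queue, limits)      # False: cap reached
can_send("fokker_planck", queue, limits)  # True
```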
----- Original Message -----
From: "David Anderson" <[email protected]>
To: "Richard Haselgrove" <[email protected]>
Cc: <[email protected]>; <[email protected]>
Sent: Wednesday, January 06, 2010 7:15 PM
Subject: Re: [boinc_dev] Preemption of very short tasks.
> The "temp DCF" change doesn't address the following problem.
> The current plan is to keep track of per-app-version DCF on the server;
> I hope to get to this in the next couple of months.
> -- David
>
> Richard Haselgrove wrote:
>> There's also a converse problem if a project supplies too-small job FLOP
>> counts, in that EDF may not be invoked soon enough: this particularly
>> applies if a long, under-estimated task follows a succession of shorter
>> and/or better estimated tasks.
>>
>> We first saw this clearly with the introduction of Astropulse under the
>> s...@home banner. Many people use optimised sah applications: indeed, the
>> stock sah application has incorporated many optimisations over the years,
>> meaning that the sah stock job FLOP counts are routinely too big
>> (typically by a factor of ~5 for modern Core2 CPUs, leading to DCF values
>> of ~0.2). Full CPU optimisation can double the effect, leading to a DCF
>> of ~0.1. If a succession of such tasks is followed by an
>> accurately-estimated AP task ("too small", in the context of the
>> over-estimated tasks which preceded it), BOINC will assume that the
>> following task will complete much sooner than will be the case in
>> reality. In the case of the initial release of Astropulse (when no
>> comparable optimisations were available), I seem to remember that BOINC
>> would form an estimate that the tasks would take ~10 hours on a Core2,
>> when in reality they would take ~40 hours.
>>
>> Of course, as soon as an Astropulse task completed, DCF would be reset
>> and new estimates calculated, but by then BOINC could have got itself
>> into serious work over-fetch trouble. A single-project SETI cruncher with
>> a 10-day cache setting (not an unknown animal!), caching AP tasks on the
>> basis of the 10-hour estimate, could find themselves with a 40-day cache
>> as soon as DCF corrected itself, and no way of completing them all within
>> deadline, EDF or not.
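The over-fetch arithmetic above can be sketched numerically (a hypothetical illustration using the figures in the paragraph; the function names are not from the BOINC source):

```python
# Work-fetch sketch: the client fills its cache using the current
# duration correction factor (DCF), so a stale DCF inflates the queue.

def estimated_runtime(raw_estimate_hours, dcf):
    """BOINC-style estimate: raw estimate scaled by the project DCF."""
    return raw_estimate_hours * dcf

def tasks_fetched(cache_days, est_hours_per_task):
    """Tasks needed to fill a cache, assuming one CPU running 24 h/day."""
    return cache_days * 24 / est_hours_per_task

# Optimised sah apps have driven the project DCF down to ~0.25; an
# Astropulse task really takes 40 h.
dcf = 0.25
ap_true_hours = 40
ap_estimate = estimated_runtime(ap_true_hours, dcf)   # 10 h, as observed

fetched = tasks_fetched(cache_days=10, est_hours_per_task=ap_estimate)
real_queue_days = fetched * ap_true_hours / 24        # 40 days of real work
```

Once DCF corrects itself, the "10-day" cache turns out to hold 40 days of computation.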
>>
>> The '"temp DCF" for the app version' envisioned by changeset 20077 will
>> be of some help in this kind of situation, because it should start to
>> inhibit work fetch as soon as a task seems to be outstaying the initial
>> estimate (something I think I've suggested in the past). But it isn't
>> going to work in the sah_enh / AP case (different apps) if the 'temp DCF'
>> is maintained by app_version, while the permanent DCF is still kept at
>> the project level. The scope for the temp and permanent DCFs has to be
>> the same: ideally both app_version.
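The scope mismatch can be shown with a toy calculation (the variable names and DCF values are illustrative, loosely following the sah/AP figures above; this is not BOINC code):

```python
# If the permanent DCF is kept per project while only the temp DCF is
# per app version, an accurately-estimated Astropulse task still gets
# scaled by the sah-driven project-wide DCF.

project_dcf = 0.1                              # driven down by optimised sah tasks
per_app_dcf = {"sah_enh": 0.1, "astropulse": 1.0}

ap_raw_estimate = 40.0                         # hours; accurate for Astropulse

wrong = ap_raw_estimate * project_dcf                 # 4 h: wildly optimistic
right = ap_raw_estimate * per_app_dcf["astropulse"]   # 40 h: matches reality
```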
>>
>> ----- Original Message ----- From: "David Anderson"
>> <[email protected]>
>> To: <[email protected]>
>> Cc: <[email protected]>
>> Sent: Wednesday, January 06, 2010 6:25 AM
>> Subject: Re: [boinc_dev] Preemption of very short tasks.
>>
>>
>>> Several recent posts have described the same scenario:
>>> a project supplies too-large job FLOP counts.
>>> Its jobs are projected to miss deadline, and start off in EDF.
>>> As their fraction done increases, their completion
>>> estimates improve and they no longer miss deadline.
>>> They're preempted and other jobs from the project are started.
>>> Soon there are lots of partly-finished jobs.
>>>
>>> I checked in a change that should fix this.
>>> The basic idea: information from running jobs is used to scale the
>>> completion estimates of unstarted jobs.
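That idea can be sketched as follows (an illustrative reconstruction, not the actual checked-in code; the function name and parameters are assumptions):

```python
# Use the observed progress of a running job to rescale the completion
# estimates of unstarted jobs from the same app version.

def rescaled_estimate(unstarted_estimate, running_elapsed,
                      running_fraction_done, original_estimate):
    """Projected runtime of the running job, divided by its original
    estimate, gives a scale factor applied to unstarted jobs."""
    scale = (running_elapsed / running_fraction_done) / original_estimate
    return unstarted_estimate * scale

# A job estimated at 10 h has run 4 h and is only 20% done: projected
# runtime is 20 h, so unstarted jobs are scaled up by 2x.
rescaled_estimate(10.0, running_elapsed=4.0,
                  running_fraction_done=0.2, original_estimate=10.0)  # 20.0
```

This makes the client's projections react as soon as a running job starts outstaying its estimate, instead of waiting for the first completion to reset DCF.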
>>>
>>> -- David
>>>
>>> [email protected] wrote:
>>>> I am attached to Goldbach's Conjecture, which is running some very
>>>> short tasks (~2 minutes). I have a large number of these that have
>>>> been pre-empted at around 1:55. I believe that what is happening is
>>>> that Goldbach's is asked for work, and provides some. The tasks
>>>> immediately enter EDF. Since it is a dual-CPU system, 2 of Goldbach's
>>>> tasks are started at the same time. When one of these two finishes,
>>>> two other tasks are marked as requiring EDF and the one with only
>>>> seconds remaining is then pre-empted. More tasks for Goldbach's are
>>>> downloaded and run, with some of these also being suspended. This is
>>>> leading to a rather large collection of mostly-run tasks that will not
>>>> be gotten to for a week or so more, as they only have seconds left and
>>>> the deadline is much later. The new tasks keep the STD low enough that
>>>> those with very little time left are unlikely to complete in normal
>>>> round robin, but will have to wait until the deadline to start the
>>>> last few seconds (the safety margin was removed, so even though upload
>>>> and report times are not zero, they are being treated as such). This
>>>> is leading to many more tasks in the queue than should be there.
>>>>
>>>> There are a couple of solutions:
>>>>
>>>> 1) Treat tasks with the same deadline in lexicographical order - even
>>>> if some of them are marked as EDF and others are not.
>>>> 2) If the rr_sim indicates a potential miss, let the tasks run out
>>>> their current time slice unless a test of EDF completion also
>>>> indicates a potential deadline miss.
>>>>
>>>> Either of these would allow the tasks that are mostly done to
>>>> complete, and be uploaded and reported, which reduces the risk of
>>>> hitting a major slowdown in the UI because of too many tasks on the
>>>> client.
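Option 2 could be sketched like this (illustrative only; this is not the client's actual rr_sim or EDF code, and the function names are invented):

```python
# Preempt on a projected round-robin miss only if an EDF-order
# simulation also predicts a deadline miss; otherwise let the running
# task finish its time slice.

def edf_misses_deadline(tasks, now=0.0):
    """Simulate the tasks earliest-deadline-first on one CPU; report
    whether any task would finish after its deadline.  Each task is a
    (remaining_seconds, deadline_seconds) pair."""
    t = now
    for remaining, deadline in sorted(tasks, key=lambda task: task[1]):
        t += remaining
        if t > deadline:
            return True
    return False

def should_preempt(rr_predicts_miss, tasks):
    # Preempt only if round robin *and* EDF both predict a miss.
    return rr_predicts_miss and edf_misses_deadline(tasks)

# A task with 5 s left and a deadline a week away is not preempted even
# if round robin flags a potential miss, because EDF meets all deadlines.
tasks = [(5, 604800), (7200, 86400)]
should_preempt(True, tasks)  # False
```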
>>>>
>>>> jm7
>>>>
>>>
>>> _______________________________________________
>>> boinc_dev mailing list
>>> [email protected]
>>> http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
>>> To unsubscribe, visit the above URL and
>>> (near bottom of page) enter your email address.
>>>
>>
>>
>
>