Apparently not. I am willing to do some work, but Dave hasn't replied to
any emails discussing this or the other CPU scheduler changes I am willing
to work on.
jm7
From: Richard Haselgrove <r.haselgr...@btinternet.com>
Sent by: <[email protected]>
To: "David Anderson" <[email protected]>
Cc: [email protected]
Subject: Re: [boinc_dev] Preemption of very short tasks.
Date: 03/18/2010 08:27 PM
Hopefully we're getting close to a push on per-app-version issues.
Another one came up at AQUA this evening.
They have apps with wildly differing runtimes: I switched a host tonight
from Fokker-Planck (under 50 minutes) to Adiabatic (preliminary prediction
8 days) - both CPU tasks, no coprocessor apps active at the moment.
For research purposes, they like to get their results back quickly, so they
make extensive use of <max_wus_in_progress>. But to cover their planned 12-hour
server maintenance window this weekend, I would have had to cache 15 or so
FP tasks. They can't allow that for the 8-day tasks.
So <max_wus_in_progress> is another candidate for migration to
per-app-version, please.
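A per-app-version cap could be enforced along these lines (a minimal Python sketch; the function names, data shapes, and per-app limits are illustrative assumptions, not the actual BOINC scheduler code - only the app names come from the AQUA example above):

```python
# Sketch of an in-progress limit keyed by app version rather than by
# project, so short and long apps can have different caps on one host.

from collections import Counter

def can_send(task_app_version, in_progress, limits):
    """Allow a new task only if the host's in-progress count for that
    app version is below its configured <max_wus_in_progress>."""
    counts = Counter(in_progress)
    limit = limits.get(task_app_version)
    return limit is None or counts[task_app_version] < limit

# Example: cache up to 15 short Fokker-Planck tasks but at most 2 of
# the 8-day Adiabatic tasks.
limits = {"fokker_planck": 15, "adiabatic": 2}
queue = ["adiabatic", "adiabatic"]
can_send("adiabatic", queue, limits)      # False: cap reached
can_send("fokker_planck", queue, limits)  # True
```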
----- Original Message -----
From: "David Anderson" <[email protected]>
To: "Richard Haselgrove" <[email protected]>
Cc: <[email protected]>; <[email protected]>
Sent: Wednesday, January 06, 2010 7:15 PM
Subject: Re: [boinc_dev] Preemption of very short tasks.
> The "temp DCF" change doesn't address the following problem.
> The current plan is to keep track of per-app-version DCF on the server;
> I hope to get to this in the next couple of months.
> -- David
>
> Richard Haselgrove wrote:
>> There's also a converse problem if a project supplies too-small job FLOP
>> counts, in that EDF may not be invoked soon enough: this particularly
>> applies if a long, under-estimated task follows a succession of shorter
>> and/or better estimated tasks.
>>
>> We first saw this clearly with the introduction of Astropulse under the
>> s...@home banner. Many people use optimised sah applications: indeed, the
>> stock sah application has incorporated many optimisations over the years,
>> meaning that the sah stock job FLOP counts are routinely too big
>> (typically by a factor of ~5 for modern Core2 CPUs, leading to DCF values
>> of ~0.2). Full CPU optimisation can double the effect, leading to a DCF
>> of ~0.1. If a succession of such tasks is followed by an
>> accurately-estimated AP task ("too small", in the context of the
>> over-estimated tasks which preceded it), BOINC will assume that the
>> following task will complete much sooner than will be the case in
>> reality. In the case of the initial release of Astropulse (when no
>> comparable optimisations were available), I seem to remember that BOINC
>> would form an estimate that the tasks would take ~10 hours on a Core2,
>> when in reality they would take ~40 hours.
>>
>> Of course, as soon as an Astropulse task completed, DCF would be reset
>> and new estimates calculated, but by then BOINC could have got itself
>> into serious work over-fetch trouble. A single-project SETI cruncher with
>> a 10-day cache setting (not an unknown animal!), caching AP tasks on the
>> basis of the 10-hour estimate, could find themselves with a 40-day cache
>> as soon as DCF corrected itself, and no way of completing them all within
>> deadline, EDF or not.
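The over-fetch arithmetic above can be sketched numerically (a hypothetical illustration using the figures in the paragraph; the function names are not from the BOINC source):

```python
# Work-fetch sketch: the client fills its cache using the current
# duration correction factor (DCF), so a stale DCF inflates the queue.

def estimated_runtime(raw_estimate_hours, dcf):
    """BOINC-style estimate: raw estimate scaled by the project DCF."""
    return raw_estimate_hours * dcf

def tasks_fetched(cache_days, est_hours_per_task):
    """Tasks needed to fill a cache, assuming one CPU running 24 h/day."""
    return cache_days * 24 / est_hours_per_task

# Optimised sah apps have driven the project DCF down to ~0.25; an
# Astropulse task really takes 40 h.
dcf = 0.25
ap_true_hours = 40
ap_estimate = estimated_runtime(ap_true_hours, dcf)   # 10 h, as observed

fetched = tasks_fetched(cache_days=10, est_hours_per_task=ap_estimate)
real_queue_days = fetched * ap_true_hours / 24        # 40 days of real work
```

Once DCF corrects itself, the "10-day" cache turns out to hold 40 days of computation.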
>>
>> The '"temp DCF" for the app version' envisioned by changeset 20077 will
>> be of some help in this kind of situation, because it should start to
>> inhibit work fetch as soon as a task seems to be outstaying the initial
>> estimate (something I think I've suggested in the past). But it isn't
>> going to work in the sah_enh / AP case (different apps) if the 'temp DCF'
>> is maintained by app_version, while the permanent DCF is still kept at
>> the project level. The scope for the temp and permanent DCFs has to be
>> the same: ideally both app_version.
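The scope mismatch can be shown with a toy calculation (the variable names and DCF values are illustrative, loosely following the sah/AP figures above; this is not BOINC code):

```python
# If the permanent DCF is kept per project while only the temp DCF is
# per app version, an accurately-estimated Astropulse task still gets
# scaled by the sah-driven project-wide DCF.

project_dcf = 0.1                              # driven down by optimised sah tasks
per_app_dcf = {"sah_enh": 0.1, "astropulse": 1.0}

ap_raw_estimate = 40.0                         # hours; accurate for Astropulse

wrong = ap_raw_estimate * project_dcf                 # 4 h: wildly optimistic
right = ap_raw_estimate * per_app_dcf["astropulse"]   # 40 h: matches reality
```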
>>
>> ----- Original Message ----- From: "David Anderson"
>> <[email protected]>
>> To: <[email protected]>
>> Cc: <[email protected]>
>> Sent: Wednesday, January 06, 2010 6:25 AM
>> Subject: Re: [boinc_dev] Preemption of very short tasks.
>>
>>
>>> Several recent posts have described the same scenario:
>>> a project supplies too-large job FLOP counts.
>>> Its jobs are projected to miss deadline, and start off in EDF.
>>> As their fraction done increases, their completion
>>> estimates improve and they no longer miss deadline.
>>> They're preempted and other jobs from the project are started.
>>> Soon there are lots of partly-finished jobs.
>>>
>>> I checked in a change that should fix this.
>>> The basic idea: information from running jobs is used to scale the
>>> completion estimates of unstarted jobs.
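That idea can be sketched as follows (an illustrative reconstruction, not the actual checked-in code; the function name and parameters are assumptions):

```python
# Use the observed progress of a running job to rescale the completion
# estimates of unstarted jobs from the same app version.

def rescaled_estimate(unstarted_estimate, running_elapsed,
                      running_fraction_done, original_estimate):
    """Projected runtime of the running job, divided by its original
    estimate, gives a scale factor applied to unstarted jobs."""
    scale = (running_elapsed / running_fraction_done) / original_estimate
    return unstarted_estimate * scale

# A job estimated at 10 h has run 4 h and is only 20% done: projected
# runtime is 20 h, so unstarted jobs are scaled up by 2x.
rescaled_estimate(10.0, running_elapsed=4.0,
                  running_fraction_done=0.2, original_estimate=10.0)  # 20.0
```

This makes the client's projections react as soon as a running job starts outstaying its estimate, instead of waiting for the first completion to reset DCF.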
>>>
>>> -- David
>>>
>>> [email protected] wrote:
>>>> I am attached to Goldbach's Conjecture, which is running some very
>>>> short tasks (~2 minutes). I have a large number of these that have
>>>> been pre-empted at around 1:55. I believe that what is happening is
>>>> that Goldbach's is asked for work, and provides some. The tasks
>>>> immediately enter EDF. Since it is a dual-CPU system, 2 of Goldbach's
>>>> tasks are started at the same time. When one of these two finishes,
>>>> two other tasks are marked as requiring EDF and the one with only
>>>> seconds remaining is then pre-empted. More tasks for Goldbach's are
>>>> downloaded and run, with some of these also being suspended. This is
>>>> leading to a rather large collection of mostly-run tasks that will not
>>>> be gotten to for a week or so more, as they only have seconds left and
>>>> the deadline is much later. The new tasks keep the STD low enough that
>>>> those with very little time left are unlikely to complete in normal
>>>> round robin, but will have to wait until the deadline to start the
>>>> last few seconds (the safety margin was removed, so even though upload
>>>> and report times are not zero, they are being treated as such). This
>>>> is leading to many more tasks in the queue than should be there.
>>>>
>>>> There are a couple of solutions:
>>>>
>>>> 1) Treat tasks with the same deadline in lexicographical order - even
>>>> if some of them are marked as EDF and others are not.
>>>> 2) If the rr_sim indicates a potential miss, let the tasks run out
>>>> their current time slice unless a test of EDF completion also
>>>> indicates a potential deadline miss.
>>>>
>>>> Either of these would allow the tasks that are mostly done to
>>>> complete, and be uploaded and reported, which reduces the risk of
>>>> hitting a major slowdown in the UI because of too many tasks on the
>>>> client.
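Option 2 could be sketched like this (illustrative only; this is not the client's actual rr_sim or EDF code, and the function names are invented):

```python
# Preempt on a projected round-robin miss only if an EDF-order
# simulation also predicts a deadline miss; otherwise let the running
# task finish its time slice.

def edf_misses_deadline(tasks, now=0.0):
    """Simulate the tasks earliest-deadline-first on one CPU; report
    whether any task would finish after its deadline.  Each task is a
    (remaining_seconds, deadline_seconds) pair."""
    t = now
    for remaining, deadline in sorted(tasks, key=lambda task: task[1]):
        t += remaining
        if t > deadline:
            return True
    return False

def should_preempt(rr_predicts_miss, tasks):
    # Preempt only if round robin *and* EDF both predict a miss.
    return rr_predicts_miss and edf_misses_deadline(tasks)

# A task with 5 s left and a deadline a week away is not preempted even
# if round robin flags a potential miss, because EDF meets all deadlines.
tasks = [(5, 604800), (7200, 86400)]
should_preempt(True, tasks)  # False
```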
>>>>
>>>> jm7
>>>>
>>>
>>> _______________________________________________
>>> boinc_dev mailing list
>>> [email protected]
>>> http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
>>> To unsubscribe, visit the above URL and
>>> (near bottom of page) enter your email address.
>>>
>>
>>
>
>