I checked in a fix that isn't exactly either of these.
-- David

On 02-Nov-2010 7:05 AM, john.mcl...@sybase.com wrote:
> Yes, it does happen with 6.12.  I believe what is happening is that a high
> priority task is first in priority, and a multi CPU task that takes all
> CPUs on the device is second in priority.  There are several possible
> fixes:
>
> 1)  Have the function that orders the tasks by priority order all of the
> tasks by sorting by the criteria needed to order the tasks.  The function
> that picks the tasks to run would then skip over any tasks that use more
> resources than available.  This would also fix RAM over usage and other
> resource allocation issues that are not just the count.  This option also
> allows tasks to change what resources they need on the fly if we decide to
> do it (this should not be that hard to implement).
>
> 2)  Have the function that orders the tasks by priority skip over tasks
> that use more resources than are available.  Possibly slightly easier to
> implement in the short term, but probably less useful overall.
>
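A minimal sketch of option 1 above, assuming a hypothetical `Task` record and a hypothetical `pick_tasks_to_run` helper (neither is a BOINC type or function). It sorts every task by priority first, then skips any task that needs more CPUs or RAM than is still free, so a multi-CPU task in second place cannot leave CPUs idle.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical task record; the fields are illustrative, not BOINC's own types.
struct Task {
    std::string name;
    int priority;      // larger value = more urgent (assumed convention)
    double ncpus;      // CPUs the task needs
    double ram_bytes;  // estimated working set
};

// Step 1: order all tasks by priority, ignoring resources.
// Step 2: walk the ordered list and skip any task that does not fit in what
//         is still free, so an oversized task never blocks smaller ones.
std::vector<Task> pick_tasks_to_run(
    std::vector<Task> tasks, double free_cpus, double free_ram
) {
    std::stable_sort(tasks.begin(), tasks.end(),
        [](const Task& a, const Task& b) { return a.priority > b.priority; });

    std::vector<Task> chosen;
    for (const Task& t : tasks) {
        if (t.ncpus > free_cpus || t.ram_bytes > free_ram) continue;  // skip, keep looking
        free_cpus -= t.ncpus;
        free_ram -= t.ram_bytes;
        chosen.push_back(t);
    }
    return chosen;
}
```

With one high-priority single-CPU task, one 4-CPU task, and three ordinary single-CPU tasks on a 4-CPU host, this sketch runs the high-priority task plus the three ordinary tasks and defers the 4-CPU task.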
> jm7
>
>
>
> From: David Anderson <da...@ssl.berkeley.edu>
> Sent by: boinc_dev-bounces...@ssl.berkeley.edu
> To: boinc_dev@ssl.berkeley.edu
> Subject: Re: [boinc_dev] Initial scheduling checkin
> Date: 11/01/2010 06:05 PM
>
> I checked in a fix for 1).
>
> Does 2) happen with 6.12?
> If not, let's just wait for 6.12.
>
> -- David
>
> On 01-Nov-2010 7:18 AM, Richard Haselgrove wrote:
>> I agree with John - this is a major change, and will need extensive testing.
>>
>> David, may I ask what your current expectation is for the timeline for the
>> new scheduler? Specifically, are you going to attempt to incorporate it into
>> v6.12, or would it be better to get all the 'notices' angst out of the way
>> via a public release (and debug if necessary after BETA testing by the
>> public at large) first, and then we can concentrate all resources on
>> functionality?
>>
>> I'm concerned that there seem to be a couple of recently-reported issues
>> which might slip through the cracks.
>>
>> 1) In v6.12.4, the thrashing of GPU tasks into and out of GPU memory,
>> because there seems to be no 'Task Switch Interval' inhibition on the new
>> GPU scheduling by debt.
>>
>> 2) In v6.10.58, the idle CPUs apparently caused by the scheduler incorrectly
>> handling the triple mixture of High Priority CPU / Multithread / ordinary
>> priority single CPU tasks.
>> (from http://boinc.berkeley.edu/dev/forum_thread.php?id=6138)
>>
>> If the new scheduler is going to be put into v6.12 (which will inevitably
>> delay that release a bit), could those fixes (when ready) be backported into
>> v6.10, please?
>>
>> Or if the new scheduler has to wait for v6.14, perhaps we should concentrate
>> on getting v6.12 finished first?
>>
>>
>>
>>> This needs to be tested thoroughly with some very long term simulations
>>> involving several years of simulation time.  Anything that involves the
>>> concept of recent for work fetch will break resource share over the long
>>> term when used in conjunction with CPDN.
>>>
>>> jm7
>>>
>>>
>>> When replying, please edit your Subject line so it is more specific
>>> than "Re: Contents of boinc_cvs digest..."
>>>
>>>
>>> Today's Topics:
>>>
>>>     1. r22608 - in trunk/boinc: . api client lib sched
>>>        (boinc...@ssl.berkeley.edu)
>>>
>>>
>>> ----------------------------------------------------------------------
>>>
>>> Message: 1
>>> Date: Fri, 29 Oct 2010 16:41:35 -0700
>>> From: boinc...@ssl.berkeley.edu
>>> Subject: [boinc_cvs] r22608 - in trunk/boinc: . api client lib sched
>>> To: boinc_...@ssl.berkeley.edu
>>> Message-ID:<201010292341.o9tnfzuh013...@mail.ssl.berkeley.edu>
>>> Content-Type: text/plain; charset=UTF-8
>>>
>>> Author: davea
>>> Date: 2010-10-29 16:41:34 -0700 (Fri, 29 Oct 2010)
>>> New Revision: 22608
>>>
>>> Modified:
>>>     trunk/boinc/api/boinc_api.cpp
>>>     trunk/boinc/checkin_notes
>>>     trunk/boinc/client/client_types.cpp
>>>     trunk/boinc/client/cpu_sched.cpp
>>>     trunk/boinc/client/net_stats.cpp
>>>     trunk/boinc/client/work_fetch.h
>>>     trunk/boinc/lib/util.cpp
>>>     trunk/boinc/lib/util.h
>>>     trunk/boinc/sched/credit.cpp
>>>     trunk/boinc/sched/update_stats.cpp
>>> Log:
>>> - client: small initial checkin for new scheduling system.
>>>      Keep track of per-project recent estimated credit
>>>
>>>
>>> Modified: trunk/boinc/api/boinc_api.cpp
>>> ===================================================================
>>> --- trunk/boinc/api/boinc_api.cpp          2010-10-29 18:58:26 UTC (rev
>>> 22607)
>>> +++ trunk/boinc/api/boinc_api.cpp          2010-10-29 23:41:34 UTC (rev
>>> 22608)
>>> @@ -835,9 +835,9 @@
>>> #else
>>>           strcpy(abspath, path);
>>> #endif
>>> -        argv[0] = GRAPHICS_APP_FILENAME;
>>> +        argv[0] = (char*)GRAPHICS_APP_FILENAME;
>>>           if (fullscreen) {
>>> -            argv[1] = "--fullscreen";
>>> +            argv[1] = (char*)"--fullscreen";
>>>               argv[2] = 0;
>>>               argc = 2;
>>>           } else {
>>>
>>> Modified: trunk/boinc/checkin_notes
>>> ===================================================================
>>> --- trunk/boinc/checkin_notes        2010-10-29 18:58:26 UTC (rev 22607)
>>> +++ trunk/boinc/checkin_notes        2010-10-29 23:41:34 UTC (rev 22608)
>>> @@ -7660,3 +7660,20 @@
>>>                           client_msgs.cpp
>>>               clientgui/
>>>                           NoticeListCtrl.cpp
>>> +
>>> +David  29 Oct 2010
>>> +    - client: small initial checkin for new scheduling system.
>>> +        Keep track of per-project recent estimated credit
>>> +
>>> +    api/
>>> +        boinc_api.cpp
>>> +    client/
>>> +        client_types.cpp
>>> +        cpu_sched.cpp
>>> +        net_stats.cpp
>>> +        work_fetch.h
>>> +    lib/
>>> +        util.cpp,h
>>> +    sched/
>>> +        credit.cpp
>>> +        update_stats.cpp
>>>
>>> Modified: trunk/boinc/client/client_types.cpp
>>> ===================================================================
>>> --- trunk/boinc/client/client_types.cpp          2010-10-29 18:58:26 UTC
>>> (rev 22607)
>>> +++ trunk/boinc/client/client_types.cpp          2010-10-29 23:41:34 UTC
>>> (rev 22608)
>>> @@ -202,6 +202,8 @@
>>>           if (parse_bool(buf, "dont_request_more_work",
>>> dont_request_more_work)) continue;
>>>           if (parse_bool(buf, "detach_when_done", detach_when_done))
>>> continue;
>>>           if (parse_bool(buf, "ended", ended)) continue;
>>> +        if (parse_double(buf, "<rec>", pwf.rec)) continue;
>>> +        if (parse_double(buf, "<rec_time>", pwf.rec_time)) continue;
>>>           if (parse_double(buf, "<short_term_debt>",
>>> cpu_pwf.short_term_debt)) continue;
>>>           if (parse_double(buf, "<long_term_debt>",
> cpu_pwf.long_term_debt))
>>> continue;
>>>           if (parse_double(buf, "<cpu_backoff_interval>",
>>> cpu_pwf.backoff_interval)) continue;
>>> @@ -275,6 +277,8 @@
>>>           "<master_fetch_failures>%d</master_fetch_failures>\n"
>>>           "<min_rpc_time>%f</min_rpc_time>\n"
>>>           "<next_rpc_time>%f</next_rpc_time>\n"
>>> +        "<rec>%f</rec>\n"
>>> +        "<rec_time>%f</rec_time>\n"
>>>           "<short_term_debt>%f</short_term_debt>\n"
>>>           "<long_term_debt>%f</long_term_debt>\n"
>>>           "<cpu_backoff_interval>%f</cpu_backoff_interval>\n"
>>> @@ -314,6 +318,8 @@
>>>           master_fetch_failures,
>>>           min_rpc_time,
>>>           next_rpc_time,
>>> +        pwf.rec,
>>> +        pwf.rec_time,
>>>           cpu_pwf.short_term_debt,
>>>           cpu_pwf.long_term_debt, cpu_pwf.backoff_interval,
>>> cpu_pwf.backoff_time,
>>>           cuda_pwf.short_term_debt, cuda_pwf.long_term_debt,
>>>
>>> Modified: trunk/boinc/client/cpu_sched.cpp
>>> ===================================================================
>>> --- trunk/boinc/client/cpu_sched.cpp             2010-10-29 18:58:26 UTC
>>> (rev 22607)
>>> +++ trunk/boinc/client/cpu_sched.cpp             2010-10-29 23:41:34 UTC
>>> (rev 22608)
>>> @@ -514,6 +514,33 @@
>>>       debt_interval_start = now;
>>> }
>>>
>>> +#define REC_HALF_LIFE (30*86400)
>>> +
>>> +// update REC (recent estimated credit)
>>> +//
>>> +static void update_rec() {
>>> +    double f = gstate.host_info.p_fpops;
>>> +
>>> +    for (unsigned int i=0; i<gstate.projects.size(); i++) {
>>> +        PROJECT* p = gstate.projects[i];
>>> +        double x = p->cpu_pwf.secs_this_debt_interval * f;
>>> +        if (gstate.host_info.have_cuda()) {
>>> +            x += p->cuda_pwf.secs_this_debt_interval * f *
>>> cuda_work_fetch.relative_speed;
>>> +        }
>>> +        if (gstate.host_info.have_ati()) {
>>> +            x += p->ati_pwf.secs_this_debt_interval * f *
>>> ati_work_fetch.relative_speed;
>>> +        }
>>> +        update_average(
>>> +            gstate.now,
>>> +            gstate.debt_interval_start,
>>> +            x,
>>> +            REC_HALF_LIFE,
>>> +            p->pwf.rec,
>>> +            p->pwf.rec_time
>>> +        );
>>> +    }
>>> +}
>>> +
>>> // adjust project debts (short, long-term)
>>> //
>>> void CLIENT_STATE::adjust_debts() {
>>> @@ -551,6 +578,8 @@
>>>           work_fetch.accumulate_inst_sec(atp, elapsed_time);
>>>       }
>>>
>>> +    update_rec();
>>> +
>>>       cpu_work_fetch.update_long_term_debts();
>>>       cpu_work_fetch.update_short_term_debts();
>>>       if (host_info.have_cuda()) {
>>>
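The `update_rec()` hunk above feeds per-interval work into a half-life average. The following is a rough sketch of that idea only, not the body of BOINC's `update_average()`: the names `decayed_average_sketch`, `IntervalUsage`, and `estimated_work` are hypothetical, and the average is kept in work per second for simplicity. As in this checkin, the caller supplies `now` instead of the function reading the clock.

```cpp
#include <cmath>

// Hypothetical per-project usage for one accounting interval.
struct IntervalUsage {
    double cpu_secs;
    double gpu_secs;
};

// Estimated work for the interval: CPU seconds at the host's FLOPS rating,
// plus GPU seconds scaled by the GPU's speed relative to the CPU.
double estimated_work(
    const IntervalUsage& u, double cpu_flops, double gpu_relative_speed
) {
    return u.cpu_secs * cpu_flops + u.gpu_secs * cpu_flops * gpu_relative_speed;
}

// Illustrative half-life-decayed average (assumption: avg is work per second).
// The old average is weighted by 2^(-dt/half_life); the new work, spread
// over dt, gets the remaining weight.
void decayed_average_sketch(
    double now, double work, double half_life, double& avg, double& avg_time
) {
    if (avg_time > 0 && now > avg_time) {
        double dt = now - avg_time;
        double weight = std::exp(-dt * std::log(2.0) / half_life);
        avg = avg * weight + (1.0 - weight) * (work / dt);
    }
    // The first call only records the starting time.
    avg_time = now;
}
```

Passing `now` explicitly keeps every project in one pass on the same timestamp, and lets a simulator drive the average with simulated time.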
>>> Modified: trunk/boinc/client/net_stats.cpp
>>> ===================================================================
>>> --- trunk/boinc/client/net_stats.cpp             2010-10-29 18:58:26 UTC
>>> (rev 22607)
>>> +++ trunk/boinc/client/net_stats.cpp             2010-10-29 23:41:34 UTC
>>> (rev 22608)
>>> @@ -71,6 +71,7 @@
>>>       }
>>>       double start_time = gstate.now - dt;
>>>       update_average(
>>> +        gstate.now,
>>>           start_time,
>>>           nbytes,
>>>           NET_RATE_HALF_LIFE,
>>>
>>> Modified: trunk/boinc/client/work_fetch.h
>>> ===================================================================
>>> --- trunk/boinc/client/work_fetch.h        2010-10-29 18:58:26 UTC (rev
>>> 22607)
>>> +++ trunk/boinc/client/work_fetch.h        2010-10-29 23:41:34 UTC (rev
>>> 22608)
>>> @@ -237,6 +237,10 @@
>>>       bool can_fetch_work;
>>>       bool compute_can_fetch_work(PROJECT*);
>>>       bool has_runnable_jobs;
>>> +    double rec;
>>> +        // recent estimated credit
>>> +    double rec_time;
>>> +        // when it was last updated
>>>       PROJECT_WORK_FETCH() {
>>>           memset(this, 0, sizeof(*this));
>>>       }
>>>
>>> Modified: trunk/boinc/lib/util.cpp
>>> ===================================================================
>>> --- trunk/boinc/lib/util.cpp         2010-10-29 18:58:26 UTC (rev 22607)
>>> +++ trunk/boinc/lib/util.cpp         2010-10-29 23:41:34 UTC (rev 22608)
>>> @@ -234,6 +234,7 @@
>>> // html/inc/credit.inc
>>> //
>>> void update_average(
>>> +    double now,
>>>       double work_start_time,       // when new work was started
>>>                                       // (or zero if no new work)
>>>       double work,                    // amount of new work
>>> @@ -241,8 +242,6 @@
>>>       double&   avg,                    // average work per day (in and
> out)
>>>       double&   avg_time                // when average was last computed
>>> ) {
>>> -    double now = dtime();
>>> -
>>>       if (avg_time) {
>>>           // If an average R already exists, imagine that the new work
> was
>>> done
>>>           // entirely between avg_time and now.
>>>
>>> Modified: trunk/boinc/lib/util.h
>>> ===================================================================
>>> --- trunk/boinc/lib/util.h           2010-10-29 18:58:26 UTC (rev 22607)
>>> +++ trunk/boinc/lib/util.h           2010-10-29 23:41:34 UTC (rev 22608)
>>> @@ -61,7 +61,7 @@
>>> extern double linux_cpu_time(int pid);
>>> #endif
>>>
>>> -extern void update_average(double, double, double, double&, double&);
>>> +extern void update_average(double, double, double, double, double&,
>>> double&);
>>>
>>> extern int boinc_calling_thread_cpu_time(double&);
>>>
>>>
>>> Modified: trunk/boinc/sched/credit.cpp
>>> ===================================================================
>>> --- trunk/boinc/sched/credit.cpp           2010-10-29 18:58:26 UTC (rev
>>> 22607)
>>> +++ trunk/boinc/sched/credit.cpp           2010-10-29 23:41:34 UTC (rev
>>> 22608)
>>> @@ -55,10 +55,12 @@
>>>       DB_TEAM team;
>>>       int retval;
>>>       char buf[256];
>>> +    double now = dtime();
>>>
>>>       // first, process the host
>>>
>>>       update_average(
>>> +        now,
>>>           start_time, credit, CREDIT_HALF_LIFE,
>>>           host.expavg_credit, host.expavg_time
>>>       );
>>> @@ -76,6 +78,7 @@
>>>       }
>>>
>>>       update_average(
>>> +        now,
>>>           start_time, credit, CREDIT_HALF_LIFE,
>>>           user.expavg_credit, user.expavg_time
>>>       );
>>> @@ -103,6 +106,7 @@
>>>               return retval;
>>>           }
>>>           update_average(
>>> +            now,
>>>               start_time, credit, CREDIT_HALF_LIFE,
>>>               team.expavg_credit, team.expavg_time
>>>           );
>>> @@ -799,6 +803,7 @@
>>> int write_modified_app_versions(vector<DB_APP_VERSION>&   app_versions) {
>>>       unsigned int i, j;
>>>       int retval = 0;
>>> +    double now = dtime();
>>>
>>>       if (config.debug_credit&&   app_versions.size()) {
>>>           log_messages.printf(MSG_NORMAL,
>>> @@ -827,6 +832,7 @@
>>>               }
>>>               for (j=0; j<av.credit_samples.size(); j++) {
>>>                   update_average(
>>> +                    now,
>>>                       av.credit_times[j], av.credit_samples[j],
>>> CREDIT_HALF_LIFE,
>>>                       av.expavg_credit, av.expavg_time
>>>                   );
>>>
>>> Modified: trunk/boinc/sched/update_stats.cpp
>>> ===================================================================
>>> --- trunk/boinc/sched/update_stats.cpp           2010-10-29 18:58:26 UTC
>>> (rev 22607)
>>> +++ trunk/boinc/sched/update_stats.cpp           2010-10-29 23:41:34 UTC
>>> (rev 22608)
>>> @@ -54,6 +54,7 @@
>>>       DB_USER user;
>>>       int retval;
>>>       char buf[256];
>>> +    double now = dtime();
>>>
>>>       while (1) {
>>>           retval = user.enumerate("where expavg_credit>0.1");
>>> @@ -66,7 +67,9 @@
>>>           }
>>>
>>>           if (user.expavg_time>   update_time_cutoff) continue;
>>> -        update_average(0, 0, CREDIT_HALF_LIFE, user.expavg_credit,
>>> user.expavg_time);
>>> +        update_average(
>>> +            now, 0, 0, CREDIT_HALF_LIFE, user.expavg_credit,
>>> user.expavg_time
>>> +        );
>>>           sprintf( buf, "expavg_credit=%f, expavg_time=%f",
>>>               user.expavg_credit, user.expavg_time
>>>           );
>>> @@ -84,6 +87,7 @@
>>>       DB_HOST host;
>>>       int retval;
>>>       char buf[256];
>>> +    double now = dtime();
>>>
>>>       while (1) {
>>>           retval = host.enumerate("where expavg_credit>0.1");
>>> @@ -96,7 +100,9 @@
>>>           }
>>>
>>>           if (host.expavg_time>   update_time_cutoff) continue;
>>> -        update_average(0, 0, CREDIT_HALF_LIFE, host.expavg_credit,
>>> host.expavg_time);
>>> +        update_average(
>>> +            now, 0, 0, CREDIT_HALF_LIFE, host.expavg_credit,
>>> host.expavg_time
>>> +        );
>>>           sprintf(
>>>               buf,"expavg_credit=%f, expavg_time=%f",
>>>               host.expavg_credit, host.expavg_time
>>> @@ -142,6 +148,7 @@
>>>       DB_TEAM team;
>>>       int retval;
>>>       char buf[256];
>>> +    double now = dtime();
>>>
>>>       while (1) {
>>>           retval = team.enumerate("where expavg_credit>0.1");
>>> @@ -163,7 +170,10 @@
>>>               continue;
>>>           }
>>>           if (team.expavg_time<   update_time_cutoff) {
>>> -            update_average(0, 0, CREDIT_HALF_LIFE, team.expavg_credit,
>>> team.expavg_time);
>>> +            update_average(
>>> +                now, 0, 0, CREDIT_HALF_LIFE, team.expavg_credit,
>>> +                team.expavg_time
>>> +            );
>>>           }
>>>           sprintf(
>>>               buf, "expavg_credit=%f, expavg_time=%f, nusers=%d",
>>>
>>>
>>>
>>> ------------------------------
>>>
>>> _______________________________________________
>>> boinc_cvs mailing list
>>> boinc_...@ssl.berkeley.edu
>>> http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_cvs
>>>
>>>
>>> End of boinc_cvs Digest, Vol 71, Issue 48
>>> *****************************************
>>>
>>>
>>>
>>> _______________________________________________
>>> boinc_dev mailing list
>>> boinc_dev@ssl.berkeley.edu
>>> http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
>>> To unsubscribe, visit the above URL and
>>> (near bottom of page) enter your email address.
>>>
>>
>>
