I checked in a fix that isn't exactly either of these.

-- David
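As a rough illustration of option 1) in the message quoted below (not the fix that was actually checked in): the ordering function sorts every task by priority without looking at resources, and the picking function then skips anything that does not fit in what is left. The `Task` struct, its fields, and both function names are hypothetical and are not taken from the BOINC client.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical task record for illustration only (not a BOINC type).
struct Task {
    std::string name;
    bool high_priority;  // e.g. a job in deadline trouble
    double priority;     // larger value runs first within the same class
    double ncpus;        // CPUs the task needs
    double ram_bytes;    // memory the task needs
};

// Step 1: order ALL tasks by the scheduling criteria, ignoring resources.
void order_tasks(std::vector<Task>& tasks) {
    std::stable_sort(tasks.begin(), tasks.end(),
        [](const Task& a, const Task& b) {
            if (a.high_priority != b.high_priority) return a.high_priority;
            return a.priority > b.priority;
        });
}

// Step 2: walk the ordered list and skip any task that needs more
// CPUs or RAM than is still available, instead of stopping at it.
std::vector<Task> pick_tasks(const std::vector<Task>& ordered,
                             double ncpus, double ram_bytes) {
    assert(ncpus >= 0 && ram_bytes >= 0);
    std::vector<Task> chosen;
    for (const Task& t : ordered) {
        if (t.ncpus > ncpus || t.ram_bytes > ram_bytes) continue;
        ncpus -= t.ncpus;
        ram_bytes -= t.ram_bytes;
        chosen.push_back(t);
    }
    return chosen;
}
```

Under these assumptions, on a 4-CPU host with one high priority single-CPU task, one multithread task that wants all 4 CPUs, and three ordinary single-CPU tasks, the multithread task is skipped and the single-CPU tasks fill the remaining CPUs, so none are left idle.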
On 02-Nov-2010 7:05 AM, john.mcl...@sybase.com wrote:
> Yes, it does happen with 6.12. I believe what is happening is that a high
> priority task is first in priority, and a multi CPU task that takes all
> CPUs on the device is second in priority. There are several possible
> fixes:
>
> 1) Have the function that orders the tasks by priority order all of the
> tasks by sorting by the criteria needed to order the tasks. The function
> that picks the tasks to run would then skip over any tasks that use more
> resources than available. This would also fix RAM over usage and other
> resource allocation issues that are not just the count. This option also
> allows tasks to change what resources they need on the fly if we decide to
> do it (this should not be that hard to implement).
>
> 2) Have the function that orders the tasks by priority skip over tasks
> that use more resources than are available. Possibly slightly easier to
> implement in the short term, but probably less useful overall.
>
> jm7
>
>
> David Anderson <da...@ssl.berkeley.edu>
> Sent by: <boinc_dev-bounces...@ssl.berkeley.edu>
> 11/01/2010 06:05 PM
> To: boinc_dev@ssl.berkeley.edu
> cc:
> Subject: Re: [boinc_dev] Initial scheduling checkin
>
>
> I checked in a fix for 1).
>
> Does 2) happen with 6.12?
> If not, let's just wait for 6.12.
>
> -- David
>
> On 01-Nov-2010 7:18 AM, Richard Haselgrove wrote:
>> I agree with John - this is a major change, and will need extensive
>> testing.
>>
>> David, may I ask what your current expectation is for the timeline for the
>> new scheduler? Specifically, are you going to attempt to incorporate it into
>> v6.12, or would it be better to get all the 'notices' angst out of the way
>> via a public release (and debug if necessary after BETA testing by the
>> public at large) first, and then we can concentrate all resources on
>> functionality?
>>
>> I'm concerned that there seem to be a couple of recently-reported issues
>> which might slip through the cracks.
>>
>> 1) In v6.12.4, the thrashing of GPU tasks into and out of GPU memory,
>> because there seems to be no 'Task Switch Interval' inhibition on the new
>> GPU scheduling by debt.
>>
>> 2) In v6.10.58, the idle CPUs apparently caused by the scheduler incorrectly
>> handling the triple mixture of High Priority CPU / Multithread / ordinary
>> priority single CPU tasks.
>> (from http://boinc.berkeley.edu/dev/forum_thread.php?id=6138)
>>
>> If the new scheduler is going to be put into v6.12 (which will inevitably
>> delay that release a bit), could those fixes (when ready) be backported into
>> v6.10, please?
>>
>> Or if the new scheduler has to wait for v6.14, perhaps we should concentrate
>> on getting v6.12 finished first?
>>
>>
>>
>>> This needs to be tested thoroughly with some very long term simulations
>>> involving several years of simulation time. Anything that involves the
>>> concept of recent for work fetch will break resource share over the long
>>> term when used in conjunction with CPDN.
>>>
>>> jm7
>>>
>>>
>>> When replying, please edit your Subject line so it is more specific
>>> than "Re: Contents of boinc_cvs digest..."
>>>
>>>
>>> Today's Topics:
>>>
>>>    1. r22608 - in trunk/boinc: . api client lib sched
>>>       (boinc...@ssl.berkeley.edu)
>>>
>>>
>>> ----------------------------------------------------------------------
>>>
>>> Message: 1
>>> Date: Fri, 29 Oct 2010 16:41:35 -0700
>>> From: boinc...@ssl.berkeley.edu
>>> Subject: [boinc_cvs] r22608 - in trunk/boinc: . api client lib sched
>>> To: boinc_...@ssl.berkeley.edu
>>> Message-ID:<201010292341.o9tnfzuh013...@mail.ssl.berkeley.edu>
>>> Content-Type: text/plain; charset=UTF-8
>>>
>>> Author: davea
>>> Date: 2010-10-29 16:41:34 -0700 (Fri, 29 Oct 2010)
>>> New Revision: 22608
>>>
>>> Modified:
>>>    trunk/boinc/api/boinc_api.cpp
>>>    trunk/boinc/checkin_notes
>>>    trunk/boinc/client/client_types.cpp
>>>    trunk/boinc/client/cpu_sched.cpp
>>>    trunk/boinc/client/net_stats.cpp
>>>    trunk/boinc/client/work_fetch.h
>>>    trunk/boinc/lib/util.cpp
>>>    trunk/boinc/lib/util.h
>>>    trunk/boinc/sched/credit.cpp
>>>    trunk/boinc/sched/update_stats.cpp
>>> Log:
>>> - client: small initial checkin for new scheduling system.
>>>     Keep track of per-project recent estimated credit
>>>
>>>
>>> Modified: trunk/boinc/api/boinc_api.cpp
>>> ===================================================================
>>> --- trunk/boinc/api/boinc_api.cpp 2010-10-29 18:58:26 UTC (rev 22607)
>>> +++ trunk/boinc/api/boinc_api.cpp 2010-10-29 23:41:34 UTC (rev 22608)
>>> @@ -835,9 +835,9 @@
>>>  #else
>>>      strcpy(abspath, path);
>>>  #endif
>>> -    argv[0] = GRAPHICS_APP_FILENAME;
>>> +    argv[0] = (char*)GRAPHICS_APP_FILENAME;
>>>      if (fullscreen) {
>>> -        argv[1] = "--fullscreen";
>>> +        argv[1] = (char*)"--fullscreen";
>>>          argv[2] = 0;
>>>          argc = 2;
>>>      } else {
>>>
>>> Modified: trunk/boinc/checkin_notes
>>> ===================================================================
>>> --- trunk/boinc/checkin_notes 2010-10-29 18:58:26 UTC (rev 22607)
>>> +++ trunk/boinc/checkin_notes 2010-10-29 23:41:34 UTC (rev 22608)
>>> @@ -7660,3 +7660,20 @@
>>>          client_msgs.cpp
>>>      clientgui/
>>>          NoticeListCtrl.cpp
>>> +
>>> +David 29 Oct 2010
>>> +    - client: small initial checkin for new scheduling system.
>>> +        Keep track of per-project recent estimated credit
>>> +
>>> +    api/
>>> +        boinc_api.cpp
>>> +    client/
>>> +        client_types.cpp
>>> +        cpu_sched.cpp
>>> +        net_stats.cpp
>>> +        work_fetch.h
>>> +    lib/
>>> +        util.cpp,h
>>> +    sched/
>>> +        credit.cpp
>>> +        update_stats.cpp
>>>
>>> Modified: trunk/boinc/client/client_types.cpp
>>> ===================================================================
>>> --- trunk/boinc/client/client_types.cpp 2010-10-29 18:58:26 UTC (rev 22607)
>>> +++ trunk/boinc/client/client_types.cpp 2010-10-29 23:41:34 UTC (rev 22608)
>>> @@ -202,6 +202,8 @@
>>>          if (parse_bool(buf, "dont_request_more_work", dont_request_more_work)) continue;
>>>          if (parse_bool(buf, "detach_when_done", detach_when_done)) continue;
>>>          if (parse_bool(buf, "ended", ended)) continue;
>>> +        if (parse_double(buf, "<rec>", pwf.rec)) continue;
>>> +        if (parse_double(buf, "<rec_time>", pwf.rec_time)) continue;
>>>          if (parse_double(buf, "<short_term_debt>", cpu_pwf.short_term_debt)) continue;
>>>          if (parse_double(buf, "<long_term_debt>", cpu_pwf.long_term_debt)) continue;
>>>          if (parse_double(buf, "<cpu_backoff_interval>", cpu_pwf.backoff_interval)) continue;
>>> @@ -275,6 +277,8 @@
>>>          "<master_fetch_failures>%d</master_fetch_failures>\n"
>>>          "<min_rpc_time>%f</min_rpc_time>\n"
>>>          "<next_rpc_time>%f</next_rpc_time>\n"
>>> +        "<rec>%f</rec>\n"
>>> +        "<rec_time>%f</rec_time>\n"
>>>          "<short_term_debt>%f</short_term_debt>\n"
>>>          "<long_term_debt>%f</long_term_debt>\n"
>>>          "<cpu_backoff_interval>%f</cpu_backoff_interval>\n"
>>> @@ -314,6 +318,8 @@
>>>          master_fetch_failures,
>>>          min_rpc_time,
>>>          next_rpc_time,
>>> +        pwf.rec,
>>> +        pwf.rec_time,
>>>          cpu_pwf.short_term_debt,
>>>          cpu_pwf.long_term_debt, cpu_pwf.backoff_interval,
>>>          cpu_pwf.backoff_time,
>>>          cuda_pwf.short_term_debt, cuda_pwf.long_term_debt,
>>>
>>> Modified: trunk/boinc/client/cpu_sched.cpp
>>> ===================================================================
>>> --- trunk/boinc/client/cpu_sched.cpp 2010-10-29 18:58:26 UTC (rev 22607)
>>> +++ trunk/boinc/client/cpu_sched.cpp 2010-10-29 23:41:34 UTC (rev 22608)
>>> @@ -514,6 +514,33 @@
>>>      debt_interval_start = now;
>>>  }
>>>
>>> +#define REC_HALF_LIFE (30*86400)
>>> +
>>> +// update REC (recent estimated credit)
>>> +//
>>> +static void update_rec() {
>>> +    double f = gstate.host_info.p_fpops;
>>> +
>>> +    for (unsigned int i=0; i<gstate.projects.size(); i++) {
>>> +        PROJECT* p = gstate.projects[i];
>>> +        double x = p->cpu_pwf.secs_this_debt_interval * f;
>>> +        if (gstate.host_info.have_cuda()) {
>>> +            x += p->cuda_pwf.secs_this_debt_interval * f * cuda_work_fetch.relative_speed;
>>> +        }
>>> +        if (gstate.host_info.have_ati()) {
>>> +            x += p->ati_pwf.secs_this_debt_interval * f * ati_work_fetch.relative_speed;
>>> +        }
>>> +        update_average(
>>> +            gstate.now,
>>> +            gstate.debt_interval_start,
>>> +            x,
>>> +            REC_HALF_LIFE,
>>> +            p->pwf.rec,
>>> +            p->pwf.rec_time
>>> +        );
>>> +    }
>>> +}
>>> +
>>>  // adjust project debts (short, long-term)
>>>  //
>>>  void CLIENT_STATE::adjust_debts() {
>>> @@ -551,6 +578,8 @@
>>>          work_fetch.accumulate_inst_sec(atp, elapsed_time);
>>>      }
>>>
>>> +    update_rec();
>>> +
>>>      cpu_work_fetch.update_long_term_debts();
>>>      cpu_work_fetch.update_short_term_debts();
>>>      if (host_info.have_cuda()) {
>>>
>>> Modified: trunk/boinc/client/net_stats.cpp
>>> ===================================================================
>>> --- trunk/boinc/client/net_stats.cpp 2010-10-29 18:58:26 UTC (rev 22607)
>>> +++ trunk/boinc/client/net_stats.cpp 2010-10-29 23:41:34 UTC (rev 22608)
>>> @@ -71,6 +71,7 @@
>>>      }
>>>      double start_time = gstate.now - dt;
>>>      update_average(
>>> +        gstate.now,
>>>          start_time,
>>>          nbytes,
>>>          NET_RATE_HALF_LIFE,
>>>
>>> Modified: trunk/boinc/client/work_fetch.h
>>> ===================================================================
>>> --- trunk/boinc/client/work_fetch.h 2010-10-29 18:58:26 UTC (rev 22607)
>>> +++ trunk/boinc/client/work_fetch.h 2010-10-29 23:41:34 UTC (rev 22608)
>>> @@ -237,6 +237,10 @@
>>>      bool can_fetch_work;
>>>      bool compute_can_fetch_work(PROJECT*);
>>>      bool has_runnable_jobs;
>>> +    double rec;
>>> +        // recent estimated credit
>>> +    double rec_time;
>>> +        // when it was last updated
>>>      PROJECT_WORK_FETCH() {
>>>          memset(this, 0, sizeof(*this));
>>>      }
>>>
>>> Modified: trunk/boinc/lib/util.cpp
>>> ===================================================================
>>> --- trunk/boinc/lib/util.cpp 2010-10-29 18:58:26 UTC (rev 22607)
>>> +++ trunk/boinc/lib/util.cpp 2010-10-29 23:41:34 UTC (rev 22608)
>>> @@ -234,6 +234,7 @@
>>>  // html/inc/credit.inc
>>>  //
>>>  void update_average(
>>> +    double now,
>>>      double work_start_time,       // when new work was started
>>>                                    // (or zero if no new work)
>>>      double work,                  // amount of new work
>>> @@ -241,8 +242,6 @@
>>>      double& avg,                  // average work per day (in and out)
>>>      double& avg_time              // when average was last computed
>>>  ) {
>>> -    double now = dtime();
>>> -
>>>      if (avg_time) {
>>>          // If an average R already exists, imagine that the new work was done
>>>          // entirely between avg_time and now.
>>>
>>> Modified: trunk/boinc/lib/util.h
>>> ===================================================================
>>> --- trunk/boinc/lib/util.h 2010-10-29 18:58:26 UTC (rev 22607)
>>> +++ trunk/boinc/lib/util.h 2010-10-29 23:41:34 UTC (rev 22608)
>>> @@ -61,7 +61,7 @@
>>>  extern double linux_cpu_time(int pid);
>>>  #endif
>>>
>>> -extern void update_average(double, double, double, double&, double&);
>>> +extern void update_average(double, double, double, double, double&, double&);
>>>
>>>  extern int boinc_calling_thread_cpu_time(double&);
>>>
>>>
>>> Modified: trunk/boinc/sched/credit.cpp
>>> ===================================================================
>>> --- trunk/boinc/sched/credit.cpp 2010-10-29 18:58:26 UTC (rev 22607)
>>> +++ trunk/boinc/sched/credit.cpp 2010-10-29 23:41:34 UTC (rev 22608)
>>> @@ -55,10 +55,12 @@
>>>      DB_TEAM team;
>>>      int retval;
>>>      char buf[256];
>>> +    double now = dtime();
>>>
>>>      // first, process the host
>>>
>>>      update_average(
>>> +        now,
>>>          start_time, credit, CREDIT_HALF_LIFE,
>>>          host.expavg_credit, host.expavg_time
>>>      );
>>> @@ -76,6 +78,7 @@
>>>      }
>>>
>>>      update_average(
>>> +        now,
>>>          start_time, credit, CREDIT_HALF_LIFE,
>>>          user.expavg_credit, user.expavg_time
>>>      );
>>> @@ -103,6 +106,7 @@
>>>              return retval;
>>>          }
>>>          update_average(
>>> +            now,
>>>              start_time, credit, CREDIT_HALF_LIFE,
>>>              team.expavg_credit, team.expavg_time
>>>          );
>>> @@ -799,6 +803,7 @@
>>>  int write_modified_app_versions(vector<DB_APP_VERSION>& app_versions) {
>>>      unsigned int i, j;
>>>      int retval = 0;
>>> +    double now = dtime();
>>>
>>>      if (config.debug_credit && app_versions.size()) {
>>>          log_messages.printf(MSG_NORMAL,
>>> @@ -827,6 +832,7 @@
>>>          }
>>>          for (j=0; j<av.credit_samples.size(); j++) {
>>>              update_average(
>>> +                now,
>>>                  av.credit_times[j], av.credit_samples[j], CREDIT_HALF_LIFE,
>>>                  av.expavg_credit, av.expavg_time
>>>              );
>>>
>>> Modified: trunk/boinc/sched/update_stats.cpp
>>> ===================================================================
>>> --- trunk/boinc/sched/update_stats.cpp 2010-10-29 18:58:26 UTC (rev 22607)
>>> +++ trunk/boinc/sched/update_stats.cpp 2010-10-29 23:41:34 UTC (rev 22608)
>>> @@ -54,6 +54,7 @@
>>>      DB_USER user;
>>>      int retval;
>>>      char buf[256];
>>> +    double now = dtime();
>>>
>>>      while (1) {
>>>          retval = user.enumerate("where expavg_credit>0.1");
>>> @@ -66,7 +67,9 @@
>>>          }
>>>
>>>          if (user.expavg_time > update_time_cutoff) continue;
>>> -        update_average(0, 0, CREDIT_HALF_LIFE, user.expavg_credit, user.expavg_time);
>>> +        update_average(
>>> +            now, 0, 0, CREDIT_HALF_LIFE, user.expavg_credit, user.expavg_time
>>> +        );
>>>          sprintf( buf, "expavg_credit=%f, expavg_time=%f",
>>>              user.expavg_credit, user.expavg_time
>>>          );
>>> @@ -84,6 +87,7 @@
>>>      DB_HOST host;
>>>      int retval;
>>>      char buf[256];
>>> +    double now = dtime();
>>>
>>>      while (1) {
>>>          retval = host.enumerate("where expavg_credit>0.1");
>>> @@ -96,7 +100,9 @@
>>>          }
>>>
>>>          if (host.expavg_time > update_time_cutoff) continue;
>>> -        update_average(0, 0, CREDIT_HALF_LIFE, host.expavg_credit, host.expavg_time);
>>> +        update_average(
>>> +            now, 0, 0, CREDIT_HALF_LIFE, host.expavg_credit, host.expavg_time
>>> +        );
>>>          sprintf(
>>>              buf,"expavg_credit=%f, expavg_time=%f",
>>>              host.expavg_credit, host.expavg_time
>>> @@ -142,6 +148,7 @@
>>>      DB_TEAM team;
>>>      int retval;
>>>      char buf[256];
>>> +    double now = dtime();
>>>
>>>      while (1) {
>>>          retval = team.enumerate("where expavg_credit>0.1");
>>> @@ -163,7 +170,10 @@
>>>              continue;
>>>          }
>>>          if (team.expavg_time < update_time_cutoff) {
>>> -            update_average(0, 0, CREDIT_HALF_LIFE, team.expavg_credit, team.expavg_time);
>>> +            update_average(
>>> +                now, 0, 0, CREDIT_HALF_LIFE, team.expavg_credit,
>>> +                team.expavg_time
>>> +            );
>>>          }
>>>          sprintf(
>>>              buf, "expavg_credit=%f, expavg_time=%f, nusers=%d",
>>>
>>>
>>>
>>> ------------------------------
>>>
>>> _______________________________________________
>>> boinc_cvs mailing list
>>> boinc_...@ssl.berkeley.edu
>>> http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_cvs
>>>
>>>
>>> End of boinc_cvs Digest, Vol 71, Issue 48
>>> *****************************************
>>>
>>> _______________________________________________
>>> boinc_dev mailing list
>>> boinc_dev@ssl.berkeley.edu
>>> http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
>>> To unsubscribe, visit the above URL and
>>> (near bottom of page) enter your email address.
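
For readers following the r22608 diff quoted above: it changes `update_average()` to take `now` as a parameter instead of calling `dtime()` internally, and uses it to decay the per-project REC with a 30-day half-life. The sketch below shows the general idea of a half-life weighted average with the clock passed in by the caller. It is a simplified stand-in, not the BOINC implementation; `decay_average` and its constants are hypothetical names, and the handling of the first sample is an assumption.

```cpp
#include <cassert>
#include <cmath>

const double SECONDS_PER_DAY = 86400.0;
const double LN2 = 0.6931471805599453;

// Hypothetical helper: exponentially decayed average of work per day.
// 'now' is supplied by the caller, so the same code can be driven by
// wall-clock time or by simulated time.
void decay_average(
    double now,         // current (possibly simulated) time, seconds
    double work,        // work done since avg_time
    double half_life,   // seconds for old work to lose half its weight
    double& avg,        // average work per day (in and out)
    double& avg_time    // when avg was last updated (0 = never)
) {
    assert(half_life > 0);
    if (avg_time > 0) {
        double dt = now - avg_time;
        if (dt < 0) dt = 0;
        // Old average keeps weight 2^(-dt/half_life).
        double weight = std::exp(-dt * LN2 / half_life);
        avg *= weight;
        // New work is treated as spread evenly over [avg_time, now].
        if (dt > 0) {
            avg += (1 - weight) * work / (dt / SECONDS_PER_DAY);
        }
    }
    // Assumption: with no previous sample we only record the time.
    avg_time = now;
}
```

With no new work, an average of 100 drops to 50 after exactly one half-life, which is easy to check because the caller controls `now`.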