It can also happen if the Short Term Debts are such that the long job gets
run late in the disconnected period. I don't believe that Round Robin is
the correct method at all. A much simpler solution is to just sort the
tasks by estimated duration into CPU buckets. This is much simpler than
the round robin simulation.
jm7
David Anderson
<[email protected]
ey.edu> To
Sent by: [email protected]
<boinc_dev-bounce cc
[email protected]
u> Subject
Re: [boinc_dev] Worst case for run
order and work fetch.
09/30/2010 04:49
PM
Currently we use RR simulation for 2 things:
1) to identify jobs that need to run EDF (for scheduling)
2) to compute device shortfall (for work fetch)
The problem here, seems to me, is that in some cases
(e.g., if lots of jobs run EDF)
RR sim is not the right thing to use for 2),
because the schedule it simulates may differ widely from the actual
schedule.
A possible solution: to compute shortfall,
we first do an RR simulation to identify EDF jobs.
Then we do an "EDF/RR simulation" that models the way
the way the real scheduler works, i.e. it gives priority to EDF jobs.
I'll think about how to do this.
It's possible that an EDF/RR simulator can be implemented by
passing a flag to the existing RR simulator,
and adding a few lines of code to do EDF jobs first.
-- David
On 30-Sep-2010 7:06 AM, [email protected] wrote:
>
> I have been having difficulties recently with under fetching work. The
> numbers look OK, but the actual run order has been leaving one out of 2
> CPUs dry with a half day to the next connection. I have a simplified
> example (note that a CPU is idle for 12 out or 24 hours in this
scenario):
>
> 24 hours between connections. Connect every X is 24 hours.
> 2 CPUs
>
> At the beginning of the 24 hours:
>
> Project A
> Task A1: 200 hours of run time
>
> Project B 24 hour deadlines
> Task B1: 1 hour of run time remaining.
> Task B2: 1 hour of run time remaining.
> Task B3: 1 hour of run time remaining.
> Task B4: 1 hour of run time remaining.
> Task B5: 1 hour of run time remaining.
> Task B6: 1 hour of run time remaining.
> Task B7: 1 hour of run time remaining.
> Task B8: 1 hour of run time remaining.
> Task B9: 1 hour of run time remaining.
> Task B10: 1 hour of run time remaining.
> Task B11: 1 hour of run time remaining.
> Task B12: 1 hour of run time remaining.
> Task B13: 1 hour of run time remaining.
> Task B14: 1 hour of run time remaining.
> Task B15: 1 hour of run time remaining.
> Task B16: 1 hour of run time remaining.
> Task B17: 1 hour of run time remaining.
> Task B18: 1 hour of run time remaining.
> Task B19: 1 hour of run time remaining.
> Task B20: 1 hour of run time remaining.
> Task B21: 1 hour of run time remaining.
> Task B22: 1 hour of run time remaining.
> Task B23: 1 hour of run time remaining.
> Task B24: 1 hour of run time remaining.
>
> The actual run order will be:
>
> Time 0:
> Task B1 starts in EDF.
> Task B2 starts in EDF.
>
> Time 1 hr:
> Task B1 completes.
> Task B2 completes.
> Task B3 starts in EDF.
> Task B4 starts in EDF.
>
> Time 2 hr.
> Task B3 completes.
> Task B4 completes.
> Task B5 starts in EDF.
> Task B6 starts in EDF.
>
> Time 3 hr.
> Task B5 completes.
> Task B6 completes.
> Task B7 starts in EDF.
> Task B8 starts in EDF.
>
> Time 4 hr.
> Task B7 completes.
> Task B8 completes.
> Task B9 starts in EDF.
> Task B10 starts in EDF.
>
> Time 5 hr.
> Task B9 completes.
> Task B10 completes.
> Task B11 starts in EDF.
> Task B12 starts in EDF.
>
> Time 6 hr.
> Task B11 completes.
> Task B12 completes.
> Task B13 starts in EDF.
> Task B14 starts in EDF.
>
> Time 7 hr.
> Task B13 completes.
> Task B14 completes.
> Task B15 starts in EDF.
> Task B16 starts in EDF.
>
> Time 8 hr.
> Task B15 completes.
> Task B16 completes.
> Task B17 starts in EDF.
> Task B18 starts in EDF.
>
> Time 9 hr.
> Task B17 completes.
> Task B18 completes.
> Task B19 start in EDF.
> Task B20 starts in EDF.
>
> Tima 10 hr.
> Task B19 completes.
> Task B20 completes.
> Task B21 starts in EDF.
> Task B22 starts in EDF.
>
> Time 11 hr.
> Task B21 completes.
> Task B22 completes.
> Task B23 starts in EDF.
> Task B24 starts in EDF.
>
> Time 12 hr.
> Task B23 completes.
> Task B24 completes.
> Task A1 starts.
> CPU 2 IDLE.
>
> Time 13 hr.
> Task A1 continues.
> CPU 2 IDLE.
>
> Time 14 hr.
> Task A1 continues.
> CPU 2 IDLE.
>
> Time 15 hr.
> Task A1 continues.
> CPU 2 IDLE.
>
> Time 15 hr.
> Task A1 continues.
> CPU 2 IDLE.
>
> Time 16 hr.
> Task A1 continues.
> CPU 2 IDLE.
>
> Time 17 hr.
> Task A1 continues.
> CPU 2 IDLE.
>
> Time 18 hr.
> Task A1 continues.
> CPU 2 IDLE.
>
> Time 19 hr.
> Task A1 continues.
> CPU 2 IDLE.
>
> Time 20 hr.
> Task A1 continues.
> CPU 2 IDLE.
>
> Time 21 hr.
> Task A1 continues.
> CPU 2 IDLE.
>
> Time 22 hr.
> Task A1 continues.
> CPU 2 IDLE.
>
> Time 23 hr.
> Task A1 continues.
>
> Time 24 hr.
> More work is downloaded from some project.
>
> Note that this can still happen with "extra work" set to greater than 1
day
> as "extra work" is correctly ignored for projects that are overworked.
>
> The correct method of calculating work fetch is to assume the worst
> possible ordering of work run to meet the duration that the machine will
> not be connected to the internet. Unfortunately, packing a series of
> knapsacks is NP complete, so a heuristic must be used instead of a search
> for either the best or worst possible packing. The worst packing that I
> can find that is easy to program is by remaining runtime with the
shortest
> runtime remaining put into each bin (CPU). So If the work fetch
algorithm
> had done this calculation, it would have requested more work. Let us see
> how that would work:
>
> CPU0 CPU1 End Time
> B1 B2 1hr
> B3 B4 2hr
> B5 B6 3hr
> B7 B8 4hr
> B9 B10 5hr
> B11 B12 6hr
> B13 B14 7hr
> B15 B16 8hr
> B17 B18 9hr
> B19 B20 10hr
> B21 B22 11hr
> B23 B24 12hr
> A1 IDLE 13hr
> A1 IDLE 14hr
> A1 IDLE 15hr
> A1 IDLE 16hr
> A1 IDLE 17hr
> A1 IDLE 18hr
> A1 IDLE 19hr
> A1 IDLE 20hr
> A1 IDLE 21hr
> A1 IDLE 22hr
> A1 IDLE 23hr
> A1 IDLE 24hr
>
> Work fetch would notice the potential for 12 hours of idle CPU time and
> request this work from some project. Since project B is already in EDF,
it
> should not be the first choice. If A has some work available it would
look
> like (A tasks come as large chunks):
>
> CPU0 CPU1 End Time
> B1 B2 1hr
> B3 B4 2hr
> B5 B6 3hr
> B7 B8 4hr
> B9 B10 5hr
> B11 B12 6hr
> B13 B14 7hr
> B15 B16 8hr
> B17 B18 9hr
> B19 B20 10hr
> B21 B22 11hr
> B23 B24 12hr
> A1 A2 13hr
> A1 A2 14hr
> A1 A2 15hr
> A1 A2 16hr
> A1 A2 17hr
> A1 A2 18hr
> A1 A2 19hr
> A1 A2 20hr
> A1 A2 21hr
> A1 A2 22hr
> A1 A2 23hr
> A1 A2 24hr
>
> Work fetch would be complete.
>
> However, if A is offline and work can only be fetched from B, we would
have
> an additional 12 hours of B tasks downloaded.
>
> CPU0 CPU1 End Time
> B1 B2 1hr
> B3 B4 2hr
> B5 B6 3hr
> B7 B8 4hr
> B9 B10 5hr
> B11 B12 6hr
> B13 B14 7hr
> B15 B16 8hr
> B17 B18 9hr
> B19 B20 10hr
> B21 B22 11hr
> B23 B24 12hr
> B25 B26 13hr
> B27 B28 14hr
> B29 B30 15hr
> B31 B32 16hr
> B33 B34 17hr
> B35 B36 18hr
> A1 IDLE 19hr
> A1 IDLE 20hr
> A1 IDLE 21hr
> A1 IDLE 22hr
> A1 IDLE 23hr
> A1 IDLE 24hr
>
> This still leaves 6 hours of idle CPU time, so more work needs to be
> fetched from some where. If it is from A, great. If from B, it would
look
> like:
>
> CPU0 CPU1 End Time
> B1 B2 1hr
> B3 B4 2hr
> B5 B6 3hr
> B7 B8 4hr
> B9 B10 5hr
> B11 B12 6hr
> B13 B14 7hr
> B15 B16 8hr
> B17 B18 9hr
> B19 B20 10hr
> B21 B22 11hr
> B23 B24 12hr
> B25 B26 13hr
> B27 B28 14hr
> B29 B30 15hr
> B31 B32 16hr
> B33 B34 17hr
> B35 B36 18hr
> B37 B38 19hr
> B39 B40 20hr
> B41 B42 21hr
> A1 IDLE 22hr
> A1 IDLE 23hr
> A1 IDLE 24hr
>
> There are still 3 idle CPU hours, so work fetch would happen again, and
if
> from B would look like:
>
> CPU0 CPU1 End Time
> B1 B2 1hr
> B3 B4 2hr
> B5 B6 3hr
> B7 B8 4hr
> B9 B10 5hr
> B11 B12 6hr
> B13 B14 7hr
> B15 B16 8hr
> B17 B18 9hr
> B19 B20 10hr
> B21 B22 11hr
> B23 B24 12hr
> B25 B26 13hr
> B27 B28 14hr
> B29 B30 15hr
> B31 B32 16hr
> B33 B34 17hr
> B35 B36 18hr
> B37 B38 19hr
> B39 B40 20hr
> B41 B42 21hr
> B43 B44 22hr
> B45 A1 23hr
> IDLE A1 24hr
>
> This leaves one idle CPU hour, so work fetch for an hour CPU time would
> occur and if from B would look like:
>
> CPU0 CPU1 End Time
> B1 B2 1hr
> B3 B4 2hr
> B5 B6 3hr
> B7 B8 4hr
> B9 B10 5hr
> B11 B12 6hr
> B13 B14 7hr
> B15 B16 8hr
> B17 B18 9hr
> B19 B20 10hr
> B21 B22 11hr
> B23 B24 12hr
> B25 B26 13hr
> B27 B28 14hr
> B29 B30 15hr
> B31 B32 16hr
> B33 B34 17hr
> B35 B36 18hr
> B37 B38 19hr
> B39 B40 20hr
> B41 B42 21hr
> B43 B44 22hr
> B45 B46 23hr
> A1 IDLE 24hr
>
> This leaves one idle CPU hour, so work fetch for an hour CPU time would
> occur and if from B would look like:
>
> CPU0 CPU1 End Time
> B1 B2 1hr
> B3 B4 2hr
> B5 B6 3hr
> B7 B8 4hr
> B9 B10 5hr
> B11 B12 6hr
> B13 B14 7hr
> B15 B16 8hr
> B17 B18 9hr
> B19 B20 10hr
> B21 B22 11hr
> B23 B24 12hr
> B25 B26 13hr
> B27 B28 14hr
> B29 B30 15hr
> B31 B32 16hr
> B33 B34 17hr
> B35 B36 18hr
> B37 B38 19hr
> B39 B40 20hr
> B41 B42 21hr
> B43 B44 22hr
> B45 B46 23hr
> B46 A1 24hr
>
> At this point, work fetch is complete, and there is no order of run that
> will idle a CPU. Yes, if A cannot provide extra work, that project is
> going to be starved for CPU time in this scenario. The reality is, of
> course, more complex, and starvation is only likely to occur if the
> deadlines are short compared to the disconnected duration on one project
> and the run times are extremely long on the other project and there are
no
> other projects to take up the slack. If there were a third project that
> was supplying work with deadlines of a week and run times of a few hours,
> that project would fill in the holes in the schedule and starvation would
> not occur. If Project A were supplying work, there would also not be
> starvation. So, while starvation is possible, it should be rare. IDLE
> CPUs have happened to me several times because of work under fetch.
>
> jm7
>
> _______________________________________________
> boinc_dev mailing list
> [email protected]
> http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
> To unsubscribe, visit the above URL and
> (near bottom of page) enter your email address.
_______________________________________________
boinc_dev mailing list
[email protected]
http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
To unsubscribe, visit the above URL and
(near bottom of page) enter your email address.
_______________________________________________
boinc_dev mailing list
[email protected]
http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
To unsubscribe, visit the above URL and
(near bottom of page) enter your email address.