Currently we use RR simulation for 2 things:
1) to identify jobs that need to run EDF (for scheduling)
2) to compute device shortfall (for work fetch)

The problem here, seems to me, is that in some cases
(e.g., if lots of jobs run EDF)
RR sim is not the right thing to use for 2),
because the schedule it simulates may differ widely from the actual schedule.

A possible solution: to compute shortfall,
we first do an RR simulation to identify EDF jobs.
Then we do an "EDF/RR simulation" that models the way
the way the real scheduler works, i.e. it gives priority to EDF jobs.

I'll think about how to do this.
It's possible that an EDF/RR simulator can be implemented by
passing a flag to the existing RR simulator,
and adding a few lines of code to do EDF jobs first.

-- David

On 30-Sep-2010 7:06 AM, [email protected] wrote:
>
> I have been having difficulties recently with under fetching work.  The
> numbers look OK, but the actual run order has been leaving one out of 2
> CPUs dry with a half day to the next connection.  I have a simplified
> example (note that a CPU is idle for 12 out or 24 hours in this scenario):
>
> 24 hours between connections.  Connect every X is 24 hours.
> 2 CPUs
>
> At the beginning of the 24 hours:
>
> Project A
> Task A1:  200 hours of run time
>
> Project B  24 hour deadlines
> Task B1:  1 hour of run time remaining.
> Task B2:  1 hour of run time remaining.
> Task B3:  1 hour of run time remaining.
> Task B4:  1 hour of run time remaining.
> Task B5:  1 hour of run time remaining.
> Task B6:  1 hour of run time remaining.
> Task B7:  1 hour of run time remaining.
> Task B8:  1 hour of run time remaining.
> Task B9:  1 hour of run time remaining.
> Task B10:  1 hour of run time remaining.
> Task B11:  1 hour of run time remaining.
> Task B12:  1 hour of run time remaining.
> Task B13:  1 hour of run time remaining.
> Task B14:  1 hour of run time remaining.
> Task B15:  1 hour of run time remaining.
> Task B16:  1 hour of run time remaining.
> Task B17:  1 hour of run time remaining.
> Task B18:  1 hour of run time remaining.
> Task B19:  1 hour of run time remaining.
> Task B20:  1 hour of run time remaining.
> Task B21:  1 hour of run time remaining.
> Task B22:  1 hour of run time remaining.
> Task B23:  1 hour of run time remaining.
> Task B24:  1 hour of run time remaining.
>
> The actual run order will be:
>
> Time 0:
> Task B1 starts in EDF.
> Task B2 starts in EDF.
>
> Time 1 hr:
> Task B1 completes.
> Task B2 completes.
> Task B3 starts in EDF.
> Task B4 starts in EDF.
>
> Time 2 hr.
> Task B3 completes.
> Task B4 completes.
> Task B5 starts in EDF.
> Task B6 starts in EDF.
>
> Time 3 hr.
> Task B5 completes.
> Task B6 completes.
> Task B7 starts in EDF.
> Task B8 starts in EDF.
>
> Time 4 hr.
> Task B7 completes.
> Task B8 completes.
> Task B9 starts in EDF.
> Task B10 starts in EDF.
>
> Time 5 hr.
> Task B9 completes.
> Task B10 completes.
> Task B11 starts in EDF.
> Task B12 starts in EDF.
>
> Time 6 hr.
> Task B11 completes.
> Task B12 completes.
> Task B13 starts in EDF.
> Task B14 starts in EDF.
>
> Time 7 hr.
> Task B13 completes.
> Task B14 completes.
> Task B15 starts in EDF.
> Task B16 starts in EDF.
>
> Time 8 hr.
> Task B15 completes.
> Task B16 completes.
> Task B17 starts in EDF.
> Task B18 starts in EDF.
>
> Time 9 hr.
> Task B17 completes.
> Task B18 completes.
> Task B19 start in EDF.
> Task B20 starts in EDF.
>
> Tima 10 hr.
> Task B19 completes.
> Task B20 completes.
> Task B21 starts in EDF.
> Task B22 starts in EDF.
>
> Time 11 hr.
> Task B21 completes.
> Task B22 completes.
> Task B23 starts in EDF.
> Task B24 starts in EDF.
>
> Time 12 hr.
> Task B23 completes.
> Task B24 completes.
> Task A1 starts.
> CPU 2 IDLE.
>
> Time 13 hr.
> Task A1 continues.
> CPU 2 IDLE.
>
> Time 14 hr.
> Task A1 continues.
> CPU 2 IDLE.
>
> Time 15 hr.
> Task A1 continues.
> CPU 2 IDLE.
>
> Time 15 hr.
> Task A1 continues.
> CPU 2 IDLE.
>
> Time 16 hr.
> Task A1 continues.
> CPU 2 IDLE.
>
> Time 17 hr.
> Task A1 continues.
> CPU 2 IDLE.
>
> Time 18 hr.
> Task A1 continues.
> CPU 2 IDLE.
>
> Time 19 hr.
> Task A1 continues.
> CPU 2 IDLE.
>
> Time 20 hr.
> Task A1 continues.
> CPU 2 IDLE.
>
> Time 21 hr.
> Task A1 continues.
> CPU 2 IDLE.
>
> Time 22 hr.
> Task A1 continues.
> CPU 2 IDLE.
>
> Time 23 hr.
> Task A1 continues.
>
> Time 24 hr.
> More work is downloaded from some project.
>
> Note that this can still happen with "extra work" set to greater than 1 day
> as "extra work" is correctly ignored for projects that are overworked.
>
> The correct method of calculating work fetch is to assume the worst
> possible ordering of work run to meet the duration that the machine will
> not be connected to the internet.  Unfortunately, packing a series of
> knapsacks is NP complete, so a heuristic must be used instead of a search
> for either the best or worst possible packing.  The worst packing that I
> can find that is easy to program is by remaining runtime with the shortest
> runtime remaining put into each bin (CPU).  So If the work fetch algorithm
> had done this calculation, it would have requested more work.  Let us see
> how that would work:
>
> CPU0      CPU1      End Time
> B1        B2        1hr
> B3        B4        2hr
> B5        B6        3hr
> B7        B8        4hr
> B9        B10       5hr
> B11       B12       6hr
> B13       B14       7hr
> B15       B16       8hr
> B17       B18       9hr
> B19       B20       10hr
> B21       B22       11hr
> B23       B24       12hr
> A1        IDLE      13hr
> A1        IDLE      14hr
> A1        IDLE      15hr
> A1        IDLE      16hr
> A1        IDLE      17hr
> A1        IDLE      18hr
> A1        IDLE      19hr
> A1        IDLE      20hr
> A1        IDLE      21hr
> A1        IDLE      22hr
> A1        IDLE      23hr
> A1        IDLE      24hr
>
> Work fetch would notice the potential for 12 hours of idle CPU time and
> request this work from some project.  Since project B is already in EDF, it
> should not be the first choice.  If A has some work available it would look
> like (A tasks come as large chunks):
>
> CPU0      CPU1      End Time
> B1        B2        1hr
> B3        B4        2hr
> B5        B6        3hr
> B7        B8        4hr
> B9        B10       5hr
> B11       B12       6hr
> B13       B14       7hr
> B15       B16       8hr
> B17       B18       9hr
> B19       B20       10hr
> B21       B22       11hr
> B23       B24       12hr
> A1        A2        13hr
> A1        A2        14hr
> A1        A2        15hr
> A1        A2        16hr
> A1        A2        17hr
> A1        A2        18hr
> A1        A2        19hr
> A1        A2        20hr
> A1        A2        21hr
> A1        A2        22hr
> A1        A2        23hr
> A1        A2        24hr
>
> Work fetch would be complete.
>
> However, if A is offline and work can only be fetched from B, we would have
> an additional 12 hours of B tasks downloaded.
>
> CPU0      CPU1      End Time
> B1        B2        1hr
> B3        B4        2hr
> B5        B6        3hr
> B7        B8        4hr
> B9        B10       5hr
> B11       B12       6hr
> B13       B14       7hr
> B15       B16       8hr
> B17       B18       9hr
> B19       B20       10hr
> B21       B22       11hr
> B23       B24       12hr
> B25       B26       13hr
> B27       B28       14hr
> B29       B30       15hr
> B31       B32       16hr
> B33       B34       17hr
> B35       B36       18hr
> A1        IDLE      19hr
> A1        IDLE      20hr
> A1        IDLE      21hr
> A1        IDLE      22hr
> A1        IDLE      23hr
> A1        IDLE      24hr
>
> This still leaves 6 hours of idle CPU time, so more work needs to be
> fetched from some where.  If it is from A, great.  If from B, it would look
> like:
>
> CPU0      CPU1      End Time
> B1        B2        1hr
> B3        B4        2hr
> B5        B6        3hr
> B7        B8        4hr
> B9        B10       5hr
> B11       B12       6hr
> B13       B14       7hr
> B15       B16       8hr
> B17       B18       9hr
> B19       B20       10hr
> B21       B22       11hr
> B23       B24       12hr
> B25       B26       13hr
> B27       B28       14hr
> B29       B30       15hr
> B31       B32       16hr
> B33       B34       17hr
> B35       B36       18hr
> B37       B38       19hr
> B39       B40       20hr
> B41       B42       21hr
> A1        IDLE      22hr
> A1        IDLE      23hr
> A1        IDLE      24hr
>
> There are still 3 idle CPU hours, so work fetch would happen again, and if
> from B would look like:
>
> CPU0      CPU1      End Time
> B1        B2        1hr
> B3        B4        2hr
> B5        B6        3hr
> B7        B8        4hr
> B9        B10       5hr
> B11       B12       6hr
> B13       B14       7hr
> B15       B16       8hr
> B17       B18       9hr
> B19       B20       10hr
> B21       B22       11hr
> B23       B24       12hr
> B25       B26       13hr
> B27       B28       14hr
> B29       B30       15hr
> B31       B32       16hr
> B33       B34       17hr
> B35       B36       18hr
> B37       B38       19hr
> B39       B40       20hr
> B41       B42       21hr
> B43       B44       22hr
> B45       A1        23hr
> IDLE      A1        24hr
>
> This leaves one idle CPU hour, so work fetch for an hour CPU time would
> occur and if from B would look like:
>
> CPU0      CPU1      End Time
> B1        B2        1hr
> B3        B4        2hr
> B5        B6        3hr
> B7        B8        4hr
> B9        B10       5hr
> B11       B12       6hr
> B13       B14       7hr
> B15       B16       8hr
> B17       B18       9hr
> B19       B20       10hr
> B21       B22       11hr
> B23       B24       12hr
> B25       B26       13hr
> B27       B28       14hr
> B29       B30       15hr
> B31       B32       16hr
> B33       B34       17hr
> B35       B36       18hr
> B37       B38       19hr
> B39       B40       20hr
> B41       B42       21hr
> B43       B44       22hr
> B45       B46       23hr
> A1        IDLE      24hr
>
> This leaves one idle CPU hour, so work fetch for an hour CPU time would
> occur and if from B would look like:
>
> CPU0      CPU1      End Time
> B1        B2        1hr
> B3        B4        2hr
> B5        B6        3hr
> B7        B8        4hr
> B9        B10       5hr
> B11       B12       6hr
> B13       B14       7hr
> B15       B16       8hr
> B17       B18       9hr
> B19       B20       10hr
> B21       B22       11hr
> B23       B24       12hr
> B25       B26       13hr
> B27       B28       14hr
> B29       B30       15hr
> B31       B32       16hr
> B33       B34       17hr
> B35       B36       18hr
> B37       B38       19hr
> B39       B40       20hr
> B41       B42       21hr
> B43       B44       22hr
> B45       B46       23hr
> B46       A1        24hr
>
> At this point, work fetch is complete, and there is no order of run that
> will idle a CPU.  Yes, if A cannot provide extra work, that project is
> going to be starved for CPU time in this scenario.  The reality is, of
> course, more complex, and starvation is only likely to occur if the
> deadlines are short compared to the disconnected duration on one project
> and the run times are extremely long on the other project and there are no
> other projects to take up the slack.  If there were a third project that
> was supplying work with deadlines of a week and run times of a few hours,
> that project would fill in the holes in the schedule and starvation would
> not occur.  If Project A were supplying work, there would also not be
> starvation.  So, while starvation is possible, it should be rare.  IDLE
> CPUs have happened to me several times because of work under fetch.
>
> jm7
>
> _______________________________________________
> boinc_dev mailing list
> [email protected]
> http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
> To unsubscribe, visit the above URL and
> (near bottom of page) enter your email address.
_______________________________________________
boinc_dev mailing list
[email protected]
http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
To unsubscribe, visit the above URL and
(near bottom of page) enter your email address.

Reply via email to