I have been having difficulties recently with under fetching work.  The
numbers look OK, but the actual run order has been leaving one out of 2
CPUs dry with a half day to the next connection.  I have a simplified
example (note that a CPU is idle for 12 out or 24 hours in this scenario):

24 hours between connections.  Connect every X is 24 hours.
2 CPUs

At the beginning of the 24 hours:

Project A
Task A1:  200 hours of run time

Project B  24 hour deadlines
Task B1:  1 hour of run time remaining.
Task B2:  1 hour of run time remaining.
Task B3:  1 hour of run time remaining.
Task B4:  1 hour of run time remaining.
Task B5:  1 hour of run time remaining.
Task B6:  1 hour of run time remaining.
Task B7:  1 hour of run time remaining.
Task B8:  1 hour of run time remaining.
Task B9:  1 hour of run time remaining.
Task B10:  1 hour of run time remaining.
Task B11:  1 hour of run time remaining.
Task B12:  1 hour of run time remaining.
Task B13:  1 hour of run time remaining.
Task B14:  1 hour of run time remaining.
Task B15:  1 hour of run time remaining.
Task B16:  1 hour of run time remaining.
Task B17:  1 hour of run time remaining.
Task B18:  1 hour of run time remaining.
Task B19:  1 hour of run time remaining.
Task B20:  1 hour of run time remaining.
Task B21:  1 hour of run time remaining.
Task B22:  1 hour of run time remaining.
Task B23:  1 hour of run time remaining.
Task B24:  1 hour of run time remaining.

The actual run order will be:

Time 0:
Task B1 starts in EDF.
Task B2 starts in EDF.

Time 1 hr:
Task B1 completes.
Task B2 completes.
Task B3 starts in EDF.
Task B4 starts in EDF.

Time 2 hr.
Task B3 completes.
Task B4 completes.
Task B5 starts in EDF.
Task B6 starts in EDF.

Time 3 hr.
Task B5 completes.
Task B6 completes.
Task B7 starts in EDF.
Task B8 starts in EDF.

Time 4 hr.
Task B7 completes.
Task B8 completes.
Task B9 starts in EDF.
Task B10 starts in EDF.

Time 5 hr.
Task B9 completes.
Task B10 completes.
Task B11 starts in EDF.
Task B12 starts in EDF.

Time 6 hr.
Task B11 completes.
Task B12 completes.
Task B13 starts in EDF.
Task B14 starts in EDF.

Time 7 hr.
Task B13 completes.
Task B14 completes.
Task B15 starts in EDF.
Task B16 starts in EDF.

Time 8 hr.
Task B15 completes.
Task B16 completes.
Task B17 starts in EDF.
Task B18 starts in EDF.

Time 9 hr.
Task B17 completes.
Task B18 completes.
Task B19 start in EDF.
Task B20 starts in EDF.

Tima 10 hr.
Task B19 completes.
Task B20 completes.
Task B21 starts in EDF.
Task B22 starts in EDF.

Time 11 hr.
Task B21 completes.
Task B22 completes.
Task B23 starts in EDF.
Task B24 starts in EDF.

Time 12 hr.
Task B23 completes.
Task B24 completes.
Task A1 starts.
CPU 2 IDLE.

Time 13 hr.
Task A1 continues.
CPU 2 IDLE.

Time 14 hr.
Task A1 continues.
CPU 2 IDLE.

Time 15 hr.
Task A1 continues.
CPU 2 IDLE.

Time 15 hr.
Task A1 continues.
CPU 2 IDLE.

Time 16 hr.
Task A1 continues.
CPU 2 IDLE.

Time 17 hr.
Task A1 continues.
CPU 2 IDLE.

Time 18 hr.
Task A1 continues.
CPU 2 IDLE.

Time 19 hr.
Task A1 continues.
CPU 2 IDLE.

Time 20 hr.
Task A1 continues.
CPU 2 IDLE.

Time 21 hr.
Task A1 continues.
CPU 2 IDLE.

Time 22 hr.
Task A1 continues.
CPU 2 IDLE.

Time 23 hr.
Task A1 continues.

Time 24 hr.
More work is downloaded from some project.

Note that this can still happen with "extra work" set to greater than 1 day
as "extra work" is correctly ignored for projects that are overworked.

The correct method of calculating work fetch is to assume the worst
possible ordering of work run to meet the duration that the machine will
not be connected to the internet.  Unfortunately, packing a series of
knapsacks is NP complete, so a heuristic must be used instead of a search
for either the best or worst possible packing.  The worst packing that I
can find that is easy to program is by remaining runtime with the shortest
runtime remaining put into each bin (CPU).  So If the work fetch algorithm
had done this calculation, it would have requested more work.  Let us see
how that would work:

CPU0      CPU1      End Time
B1        B2        1hr
B3        B4        2hr
B5        B6        3hr
B7        B8        4hr
B9        B10       5hr
B11       B12       6hr
B13       B14       7hr
B15       B16       8hr
B17       B18       9hr
B19       B20       10hr
B21       B22       11hr
B23       B24       12hr
A1        IDLE      13hr
A1        IDLE      14hr
A1        IDLE      15hr
A1        IDLE      16hr
A1        IDLE      17hr
A1        IDLE      18hr
A1        IDLE      19hr
A1        IDLE      20hr
A1        IDLE      21hr
A1        IDLE      22hr
A1        IDLE      23hr
A1        IDLE      24hr

Work fetch would notice the potential for 12 hours of idle CPU time and
request this work from some project.  Since project B is already in EDF, it
should not be the first choice.  If A has some work available it would look
like (A tasks come as large chunks):

CPU0      CPU1      End Time
B1        B2        1hr
B3        B4        2hr
B5        B6        3hr
B7        B8        4hr
B9        B10       5hr
B11       B12       6hr
B13       B14       7hr
B15       B16       8hr
B17       B18       9hr
B19       B20       10hr
B21       B22       11hr
B23       B24       12hr
A1        A2        13hr
A1        A2        14hr
A1        A2        15hr
A1        A2        16hr
A1        A2        17hr
A1        A2        18hr
A1        A2        19hr
A1        A2        20hr
A1        A2        21hr
A1        A2        22hr
A1        A2        23hr
A1        A2        24hr

Work fetch would be complete.

However, if A is offline and work can only be fetched from B, we would have
an additional 12 hours of B tasks downloaded.

CPU0      CPU1      End Time
B1        B2        1hr
B3        B4        2hr
B5        B6        3hr
B7        B8        4hr
B9        B10       5hr
B11       B12       6hr
B13       B14       7hr
B15       B16       8hr
B17       B18       9hr
B19       B20       10hr
B21       B22       11hr
B23       B24       12hr
B25       B26       13hr
B27       B28       14hr
B29       B30       15hr
B31       B32       16hr
B33       B34       17hr
B35       B36       18hr
A1        IDLE      19hr
A1        IDLE      20hr
A1        IDLE      21hr
A1        IDLE      22hr
A1        IDLE      23hr
A1        IDLE      24hr

This still leaves 6 hours of idle CPU time, so more work needs to be
fetched from some where.  If it is from A, great.  If from B, it would look
like:

CPU0      CPU1      End Time
B1        B2        1hr
B3        B4        2hr
B5        B6        3hr
B7        B8        4hr
B9        B10       5hr
B11       B12       6hr
B13       B14       7hr
B15       B16       8hr
B17       B18       9hr
B19       B20       10hr
B21       B22       11hr
B23       B24       12hr
B25       B26       13hr
B27       B28       14hr
B29       B30       15hr
B31       B32       16hr
B33       B34       17hr
B35       B36       18hr
B37       B38       19hr
B39       B40       20hr
B41       B42       21hr
A1        IDLE      22hr
A1        IDLE      23hr
A1        IDLE      24hr

There are still 3 idle CPU hours, so work fetch would happen again, and if
from B would look like:

CPU0      CPU1      End Time
B1        B2        1hr
B3        B4        2hr
B5        B6        3hr
B7        B8        4hr
B9        B10       5hr
B11       B12       6hr
B13       B14       7hr
B15       B16       8hr
B17       B18       9hr
B19       B20       10hr
B21       B22       11hr
B23       B24       12hr
B25       B26       13hr
B27       B28       14hr
B29       B30       15hr
B31       B32       16hr
B33       B34       17hr
B35       B36       18hr
B37       B38       19hr
B39       B40       20hr
B41       B42       21hr
B43       B44       22hr
B45       A1        23hr
IDLE      A1        24hr

This leaves one idle CPU hour, so work fetch for an hour CPU time would
occur and if from B would look like:

CPU0      CPU1      End Time
B1        B2        1hr
B3        B4        2hr
B5        B6        3hr
B7        B8        4hr
B9        B10       5hr
B11       B12       6hr
B13       B14       7hr
B15       B16       8hr
B17       B18       9hr
B19       B20       10hr
B21       B22       11hr
B23       B24       12hr
B25       B26       13hr
B27       B28       14hr
B29       B30       15hr
B31       B32       16hr
B33       B34       17hr
B35       B36       18hr
B37       B38       19hr
B39       B40       20hr
B41       B42       21hr
B43       B44       22hr
B45       B46       23hr
A1        IDLE      24hr

This leaves one idle CPU hour, so work fetch for an hour CPU time would
occur and if from B would look like:

CPU0      CPU1      End Time
B1        B2        1hr
B3        B4        2hr
B5        B6        3hr
B7        B8        4hr
B9        B10       5hr
B11       B12       6hr
B13       B14       7hr
B15       B16       8hr
B17       B18       9hr
B19       B20       10hr
B21       B22       11hr
B23       B24       12hr
B25       B26       13hr
B27       B28       14hr
B29       B30       15hr
B31       B32       16hr
B33       B34       17hr
B35       B36       18hr
B37       B38       19hr
B39       B40       20hr
B41       B42       21hr
B43       B44       22hr
B45       B46       23hr
B46       A1        24hr

At this point, work fetch is complete, and there is no order of run that
will idle a CPU.  Yes, if A cannot provide extra work, that project is
going to be starved for CPU time in this scenario.  The reality is, of
course, more complex, and starvation is only likely to occur if the
deadlines are short compared to the disconnected duration on one project
and the run times are extremely long on the other project and there are no
other projects to take up the slack.  If there were a third project that
was supplying work with deadlines of a week and run times of a few hours,
that project would fill in the holes in the schedule and starvation would
not occur.  If Project A were supplying work, there would also not be
starvation.  So, while starvation is possible, it should be rare.  IDLE
CPUs have happened to me several times because of work under fetch.

jm7

_______________________________________________
boinc_dev mailing list
[email protected]
http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
To unsubscribe, visit the above URL and
(near bottom of page) enter your email address.

Reply via email to