I have been having difficulties recently with under fetching work. The numbers look OK, but the actual run order has been leaving one out of 2 CPUs dry with a half day to the next connection. I have a simplified example (note that a CPU is idle for 12 out or 24 hours in this scenario):
24 hours between connections. Connect every X is 24 hours. 2 CPUs At the beginning of the 24 hours: Project A Task A1: 200 hours of run time Project B 24 hour deadlines Task B1: 1 hour of run time remaining. Task B2: 1 hour of run time remaining. Task B3: 1 hour of run time remaining. Task B4: 1 hour of run time remaining. Task B5: 1 hour of run time remaining. Task B6: 1 hour of run time remaining. Task B7: 1 hour of run time remaining. Task B8: 1 hour of run time remaining. Task B9: 1 hour of run time remaining. Task B10: 1 hour of run time remaining. Task B11: 1 hour of run time remaining. Task B12: 1 hour of run time remaining. Task B13: 1 hour of run time remaining. Task B14: 1 hour of run time remaining. Task B15: 1 hour of run time remaining. Task B16: 1 hour of run time remaining. Task B17: 1 hour of run time remaining. Task B18: 1 hour of run time remaining. Task B19: 1 hour of run time remaining. Task B20: 1 hour of run time remaining. Task B21: 1 hour of run time remaining. Task B22: 1 hour of run time remaining. Task B23: 1 hour of run time remaining. Task B24: 1 hour of run time remaining. The actual run order will be: Time 0: Task B1 starts in EDF. Task B2 starts in EDF. Time 1 hr: Task B1 completes. Task B2 completes. Task B3 starts in EDF. Task B4 starts in EDF. Time 2 hr. Task B3 completes. Task B4 completes. Task B5 starts in EDF. Task B6 starts in EDF. Time 3 hr. Task B5 completes. Task B6 completes. Task B7 starts in EDF. Task B8 starts in EDF. Time 4 hr. Task B7 completes. Task B8 completes. Task B9 starts in EDF. Task B10 starts in EDF. Time 5 hr. Task B9 completes. Task B10 completes. Task B11 starts in EDF. Task B12 starts in EDF. Time 6 hr. Task B11 completes. Task B12 completes. Task B13 starts in EDF. Task B14 starts in EDF. Time 7 hr. Task B13 completes. Task B14 completes. Task B15 starts in EDF. Task B16 starts in EDF. Time 8 hr. Task B15 completes. Task B16 completes. Task B17 starts in EDF. Task B18 starts in EDF. Time 9 hr. Task B17 completes. Task B18 completes. Task B19 start in EDF. Task B20 starts in EDF. Tima 10 hr. Task B19 completes. Task B20 completes. Task B21 starts in EDF. Task B22 starts in EDF. Time 11 hr. Task B21 completes. Task B22 completes. Task B23 starts in EDF. Task B24 starts in EDF. Time 12 hr. Task B23 completes. Task B24 completes. Task A1 starts. CPU 2 IDLE. Time 13 hr. Task A1 continues. CPU 2 IDLE. Time 14 hr. Task A1 continues. CPU 2 IDLE. Time 15 hr. Task A1 continues. CPU 2 IDLE. Time 15 hr. Task A1 continues. CPU 2 IDLE. Time 16 hr. Task A1 continues. CPU 2 IDLE. Time 17 hr. Task A1 continues. CPU 2 IDLE. Time 18 hr. Task A1 continues. CPU 2 IDLE. Time 19 hr. Task A1 continues. CPU 2 IDLE. Time 20 hr. Task A1 continues. CPU 2 IDLE. Time 21 hr. Task A1 continues. CPU 2 IDLE. Time 22 hr. Task A1 continues. CPU 2 IDLE. Time 23 hr. Task A1 continues. Time 24 hr. More work is downloaded from some project. Note that this can still happen with "extra work" set to greater than 1 day as "extra work" is correctly ignored for projects that are overworked. The correct method of calculating work fetch is to assume the worst possible ordering of work run to meet the duration that the machine will not be connected to the internet. Unfortunately, packing a series of knapsacks is NP complete, so a heuristic must be used instead of a search for either the best or worst possible packing. The worst packing that I can find that is easy to program is by remaining runtime with the shortest runtime remaining put into each bin (CPU). So If the work fetch algorithm had done this calculation, it would have requested more work. Let us see how that would work: CPU0 CPU1 End Time B1 B2 1hr B3 B4 2hr B5 B6 3hr B7 B8 4hr B9 B10 5hr B11 B12 6hr B13 B14 7hr B15 B16 8hr B17 B18 9hr B19 B20 10hr B21 B22 11hr B23 B24 12hr A1 IDLE 13hr A1 IDLE 14hr A1 IDLE 15hr A1 IDLE 16hr A1 IDLE 17hr A1 IDLE 18hr A1 IDLE 19hr A1 IDLE 20hr A1 IDLE 21hr A1 IDLE 22hr A1 IDLE 23hr A1 IDLE 24hr Work fetch would notice the potential for 12 hours of idle CPU time and request this work from some project. Since project B is already in EDF, it should not be the first choice. If A has some work available it would look like (A tasks come as large chunks): CPU0 CPU1 End Time B1 B2 1hr B3 B4 2hr B5 B6 3hr B7 B8 4hr B9 B10 5hr B11 B12 6hr B13 B14 7hr B15 B16 8hr B17 B18 9hr B19 B20 10hr B21 B22 11hr B23 B24 12hr A1 A2 13hr A1 A2 14hr A1 A2 15hr A1 A2 16hr A1 A2 17hr A1 A2 18hr A1 A2 19hr A1 A2 20hr A1 A2 21hr A1 A2 22hr A1 A2 23hr A1 A2 24hr Work fetch would be complete. However, if A is offline and work can only be fetched from B, we would have an additional 12 hours of B tasks downloaded. CPU0 CPU1 End Time B1 B2 1hr B3 B4 2hr B5 B6 3hr B7 B8 4hr B9 B10 5hr B11 B12 6hr B13 B14 7hr B15 B16 8hr B17 B18 9hr B19 B20 10hr B21 B22 11hr B23 B24 12hr B25 B26 13hr B27 B28 14hr B29 B30 15hr B31 B32 16hr B33 B34 17hr B35 B36 18hr A1 IDLE 19hr A1 IDLE 20hr A1 IDLE 21hr A1 IDLE 22hr A1 IDLE 23hr A1 IDLE 24hr This still leaves 6 hours of idle CPU time, so more work needs to be fetched from some where. If it is from A, great. If from B, it would look like: CPU0 CPU1 End Time B1 B2 1hr B3 B4 2hr B5 B6 3hr B7 B8 4hr B9 B10 5hr B11 B12 6hr B13 B14 7hr B15 B16 8hr B17 B18 9hr B19 B20 10hr B21 B22 11hr B23 B24 12hr B25 B26 13hr B27 B28 14hr B29 B30 15hr B31 B32 16hr B33 B34 17hr B35 B36 18hr B37 B38 19hr B39 B40 20hr B41 B42 21hr A1 IDLE 22hr A1 IDLE 23hr A1 IDLE 24hr There are still 3 idle CPU hours, so work fetch would happen again, and if from B would look like: CPU0 CPU1 End Time B1 B2 1hr B3 B4 2hr B5 B6 3hr B7 B8 4hr B9 B10 5hr B11 B12 6hr B13 B14 7hr B15 B16 8hr B17 B18 9hr B19 B20 10hr B21 B22 11hr B23 B24 12hr B25 B26 13hr B27 B28 14hr B29 B30 15hr B31 B32 16hr B33 B34 17hr B35 B36 18hr B37 B38 19hr B39 B40 20hr B41 B42 21hr B43 B44 22hr B45 A1 23hr IDLE A1 24hr This leaves one idle CPU hour, so work fetch for an hour CPU time would occur and if from B would look like: CPU0 CPU1 End Time B1 B2 1hr B3 B4 2hr B5 B6 3hr B7 B8 4hr B9 B10 5hr B11 B12 6hr B13 B14 7hr B15 B16 8hr B17 B18 9hr B19 B20 10hr B21 B22 11hr B23 B24 12hr B25 B26 13hr B27 B28 14hr B29 B30 15hr B31 B32 16hr B33 B34 17hr B35 B36 18hr B37 B38 19hr B39 B40 20hr B41 B42 21hr B43 B44 22hr B45 B46 23hr A1 IDLE 24hr This leaves one idle CPU hour, so work fetch for an hour CPU time would occur and if from B would look like: CPU0 CPU1 End Time B1 B2 1hr B3 B4 2hr B5 B6 3hr B7 B8 4hr B9 B10 5hr B11 B12 6hr B13 B14 7hr B15 B16 8hr B17 B18 9hr B19 B20 10hr B21 B22 11hr B23 B24 12hr B25 B26 13hr B27 B28 14hr B29 B30 15hr B31 B32 16hr B33 B34 17hr B35 B36 18hr B37 B38 19hr B39 B40 20hr B41 B42 21hr B43 B44 22hr B45 B46 23hr B46 A1 24hr At this point, work fetch is complete, and there is no order of run that will idle a CPU. Yes, if A cannot provide extra work, that project is going to be starved for CPU time in this scenario. The reality is, of course, more complex, and starvation is only likely to occur if the deadlines are short compared to the disconnected duration on one project and the run times are extremely long on the other project and there are no other projects to take up the slack. If there were a third project that was supplying work with deadlines of a week and run times of a few hours, that project would fill in the holes in the schedule and starvation would not occur. If Project A were supplying work, there would also not be starvation. So, while starvation is possible, it should be rare. IDLE CPUs have happened to me several times because of work under fetch. jm7 _______________________________________________ boinc_dev mailing list [email protected] http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev To unsubscribe, visit the above URL and (near bottom of page) enter your email address.
