The current policy is:
If a multicore app is running, don't overcommit the CPUs
(i.e. don't schedule 4.1 threads on 4 cores).

This is because multicore apps may run inefficiently
if the CPUs are even slightly overcommitted
(at least, that was the case with AQUA).

We can reconsider this if there's evidence that the
above assumption doesn't hold in general.
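In client-code terms, the check amounts to something like the sketch below. This is a minimal illustration of the policy, not the actual BOINC client source; `can_schedule` and its parameters are hypothetical names. It reproduces the two log messages' behavior: with a multithread job present, scheduling 1.0 more CPU on top of 3.05 used is refused (would be 4.05 > 4); without one, jobs are scheduled until usage reaches or passes the CPU count (hence "all CPUs used (4.05>= 4)").

```cpp
#include <cassert>

// Hypothetical sketch of the overcommit policy described above.
// can_schedule() and its parameters are illustrative names only.
//
// Policy:
// - If a multithread (multicore) job is in the schedule, don't
//   overcommit at all: refuse any job that would push total CPU
//   usage past ncpus.
// - Otherwise, keep scheduling jobs as long as usage is still
//   below ncpus; slight overcommit (e.g. ending up at 4.05 on
//   4 cores because of a GPU job's small CPU fraction) is allowed.
bool can_schedule(double cpus_used, double job_usage,
                  int ncpus, bool have_mt_job) {
    if (have_mt_job) {
        // "avoiding overcommit with multithread job, skipping ..."
        return cpus_used + job_usage <= ncpus;
    }
    // "all CPUs used (x >= ncpus), skipping ..."
    return cpus_used < ncpus;
}
```

Under this reading, the 2-CPU CernVM job counts as a multithread job, which is why the 1-CPU malaria tasks are skipped at 3.05/4 in the first log but run up to 4.05/4 once CernVM is stopped.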

-- David

On 04-Jan-2012 4:43 PM, Jorden van der Elst wrote:
> Hi all,
>
> I have this weird thing since just a bit ago on my system.
> System is an i3-530 (2 CPU + 2 HT) running BOINC 7.0.7 (for VBoxtest project).
> BOINC is allowed to use all 4 cores.
>
> When the CERNVM/VBoxwrapper Test Project work is running, I notice
> that I have one Albert task running taking one CPU core, the CernVM
> task taking two CPU cores, and that's it.
> Using <cpu_sched_debug> I get the following messages:
>
> 05/01/2012 01:18:36 |  | [cpu_sched_debug] Request CPU reschedule:
> periodic CPU scheduling
> 05/01/2012 01:18:36 |  | [cpu_sched_debug] schedule_cpus(): start
> 05/01/2012 01:18:36 | SETI@home | [cpu_sched_debug] scheduling
> 25no11ak.32475.72.6.10.244_1 (coprocessor job, FIFO) (prio -0.642338)
> 05/01/2012 01:18:36 | SETI@home | [cpu_sched_debug] reserving 1.000000
> of coproc ATI
> 05/01/2012 01:18:36 | CERNVM/Vboxwrapper Test Project |
> [cpu_sched_debug] scheduling wu_1324932784_8_3 (CPU job, priority
> order) (prio -0.067619)
> 05/01/2012 01:18:36 | malariacontrol.net | [cpu_sched_debug]
> scheduling wu_1203_232_125939_0_1325700669_0 (CPU job, priority order)
> (prio -0.082803)
> 05/01/2012 01:18:36 | malariacontrol.net | [cpu_sched_debug]
> scheduling wu_929_416_618751_0_1325706198_0 (CPU job, priority order)
> (prio -0.083726)
> 05/01/2012 01:18:36 | malariacontrol.net | [cpu_sched_debug]
> scheduling wu_976_504_618750_0_1325706189_0 (CPU job, priority order)
> (prio -0.084649)
> 05/01/2012 01:18:36 | malariacontrol.net | [cpu_sched_debug]
> scheduling wu_1168_521_125938_0_1325700435_0 (CPU job, priority order)
> (prio -0.085572)
> 05/01/2012 01:18:36 |  | [cpu_sched_debug] enforce_schedule(): start
> 05/01/2012 01:18:36 |  | [cpu_sched_debug] preliminary job list:
> 05/01/2012 01:18:36 | SETI@home | [cpu_sched_debug] 0:
> 25no11ak.32475.72.6.10.244_1 (MD: no; UTS: yes)
> 05/01/2012 01:18:36 | CERNVM/Vboxwrapper Test Project |
> [cpu_sched_debug] 1: wu_1324932784_8_3 (MD: no; UTS: yes)
> 05/01/2012 01:18:36 | malariacontrol.net | [cpu_sched_debug] 2:
> wu_1203_232_125939_0_1325700669_0 (MD: no; UTS: no)
> 05/01/2012 01:18:36 | malariacontrol.net | [cpu_sched_debug] 3:
> wu_929_416_618751_0_1325706198_0 (MD: no; UTS: no)
> 05/01/2012 01:18:36 | malariacontrol.net | [cpu_sched_debug] 4:
> wu_976_504_618750_0_1325706189_0 (MD: no; UTS: no)
> 05/01/2012 01:18:36 | malariacontrol.net | [cpu_sched_debug] 5:
> wu_1168_521_125938_0_1325700435_0 (MD: no; UTS: no)
> 05/01/2012 01:18:36 |  | [cpu_sched_debug] final job list:
> 05/01/2012 01:18:36 | CERNVM/Vboxwrapper Test Project |
> [cpu_sched_debug] 0: wu_1324932784_8_3 (MD: no; UTS: yes)
> 05/01/2012 01:18:36 | SETI@home | [cpu_sched_debug] 1:
> 25no11ak.32475.72.6.10.244_1 (MD: no; UTS: yes)
> 05/01/2012 01:18:36 | Albert@Home | [cpu_sched_debug] 2:
> h1_0051.15_S6GC1__48_S6LV1A_2 (MD: no; UTS: yes)
> 05/01/2012 01:18:36 | malariacontrol.net | [cpu_sched_debug] 3:
> wu_1203_232_125939_0_1325700669_0 (MD: no; UTS: no)
> 05/01/2012 01:18:36 | malariacontrol.net | [cpu_sched_debug] 4:
> wu_929_416_618751_0_1325706198_0 (MD: no; UTS: no)
> 05/01/2012 01:18:36 | malariacontrol.net | [cpu_sched_debug] 5:
> wu_976_504_618750_0_1325706189_0 (MD: no; UTS: no)
> 05/01/2012 01:18:36 | malariacontrol.net | [cpu_sched_debug] 6:
> wu_1168_521_125938_0_1325700435_0 (MD: no; UTS: no)
> 05/01/2012 01:18:36 | SETI@home | [coproc] ATI instance 0: confirming
> for 25no11ak.32475.72.6.10.244_1
> 05/01/2012 01:18:36 | CERNVM/Vboxwrapper Test Project |
> [cpu_sched_debug] scheduling wu_1324932784_8_3
> 05/01/2012 01:18:36 | SETI@home | [cpu_sched_debug] scheduling
> 25no11ak.32475.72.6.10.244_1
> 05/01/2012 01:18:36 | Albert@Home | [cpu_sched_debug] scheduling
> h1_0051.15_S6GC1__48_S6LV1A_2
> 05/01/2012 01:18:36 | malariacontrol.net | [cpu_sched_debug] avoiding
> overcommit with multithread job, skipping
> wu_1203_232_125939_0_1325700669_0
> 05/01/2012 01:18:36 | malariacontrol.net | [cpu_sched_debug] avoiding
> overcommit with multithread job, skipping
> wu_929_416_618751_0_1325706198_0
> 05/01/2012 01:18:36 | malariacontrol.net | [cpu_sched_debug] avoiding
> overcommit with multithread job, skipping
> wu_976_504_618750_0_1325706189_0
> 05/01/2012 01:18:36 | malariacontrol.net | [cpu_sched_debug] avoiding
> overcommit with multithread job, skipping
> wu_1168_521_125938_0_1325700435_0
> 05/01/2012 01:18:36 |  | [cpu_sched_debug] using 3.05 out of 4 CPUs
> 05/01/2012 01:18:36 | Albert@Home | [cpu_sched_debug]
> h1_0051.15_S6GC1__48_S6LV1A_2 sched state 2 next 2 task state 1
> 05/01/2012 01:18:36 | Albert@Home | [cpu_sched_debug]
> h1_0051.15_S6GC1__45_S6LV1A_1 sched state 1 next 1 task state 0
> 05/01/2012 01:18:36 | malariacontrol.net | [cpu_sched_debug]
> wu_1203_232_125939_0_1325700669_0 sched state 1 next 1 task state 0
> 05/01/2012 01:18:36 | malariacontrol.net | [cpu_sched_debug]
> wu_929_416_618751_0_1325706198_0 sched state 1 next 1 task state 0
> 05/01/2012 01:18:36 | malariacontrol.net | [cpu_sched_debug]
> wu_976_504_618750_0_1325706189_0 sched state 1 next 1 task state 0
> 05/01/2012 01:18:36 | malariacontrol.net | [cpu_sched_debug]
> wu_1168_521_125938_0_1325700435_0 sched state 1 next 1 task state 0
> 05/01/2012 01:18:36 | CERNVM/Vboxwrapper Test Project |
> [cpu_sched_debug] wu_1324932784_8_3 sched state 2 next 2 task state 1
> 05/01/2012 01:18:36 | SETI@home | [cpu_sched_debug]
> 25no11ak.32475.72.6.10.244_1 sched state 2 next 2 task state 1
> 05/01/2012 01:18:36 |  | [cpu_sched_debug] enforce_schedule: end
>
>
> Unless the CERNVM task counts as a multithreaded job, there are no
> other MT jobs running. There's a SETI task running on the GPU, yet the
> client still leaves one CPU core (mostly) free.
> The malaria tasks aren't MT tasks either; they're just single
> threaded. I stopped the CERNVM task and 4 malaria tasks kicked in.
>
> 05/01/2012 01:25:54 |  | [cpu_sched_debug] Request CPU reschedule:
> periodic CPU scheduling
> 05/01/2012 01:25:54 |  | [cpu_sched_debug] schedule_cpus(): start
> 05/01/2012 01:25:54 | SETI@home | [cpu_sched_debug] scheduling
> 25no11ak.32475.72.6.10.244_1 (coprocessor job, FIFO) (prio -0.466793)
> 05/01/2012 01:25:54 | SETI@home | [cpu_sched_debug] reserving 1.000000
> of coproc ATI
> 05/01/2012 01:25:54 | malariacontrol.net | [cpu_sched_debug]
> scheduling wu_1203_232_125939_0_1325700669_0 (CPU job, priority order)
> (prio -0.058869)
> 05/01/2012 01:25:54 | malariacontrol.net | [cpu_sched_debug]
> scheduling wu_929_416_618751_0_1325706198_0 (CPU job, priority order)
> (prio -0.059512)
> 05/01/2012 01:25:54 | malariacontrol.net | [cpu_sched_debug]
> scheduling wu_976_504_618750_0_1325706189_0 (CPU job, priority order)
> (prio -0.060156)
> 05/01/2012 01:25:54 | malariacontrol.net | [cpu_sched_debug]
> scheduling wu_1168_521_125938_0_1325700435_0 (CPU job, priority order)
> (prio -0.060799)
> 05/01/2012 01:25:54 |  | [cpu_sched_debug] enforce_schedule(): start
> 05/01/2012 01:25:54 |  | [cpu_sched_debug] preliminary job list:
> 05/01/2012 01:25:54 | SETI@home | [cpu_sched_debug] 0:
> 25no11ak.32475.72.6.10.244_1 (MD: no; UTS: yes)
> 05/01/2012 01:25:54 | malariacontrol.net | [cpu_sched_debug] 1:
> wu_1203_232_125939_0_1325700669_0 (MD: no; UTS: yes)
> 05/01/2012 01:25:54 | malariacontrol.net | [cpu_sched_debug] 2:
> wu_929_416_618751_0_1325706198_0 (MD: no; UTS: yes)
> 05/01/2012 01:25:54 | malariacontrol.net | [cpu_sched_debug] 3:
> wu_976_504_618750_0_1325706189_0 (MD: no; UTS: yes)
> 05/01/2012 01:25:54 | malariacontrol.net | [cpu_sched_debug] 4:
> wu_1168_521_125938_0_1325700435_0 (MD: no; UTS: no)
> 05/01/2012 01:25:54 |  | [cpu_sched_debug] final job list:
> 05/01/2012 01:25:54 | SETI@home | [cpu_sched_debug] 0:
> 25no11ak.32475.72.6.10.244_1 (MD: no; UTS: yes)
> 05/01/2012 01:25:54 | malariacontrol.net | [cpu_sched_debug] 1:
> wu_1203_232_125939_0_1325700669_0 (MD: no; UTS: yes)
> 05/01/2012 01:25:54 | malariacontrol.net | [cpu_sched_debug] 2:
> wu_929_416_618751_0_1325706198_0 (MD: no; UTS: yes)
> 05/01/2012 01:25:54 | malariacontrol.net | [cpu_sched_debug] 3:
> wu_976_504_618750_0_1325706189_0 (MD: no; UTS: yes)
> 05/01/2012 01:25:54 | Albert@Home | [cpu_sched_debug] 4:
> h1_0051.15_S6GC1__48_S6LV1A_2 (MD: no; UTS: yes)
> 05/01/2012 01:25:54 | malariacontrol.net | [cpu_sched_debug] 5:
> wu_1168_521_125938_0_1325700435_0 (MD: no; UTS: no)
> 05/01/2012 01:25:54 | SETI@home | [coproc] ATI instance 0: confirming
> for 25no11ak.32475.72.6.10.244_1
> 05/01/2012 01:25:54 | SETI@home | [cpu_sched_debug] scheduling
> 25no11ak.32475.72.6.10.244_1
> 05/01/2012 01:25:54 | malariacontrol.net | [cpu_sched_debug]
> scheduling wu_1203_232_125939_0_1325700669_0
> 05/01/2012 01:25:54 | malariacontrol.net | [cpu_sched_debug]
> scheduling wu_929_416_618751_0_1325706198_0
> 05/01/2012 01:25:54 | malariacontrol.net | [cpu_sched_debug]
> scheduling wu_976_504_618750_0_1325706189_0
> 05/01/2012 01:25:54 | Albert@Home | [cpu_sched_debug] scheduling
> h1_0051.15_S6GC1__48_S6LV1A_2
> 05/01/2012 01:25:54 | malariacontrol.net | [cpu_sched_debug] all CPUs
> used (4.05>= 4), skipping wu_1168_521_125938_0_1325700435_0
> 05/01/2012 01:25:54 | Albert@Home | [cpu_sched_debug]
> h1_0051.15_S6GC1__48_S6LV1A_2 sched state 2 next 2 task state 1
> 05/01/2012 01:25:54 | Albert@Home | [cpu_sched_debug]
> h1_0051.15_S6GC1__45_S6LV1A_1 sched state 1 next 1 task state 0
> 05/01/2012 01:25:54 | malariacontrol.net | [cpu_sched_debug]
> wu_1203_232_125939_0_1325700669_0 sched state 2 next 2 task state 1
> 05/01/2012 01:25:54 | malariacontrol.net | [cpu_sched_debug]
> wu_929_416_618751_0_1325706198_0 sched state 2 next 2 task state 1
> 05/01/2012 01:25:54 | malariacontrol.net | [cpu_sched_debug]
> wu_976_504_618750_0_1325706189_0 sched state 2 next 2 task state 1
> 05/01/2012 01:25:54 | malariacontrol.net | [cpu_sched_debug]
> wu_1168_521_125938_0_1325700435_0 sched state 1 next 1 task state 0
> 05/01/2012 01:25:54 | CERNVM/Vboxwrapper Test Project |
> [cpu_sched_debug] wu_1324932784_8_3 sched state 1 next 1 task state 0
> 05/01/2012 01:25:54 | SETI@home | [cpu_sched_debug]
> 25no11ak.32475.72.6.10.244_1 sched state 2 next 2 task state 1
> 05/01/2012 01:25:54 |  | [cpu_sched_debug] enforce_schedule: end
>
> At the time I write this, the CERNVM task has restarted on its own and
> we're back in the situation where only the CERNVM task and one malaria
> task take up CPU cores, with one core mostly free (looking in Windows
> Task Manager, the System Idle Process is using between 21 and 23% of
> cycles).
>
>
_______________________________________________
boinc_dev mailing list
[email protected]
http://lists.ssl.berkeley.edu/mailman/listinfo/boinc_dev
To unsubscribe, visit the above URL and
(near bottom of page) enter your email address.