The scheduler problems in lenny's kernel (as evidenced by "task * blocked for more than 120 seconds") seem to be hitting a number of people and a variety of workloads:
#516374: INFO: task * blocked for more than 120 seconds. (ubuntu bug #276476)
#517449: linux-image-2.6.26-1-amd64: SCHED_IDLE issues (tasks blocked for more than 120 seconds)
#517586: "INFO: task * blocked for more than 120 seconds" causes system freeze
#499745: linux-image-2.6.26-1-xen-686: freezes under Xen 3.2.0

Until now, I've experienced this primarily on machines running several KVM VMs, but have noticed it in other cases now that I've been looking for it. For example, on a 2x2.4GHz Xeon machine with 2GB of RAM running a moderately loaded OpenLDAP slapd (very little disk I/O, ~65% of its memory and 40-50% CPU used):

[386715.749526] INFO: task cron:1070 blocked for more than 120 seconds.
[386715.749579] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[386715.749628] cron          D 00000000     0  1070   3130
[386715.749635]  f7ed2140 00200086 00000000 00000000 c0354e40 f7ed22cc c2011fa0 00000000
[386715.749648]  00000003 041b8f25 00000000 c011cb0f 00000000 00000000 00000000 000000ff
[386715.749658]  7fffffff 7fffffff c3c63f68 00000002 c02b8519 f762b340 ffffffff f7566688
[386715.749670] Call Trace:
[386715.749703]  [<c011cb0f>] sched_balance_self+0x1ce/0x227
[386715.749726]  [<c02b8519>] schedule_timeout+0x13/0x86
[386715.749749]  [<c02b7c3d>] wait_for_common+0xaf/0x10f
[386715.749759]  [<c011b682>] default_wake_function+0x0/0x8
[386715.749774]  [<c0121b89>] do_fork+0x17f/0x1dc
[386715.749792]  [<c0102173>] sys_vfork+0x18/0x1c
[386715.749801]  [<c01038ce>] syscall_call+0x7/0xb

I'm currently running 2.6.28 (from sid as of ~4 weeks ago) with the three patches mentioned in LP#276476, which has taken our heavily loaded KVM hosts from locking up every 3-6 days to completely stable. I looked at backporting the patches in question to lenny's 2.6.26, but they don't apply cleanly and I don't know enough about the Linux scheduler to be confident in doing it myself.

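For anyone who wants to see how often this happens on their own machines, here is a rough sketch that tallies hung-task reports per task name from a kernel log read on stdin. The function name `count_hung_tasks` and the script name in the usage line are made up for illustration, and the regex assumes only the message format shown in the trace above.

```python
import re
import sys
from collections import Counter

# Assumes the message format seen in the trace above, e.g.
# "[386715.749526] INFO: task cron:1070 blocked for more than 120 seconds."
# The greedy name match tolerates task names that themselves contain ':'.
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<name>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


def count_hung_tasks(lines):
    """Return a Counter mapping task name -> number of hung-task reports."""
    counts = Counter()
    for line in lines:
        match = HUNG_TASK_RE.search(line)
        if match:
            counts[match.group("name")] += 1
    return counts


if __name__ == "__main__":
    # Read kernel log text from stdin and print one "name<TAB>count" per task.
    for name, n in count_hung_tasks(sys.stdin).most_common():
        print(f"{name}\t{n}")
```

Something like `dmesg | python3 hung_tasks.py` would feed it the current kernel ring buffer; a saved kern.log works the same way.
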
Given that this bug seems to be affecting a number of people in substantial ways, could these changes be backported to 2.6.26, perhaps with an upload to proposed-updates?

Even if a lenny update to fix this problem isn't in the cards, would someone with more kernel knowledge be willing to help me fix 2.6.26? I'm willing to provide any testing or other assistance; I just don't have the specialized knowledge to make this fix in 2.6.26.

john
-- 
John Morrissey
www.horde.net/

