Andy Whitcroft wrote:
> Con Kolivas wrote:
>> On Thursday 22 March 2007 20:48, Andy Whitcroft wrote:
>>> Andy Whitcroft wrote:
>>>> Andy Whitcroft wrote:
>>>>> Andrew Morton wrote:
>>>>>> Temporarily at
>>>>>>
>>>>>> http://userweb.kernel.org/~akpm/2.6.21-rc4-mm1/
>>>>>>
>>>>>> Will appear later at
>>>>>>
>>>>>> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc4/2.6.21-rc4-mm1/
>>>>> [All of the below is from the pre hot-fix runs. The very few results
>>>>> which are in for the hot-fix runs seem worse if anything. :( All
>>>>> results should be out on TKO.]
>>>>>
>>>>>> - Restored the RSDL CPU scheduler (a new version thereof)
>>>>> Unsure if the above is the culprit but there seems to be a smattering of
>>>>> BUG's in kernbench from the scheduler on several systems, and panics
>>>>> which do not fully dump out.
>>>>>
>>>>> elm3b239 is about 2/4 kernbench being the test in progress when we
>>>>> blammo in both failed tests, elm3b234 doesn't boot at all.
>>>> Well I have one result through for backing RSDL out on elm3b239 and that
>>>> does indeed seem to give us a successful boot and test. peterz has
>>>> pointed me to an incremental patch from Con which I'll push through
>>>> testing and see if that sorts it out.
>>> Ok, tested the patch below on top of 2.6.21-rc4-mm1 and this seems to
>>> fix the problem:
>>>
>>> http://ck.kolivas.org/patches/staircase-deadline/2.6.21-rc4-mm1-rsdl-0.32.patch
>>>
>>> Hard to tell from that patch whether it will be fixed in the changes
>>> already committed to the next -mm.
>>>
>>> It's possible that it may be fixed by the following patch:
>>>
>>> sched-rsdl-improvements.patch
>>>
>>> Which has the following slipped in at the end of the changelog:
>>>
>>>     A tiny change checking for MAX_PRIO in normal_prio()
>>>     may prevent oopses on bootup on large SMP due to
>>>     forking off the idle task.
>>>
>>> Con, are all the changes in the 0.32 patch above with akpm?
>> Yes he's queued everything in that patch you tested for the next -mm. Thanks
>> very much for testing it.
>
> No worries. I've just got through the results on the other machine in
> the mix. That machine seems to be fixed by backing out RSDL and not by
> the fixup 0.32 patch ...
>
> This second machine seems to hang hard very soon after user space starts
> executing but without a panic. I can't say that the symptoms are very
> definitive, but I do have a good result from that machine without RSDL
> and not with rsdl-0.32.
>
> The machine is a dual-core x86_64 machine: Dual Core AMD Opteron(tm)
> Processor 275.
>
> I'll let you know if I find out anything else. Shout if you want any
> information or have anything you want poked or tested.

Ok, I have yet a third x86_64 machine that is blowing up with the latest
2.6.21-rc4-mm1+hotfixes+rsdl-0.32 but working with
2.6.21-rc4-mm1+hotfixes-RSDL. I have results on various hotfix levels, so I
have just fired off a set of tests across the affected machines on the
latest hotfix stack plus the RSDL backout; the results should be in within
the next hour or two.

I think there is a strong correlation between RSDL and these hangs. Any
suggestions as to the next step?

-apw