Hi, I have tried two things:
1) Stopped the HAL daemon from running. This hasn't helped 8( and the box is still crashing and dumping.
2) Removed CPU #2 with "set cpu_enabled b" at the SRM console. This seems to have had the best effect: the box is more stable, and Linux only sees 3 CPUs instead of 4.

I am guessing that the problem is not the CPU itself barfing. Looking at smp_call_function_on_cpu in smp.c, it appears to be trying to talk to the other CPUs, and one of them fails to respond and times out, which causes the lockup. That would explain why removing CPU 2 from within SRM makes the box stable.

This brings me to my original question: why isn't isolcpus working? When I boot with isolcpus=2, I thought it isolated CPU 2 from the scheduler and so removed any chance of it running tasks, threads, etc. Is this as good as removing the CPU from within SRM, or is there a chance that interrupts might still run on it? I checked with taskset against the current PIDs, and all of them had masks of f. It seems like the SRM environment overrides the isolcpus option.

alex

> -----Original Message-----
> From: Estabrook, Jay
> Sent: Thursday, 28 July 2005 12:44 AM
> To: Samad, Alex
> Cc: Linux on Alpha processors; [email protected]
> Subject: Re: problem with smp and isolcpus
>
> On Wed, Jul 27, 2005 at 12:26:26PM +1000, Samad, Alex wrote:
> >
> > Seem to have a problem with one of my CPUs on an ES45; cpu2 seems to be
> > dying. I have had 3 lockups in 2 days.
> >
> > Jul 26 12:26:23 keyzervega kernel: smp_call_function_on_cpu: initial timeout -- trying long wait
> > Jul 26 12:26:53 keyzervega kernel: lib/kernel_lock.c:229 spinlock stuck in nifd at fffffc00012c65f0(3) owner hald-addon-stor at fffffc00012c65f0(0) lib/kernel_lock.c:229
> > Jul 26 12:26:53 keyzervega kernel: lib/kernel_lock.c:229 spinlock stuck in automount at fffffc00012c65f0(1) owner hald-addon-stor at fffffc00012c65f0(0) lib/kernel_lock.c:229
> > Jul 26 12:26:53 keyzervega kernel: Kernel bug at arch/alpha/kernel/smp.c:858
> > Jul 26 12:26:53 keyzervega kernel: CPU 0 hald-addon-stor(1801): Kernel Bug 1
>
> From the above messages, it'd be more likely that CPU #0 was bad, because
> that was where the lock was being held for too long.
>
> However, what is more likely is that the HAL daemon has crashed.
>
> I've seen a number of machine checks due to HAL daemon startup, and
> recommend that it NOT BE STARTED.
>
> > Is this a known issue? Is there a resolution? If not, where can I log a bug?
> > Where is bug tracking for it?
>
> The problems with the HAL daemon are known issues on Alpha.
>
> --Jay++
>
> ---------------------------------------------------------------
> Jay A Estabrook HPTC - XC I & B
> Hewlett-Packard Company - ZKO1-3/D-B.8 (603) 884-0301
> 110 Spit Brook Road, Nashua NH 03062 [EMAIL PROTECTED]
> ---------------------------------------------------------------
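For reference, the CPU mask values discussed in this thread can be checked with a short script. This is a sketch, assuming a Linux host with Python 3.3 or later (os.sched_getaffinity is Linux-only). The helper names cpus_to_mask, clear_cpu and affinity_mask are hypothetical and do not belong to any existing tool. The script shows that on a four-CPU box the full mask is 0xf, and that clearing bit 2 gives 0xb (binary 1011), which corresponds to "set cpu_enabled b" leaving CPUs 0, 1 and 3. It also prints each PID's allowed-CPU mask in hex, similar to "taskset -p". A mask of f on a four-CPU box means CPU 2 is still in that task's allowed set.

```python
import os
import sys


def cpus_to_mask(cpus):
    """Build a CPU bitmask from CPU ids (bit N set means CPU N is included)."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return mask


def clear_cpu(mask, cpu):
    """Return `mask` with the bit for `cpu` cleared."""
    return mask & ~(1 << cpu)


def affinity_mask(pid):
    """Return the scheduler affinity of `pid` as an integer bitmask.

    pid 0 means the calling process. Linux-only: relies on os.sched_getaffinity.
    """
    return cpus_to_mask(os.sched_getaffinity(pid))


if __name__ == "__main__":
    # Four CPUs (mask 0xf) with CPU 2 taken out -> 0xb
    print(hex(clear_cpu(cpus_to_mask(range(4)), 2)))
    # Print the allowed-CPU mask of each PID given on the command line
    for arg in sys.argv[1:]:
        pid = int(arg)
        print(f"pid {pid}: affinity mask {affinity_mask(pid):x}")
```

Saved as a file and run as, for example, "python3 affinity.py 1 $$", it prints 0xb on the first line, followed by the masks of PID 1 and the current shell.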

