On Fri, Nov 10, 2006 at 09:33:47AM -0500, Ben Romer wrote:
> On Thu, 2006-11-09 at 17:06 -0500, Vivek Goyal wrote:
> > On Thu, Nov 09, 2006 at 03:39:17PM -0500, Ben Romer wrote:
> > > For the last several weeks I've been trying to get to the root of a
> > > kexec clock problem we've been seeing on our ES7000/ONE systems. I've
> > > been down various routes, verifying that the system is going into
> > > virtual wire mode correctly and trying to turn the PM clock back on
> > > before rebooting, but I have been unable to get a kexec dump without
> > > passing in a loops-per-jiffy value on the kexec command line. The clock
> > > seems to re-activate once the APIC is initialized, but before that I get
> > > nothing at all. Without the lpj= value, the kexec'd kernel hangs while
> > > trying to compute lpj.
> > > 
> > 
> > So timer interrupts are not arriving in the second kernel, hence
> > jiffies don't get updated and you hang in calibrate_delay_loop()?
> > 
> 
> Yep, that's exactly the problem. I've checked the APIC routing and the
> virtual wire mode code puts us back into the exact same state that we
> had when we booted, but the clock doesn't work. 
> 
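
For reference, here is a rough Python model of the calibration step being discussed. It assumes the general shape of the kernel's delay calibration (calibrate_delay() in init/calibrate.c): a preset lpj= value skips calibration entirely, and otherwise the code busy-waits for jiffies to change before it measures anything. calibrate_delay_sketch(), make_clock() and spin_limit are hypothetical names for this sketch only. The real loop has no spin limit and simply spins forever, which is the hang described above.

```python
from typing import Callable


def calibrate_delay_sketch(preset_lpj: int,
                           read_jiffies: Callable[[], int],
                           spin_limit: int = 100_000) -> str:
    """Model the start of delay-loop calibration (hypothetical helper).

    Returns "preset" if an lpj= value was supplied (calibration skipped),
    "tick" once jiffies has been seen to change (the real measurement
    would follow), or "hung" if jiffies never moved within spin_limit
    polls. The real kernel loop has no spin_limit and spins forever.
    """
    if preset_lpj:
        return "preset"                # lpj= given: no timer ticks needed
    start = read_jiffies()
    for _ in range(spin_limit):        # wait for the "start of" a clock tick
        if read_jiffies() != start:
            return "tick"
    return "hung"                      # timer interrupt never delivered


def make_clock(tick_every: int) -> Callable[[], int]:
    """Hypothetical jiffies source: advances by one every `tick_every`
    reads, or never if tick_every is 0 (no timer interrupts arriving)."""
    reads = 0

    def read() -> int:
        nonlocal reads
        reads += 1
        return reads // tick_every if tick_every else 0

    return read


print(calibrate_delay_sketch(0, make_clock(0)))          # no ticks, no lpj=
print(calibrate_delay_sketch(4_000_000, make_clock(0)))  # no ticks, arbitrary example lpj=
print(calibrate_delay_sketch(0, make_clock(1000)))       # timer interrupts working
```

This is why passing lpj= gets the second kernel past this point: the preset path never looks at jiffies, so the missing ticks only matter once something else waits on them.
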
> > Is it in any way related to the boot CPU? If you boot your first kernel
> > with only one processor, do you still see the issue? Does kexec work on
> > this machine? If kexec works, then it's probably more of a software
> > setting issue.
> > 
> > Does it work if the second kernel is booted with the command line
> > option "nolapic"?
> > 
> 
> I'll try both of these suggestions right away. :)
> 
> > Sorry, I am not very well versed with timers, hence a stupid question.
> > What is a PM timer? Can it drive the timer interrupt like an 8253/8254
> > chipset or an HPET can? If yes, how is the interrupt routed to the CPU
> > on your machine?
> > 
> 
> I'm not all that well versed with timers either, which I suspect is why
> I'm having trouble. ;) In the code I've read the kernel treats the PM
> timer identically to how a PIT timer works, so I believe they're
> basically the same thing. What I saw was that the kernel switches over
> to the APIC timer later in the boot process, and it shuts off the PM
> timer. So, I attempted to re-enable the PM timer inside of
> machine_kexec() but that didn't work. With lpj set the system makes it
> past calibrate_delay_loop() and the clock comes back on when we get to
> the IOAPIC initialization. 
> 

The very fact that ticks start arriving after IOAPIC initialization
points towards a software setting issue.

> > Any idea how the BIOS on your system initially sets up the LAPIC/IOAPIC
> > to deliver the timer interrupt to the CPU? Is it delivered directly
> > through the LAPIC or routed through the IOAPIC?
> > 
> 
> I'm pretty sure when we come up we're in virtual wire mode B, and the
> timer interrupt is routed through the IOAPIC.
> 

So most likely the timer is behind the 8259, and the 8259 output is
connected to either pin 0 or pin 2 of the IOAPIC, which will be set up
as an ExtInt pin when the machine is going down. The IOAPIC will then
deliver the interrupt to the LAPIC in ExtInt mode, which generates INTA
cycles, and the 8259 provides the interrupt vector.
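
To check this from a register dump, here is a sketch that decodes a raw IOAPIC redirection table entry, using the bit layout from the Intel 82093AA IOAPIC datasheet. decode_ioapic_rte() and is_virtual_wire_pin() are hypothetical helpers. The example value 0x700 (vector 0, ExtINT, physical destination, edge, active high, unmasked) is roughly what a virtual wire mode B pin carrying the 8259 output should look like; that is an assumption for illustration, not a value read from an ES7000.

```python
# Delivery-mode encodings for IOAPIC redirection entries (bits 8-10).
DELIVERY_MODES = {
    0b000: "Fixed",
    0b001: "LowestPriority",
    0b010: "SMI",
    0b100: "NMI",
    0b101: "INIT",
    0b111: "ExtINT",
}


def decode_ioapic_rte(low: int, high: int) -> dict:
    """Decode one 64-bit redirection table entry given as its two 32-bit
    halves (low half is IOAPIC register 0x10 + 2*pin, high half 0x11 + 2*pin)."""
    return {
        "vector": low & 0xFF,
        "delivery_mode": DELIVERY_MODES.get((low >> 8) & 0x7, "reserved"),
        "logical_dest": bool(low & (1 << 11)),
        "delivery_pending": bool(low & (1 << 12)),
        "active_low": bool(low & (1 << 13)),
        "remote_irr": bool(low & (1 << 14)),
        "level_triggered": bool(low & (1 << 15)),
        "masked": bool(low & (1 << 16)),
        "destination": (high >> 24) & 0xFF,   # bits 56-63 of the entry
    }


def is_virtual_wire_pin(entry: dict) -> bool:
    """True if the pin passes 8259 interrupts through as unmasked ExtINT."""
    return entry["delivery_mode"] == "ExtINT" and not entry["masked"]


pin0 = decode_ioapic_rte(low=0x00000700, high=0x00000000)
print(pin0["delivery_mode"], "masked" if pin0["masked"] else "unmasked")
print(is_virtual_wire_pin(pin0))
```

Running the same decode on pins 0 and 2 in each dump should make it obvious whether the ExtInt path is actually open when the second kernel starts.
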

This looks like some strange locking issue. I fixed one interrupt
locking issue in the past, though this one looks like a different problem.

http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=da7ed9f98f6f3f18664f8ab24303f9428b9d78f8

Can you also print the LAPIC and IOAPIC states and post them? I think
the LAPIC/IOAPIC states at the following points will be interesting:

- During early boot of the first kernel, to find out how the BIOS set up
  the LAPIC and IOAPIC.

- During early boot of the second kernel, to find out how kexec left
  the states and which interrupts are pending.

- After the IOAPIC initialization in the second kernel, to find out what
  changed that allowed interrupts to resume.
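
One way to compare these dumps is to diff them register by register. Below is a minimal sketch, assuming each dump has already been turned into a dict of register name to raw value; the key names and values are hypothetical and illustrative only. For the pending-interrupt question, the master 8259's IMR can be read from port 0x21, and its IRR by writing OCW3 0x0A to port 0x20 and then reading port 0x20; bit 0 of each corresponds to IRQ0, the timer.

```python
def diff_snapshots(before: dict[str, int],
                   after: dict[str, int]) -> list[tuple[str, int, int, int]]:
    """Return (register, old, new, changed_bits) for every register whose
    value differs between two dumps. A register missing from one dump is
    treated as 0 (a simplification for this sketch)."""
    changes = []
    for name in sorted(before.keys() | after.keys()):
        old, new = before.get(name, 0), after.get(name, 0)
        if old != new:
            changes.append((name, old, new, old ^ new))
    return changes


def pic_irq0_state(imr: int, irr: int) -> str:
    """Describe IRQ0 (the timer) on the master 8259 from its IMR and IRR bytes."""
    masked = "masked" if imr & 0x01 else "unmasked"
    pending = "pending" if irr & 0x01 else "not pending"
    return f"IRQ0 {masked}, {pending}"


# Illustrative values only, not measurements from any machine.
early_second_kernel = {"ioapic.pin0.low": 0x00010700, "lapic.lvt_lint0": 0x00010700}
after_ioapic_init = {"ioapic.pin0.low": 0x00000700, "lapic.lvt_lint0": 0x00010700}

for name, old, new, bits in diff_snapshots(early_second_kernel, after_ioapic_init):
    print(f"{name}: {old:#010x} -> {new:#010x} (changed bits {bits:#x})")
print(pic_irq0_state(imr=0xFE, irr=0x01))
```
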

Thanks
Vivek
_______________________________________________
fastboot mailing list
[email protected]
https://lists.osdl.org/mailman/listinfo/fastboot
