Hi guys,

I sent this question to this list back in August, but the list did not seem to be very active at the time, so I am trying again. I would really appreciate your comments and ideas on the following problem:

The problem occurred when I was trying to start a kernel with kexec on a system with a Dual Core AMD Opteron processor (see /proc/cpuinfo at the end of this mail). The kernel was the 2.6.16 kernel supplied with SLES10, compiled for 32-bit and without SMP support. I used it as both the first and the second kernel, so the two were practically the same. The new (second, kexec'd) kernel simply rebooted the processor, without any error message, at the point where the timer interrupt is unmasked in the PIC during boot (in time_init). I also tested the scenario with a newer kernel (2.6.21) compiled for 64-bit with SMP enabled, and the problem was still there.

With some help and testing (booting the new kernel from grub, which worked fine, and comparing the interrupt setup under grub and under kexec) I found out that the problem is related to the i8259 hardware and how it is connected to the APIC. After a hardware reset the PIC is connected in Virtual Wire Mode directly to the local APIC, not through the IO APIC.

Debug info from the IO APIC at startup using grub:

.... IRQ redirection table:
NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:
00 000 00  1    0    0   0   0    0    0    00
01 001 01  1    0    0   0   0    0    0    00
02 001 01  1    0    0   0   0    0    0    00
03 001 01  1    0    0   0   0    0    0    00
04 001 01  1    0    0   0   0    0    0    00
05 000 00  1    0    0   0   0    0    0    00
06 000 00  1    0    0   0   0    0    0    00
07 001 01  1    0    0   0   0    0    0    00
08 001 01  1    0    0   0   0    0    0    00
09 000 00  1    0    0   0   0    0    0    00
0a 001 01  1    0    0   0   0    0    0    00
0b 001 01  1    0    0   0   0    0    0    00
0c 001 01  1    0    0   0   0    0    0    00
0d 001 01  1    0    0   0   0    0    0    00
0e 001 01  1    0    0   0   0    0    0    00
0f 001 01  1    0    0   0   0    0    0    00

Debug info from the local APIC at startup using grub:

printing local APIC contents on CPU#0/0:
... APIC ID:      00000000 (0)
... APIC VERSION: 00040010
... APIC TASKPRI: 00000000 (00)
... APIC ARBPRI: 00000000 (00)
... APIC PROCPRI: 00000000
... APIC EOI: 00000000
... APIC RRR: 00000000
... APIC LDR: 00000000
... APIC DFR: ffffffff
... APIC SPIV: 0000010f
... APIC ISR field:
... APIC TMR field:
... APIC IRR field:
... APIC ESR: 00000004
... APIC ICR: 00004630
... APIC ICR2: 01000000
... APIC LVTT: 00010000
... APIC LVTPC: 00010000
... APIC LVT0: 00000700
... APIC LVT1: 00000400
... APIC LVTERR: 0001000f
... APIC TMICT: 00000000
... APIC TMCCT: 00000000
... APIC TDCR: 00000000
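
As a cross-check of the dump above, the LVT0 and LVT1 values can be decoded by hand. The following is only a minimal sketch: it assumes the usual local APIC LVT layout (delivery mode in bits 10:8, mask in bit 16), and the helper names are made up for illustration, they are not kernel functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Delivery mode encodings of a local APIC LVT entry (bits 10:8). */
enum lvt_delivery_mode {
    LVT_DM_FIXED  = 0,
    LVT_DM_SMI    = 2,
    LVT_DM_NMI    = 4,
    LVT_DM_INIT   = 5,
    LVT_DM_EXTINT = 7
};

/* Hypothetical helper: extract the delivery mode field (bits 10:8). */
unsigned int lvt_delivery_mode(uint32_t lvt)
{
    return (lvt >> 8) & 0x7u;
}

/* Hypothetical helper: true if the mask bit (bit 16) is set. */
bool lvt_masked(uint32_t lvt)
{
    return ((lvt >> 16) & 0x1u) != 0;
}

/* Hypothetical helper: true if the entry is an unmasked ExtINT input,
 * i.e. the i8259 is wired straight into this local APIC pin. */
bool lvt_is_live_extint(uint32_t lvt)
{
    return !lvt_masked(lvt) && lvt_delivery_mode(lvt) == LVT_DM_EXTINT;
}
```

With the values from the dump, LVT0 = 00000700 decodes to an unmasked ExtINT entry and LVT1 = 00000400 to NMI, while LVTT = 00010000 is masked. Together with the fully masked IO APIC redirection table, this matches Virtual Wire Mode through the local APIC.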

While the first kernel boots (started from grub), I get the following kernel warning:

"ExtINT not setup in hardware but reported by MP table"

Looking at enable_IO_APIC() in arch/i386/kernel/io_apic.c, this message means that the kernel could not find the PIC routed through the IO APIC when it examined the IO APIC registers (so after this search 'ioapic_i8259.pin' was '-1'), but it could find it in the MP table when searching for legacy IRQs. It then overrode the 'ioapic_i8259.pin' value ('-1') with the value found in the MP table, which in this case was '0'.

As a result, just before the first kernel shuts down, disable_IO_APIC() (also in io_apic.c) sees the new 'ioapic_i8259.pin' value '0' and enables Virtual Wire Mode with the PIC connected through the IO APIC, and not directly to the local APIC as was the case after a normal boot from grub. In my case this eventually meant that the new kernel rebooted the processor when it unmasked the timer interrupt in the PIC during boot (in time_init).

A quick way to fix my problem is not to trust the MP table and, in enable_IO_APIC() in the first kernel, to check only the IO APIC IRQ redirection table to find out whether the PIC is connected through the IO APIC.
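
To make the behaviour and the quick fix described above concrete, here is a simplified, stand-alone model of the pin selection. It is not the kernel code: the struct, the function names and the 'trust_mp_table' switch are hypothetical and only mirror the logic as I understand it from enable_IO_APIC() and disable_IO_APIC().

```c
#include <stdbool.h>

#define DM_EXTINT 7    /* redirection entry delivery mode: ExtINT */
#define NO_PIN    (-1) /* same meaning as ioapic_i8259.pin == -1  */

/* Simplified stand-in for one IO APIC redirection table entry. */
struct redir_entry {
    bool masked;
    unsigned int delivery_mode;
};

/* Look for an unmasked ExtINT entry in the hardware table. */
int find_extint_pin_in_hw(const struct redir_entry *tbl, int n)
{
    for (int pin = 0; pin < n; pin++) {
        if (!tbl[pin].masked && tbl[pin].delivery_mode == DM_EXTINT)
            return pin;
    }
    return NO_PIN;
}

/* Model of the decision made at boot of the first kernel.
 * trust_mp_table == true  : fall back to the MP table pin when the
 *                           hardware shows nothing (the behaviour
 *                           that produces the warning).
 * trust_mp_table == false : the quick fix, believe only the hardware. */
int select_i8259_pin(const struct redir_entry *tbl, int n,
                     int mp_table_pin, bool trust_mp_table)
{
    int hw_pin = find_extint_pin_in_hw(tbl, n);

    if (hw_pin == NO_PIN && trust_mp_table)
        return mp_table_pin;
    return hw_pin;
}

/* Model of the shutdown path: the PIC is routed through the IO APIC
 * only when a pin was recorded at boot. */
bool shutdown_routes_pic_through_ioapic(int i8259_pin)
{
    return i8259_pin != NO_PIN;
}
```

With a fully masked 16-entry table like the one in the dump and an MP table pin of 0, this model selects pin 0 when the MP table is trusted, so the shutdown path reroutes the PIC through the IO APIC. With the MP table ignored it selects -1 and the shutdown path leaves the routing alone.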

But what is the reason for trusting the MP table and creating this connection if it was not there from the beginning? Obviously, the system cannot start up if the PIC is connected through the IO APIC by the APIC setup done in disable_IO_APIC() before the first kernel shuts down.

What would be a clean solution to this problem?

Thanks in advance,

Tamas


-------------------------------------------------------------------------------------
> cat /proc/cpuinfo
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 33
model name      : Dual Core AMD Opteron(tm) Processor 165
stepping        : 2
cpu MHz         : 1800.374
cache size      : 1024 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext 3dnow pni lahf_lm cmp_legacy ts fid vid ttp
bogomips        : 3606.40
-------------------------------------------------------------------------------------
