On Tue, 2004-06-22 at 15:39, Keith Owens wrote:
> On Tue, 22 Jun 2004 14:54:43 -0700, 
> keith <[EMAIL PROTECTED]> wrote:
> >linux 2.6.7-bk4 
> >kdb-v4.4-2.6.7-common-1
> >kdb-v4.4-2.6.7-i386-1
> >
> >My issue is as follows...
> >  If I don't enable kdb everything works fine.  When I enable and boot
> >with KDB I fall into kdb when I bring the secondary cpus online.  KDB
> >claims it was entered via an NMI.  Without kdb there are no nmis being
> >registered by the system.  
> 
> This debug patch should get you up and running, and tell you where the
> nmi is coming from.  Untested.
hmmm,
  Ok, I applied this patch and booted.  I will attach the boot log.  The
system still does not finish booting.  I don't have the magic decoder
for the NMI reasons (I have been down this path before without luck).
  The boot log says it entered kdb, but there is no prompt.
Let me know what you think.
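
A rough decoder for the NMI reason values follows. It is only a sketch under two assumptions: that the reason is the status byte read from I/O port 0x61, as in the mainline 2.6 i386 NMI handler, and that the debug patch prints it in decimal while the stock "Uhhuh" message prints it in hex (37 = 0x25 and 53 = 0x35 would then line up in the attached log). The bit names are the conventional PC/AT ones and have not been checked against this Summit hardware; the function names are made up for illustration.

```python
# Conventional PC/AT meanings for the status byte at I/O port 0x61.
# Assumption: these apply unchanged on this machine.
PORT_61_BITS = {
    0x01: "timer 2 gate enabled",
    0x02: "speaker data enabled",
    0x04: "parity/SERR check cleared or disabled",
    0x08: "I/O channel check cleared or disabled",
    0x10: "refresh toggle",
    0x20: "timer 2 output",
    0x40: "I/O channel check NMI pending",
    0x80: "memory parity/SERR NMI pending",
}


def decode_nmi_reason(reason):
    """Return the names of the bits set in an NMI reason byte."""
    return [name for bit, name in sorted(PORT_61_BITS.items()) if reason & bit]


def is_unknown_nmi(reason):
    """True when neither the parity/SERR bit nor the I/O check bit is set."""
    return (reason & 0xC0) == 0


if __name__ == "__main__":
    # Values taken from the attached log, read as decimal.
    for reason in (37, 53):
        print(f"reason {reason} (0x{reason:02x}): "
              f"unknown={is_unknown_nmi(reason)} "
              f"bits={decode_nmi_reason(reason)}")
```

If those assumptions hold, neither value has bit 0x80 or 0x40 set, which would be why the kernel reports the NMI as unknown; the two values differ only in the refresh toggle bit (0x10).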

Keith Mannthey


I did just see this
 "BUG: wrong zone alignment, it will crash"
and am now looking into it. 
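
As a starting point for the zone alignment message, the sketch below checks a few page frame numbers from the attached log. It assumes the kernel's test is that a zone's start pfn must be a multiple of 2^(MAX_ORDER-1) pages and that MAX_ORDER is 11 (1024 pages); both come from memory of mainline 2.6, not from this log. It also assumes each zone on node 0 starts where the previous one ends, so the HighMem start is taken as the DMA size plus the Normal size.

```python
MAX_ORDER = 11                           # assumed default, not shown in the log
REQUIRED_ALIGNMENT = 1 << (MAX_ORDER - 1)  # 1024 pages under that assumption


def is_aligned(start_pfn, alignment=REQUIRED_ALIGNMENT):
    """True when start_pfn is a multiple of the (power-of-two) alignment."""
    return (start_pfn & (alignment - 1)) == 0


# Zone sizes for node 0 as printed in the log.
dma_pages = 4096
normal_pages = 222720

candidates = {
    "Normal start (after DMA)": dma_pages,
    "HighMem start (DMA + Normal)": dma_pages + normal_pages,
    "highstart_pfn from the log": 229376,
}

if __name__ == "__main__":
    for name, pfn in candidates.items():
        print(f"{name}: pfn {pfn}, aligned={is_aligned(pfn)}, "
              f"remainder={pfn % REQUIRED_ALIGNMENT}")
```

Under those assumptions the sum 4096 + 222720 = 226816 equals max_low_pfn from the log and is 512 pages off a 1024-page boundary, while highstart_pfn (229376) is aligned. The difference between the two is the 2560 pages reserved for the numa KVA remap.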


-- Attached file included as plaintext by Ecartis --
-- File: ok.txt

Linux version 2.6.7-bk4 ([EMAIL PROTECTED]) (gcc version 3.3.3 (SuSE Linux)) #7 SMP Tue Jun 22 16:03:45 PDT 2004
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009c400 (usable)
 BIOS-e820: 000000000009c400 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 00000000eff91900 (usable)
 BIOS-e820: 00000000eff91900 - 00000000eff9c340 (ACPI data)
 BIOS-e820: 00000000eff9c340 - 00000000f0000000 (reserved)
 BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
 BIOS-e820: 0000000100000000 - 00000003d0000000 (usable)
get_memcfg_from_srat: assigning address to rsdp
RSD PTR  v0 [IBM   ]
Begin SRAT table scan....
CPU 0x00 in proximity domain 0x00
CPU 0x02 in proximity domain 0x00
CPU 0x10 in proximity domain 0x00
CPU 0x12 in proximity domain 0x00
Memory range 0x0 to 0xF0000 (type 0x0) in proximity domain 0x00 enabled
Memory range 0x100000 to 0x390000 (type 0x0) in proximity domain 0x00 enabled
CPU 0x20 in proximity domain 0x01
CPU 0x22 in proximity domain 0x01
CPU 0x30 in proximity domain 0x01
CPU 0x32 in proximity domain 0x01
Memory range 0x390000 to 0x3D0000 (type 0x0) in proximity domain 0x01 enabled
acpi20_parse_srat: Entry length value is zero; can't parse any further!
pxm bitmap: 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Number of logical nodes in system = 2
Number of memory chunks in system = 3
chunk 0 nid 0 start_pfn 00000000 end_pfn 000f0000
chunk 1 nid 0 start_pfn 00100000 end_pfn 00390000
chunk 2 nid 1 start_pfn 00390000 end_pfn 003d0000
Reserving 2560 pages of KVA for lmem_map of node 1
Shrinking node 1 from 3997696 pages to 3995136 pages
Reserving total of 2560 pages for numa KVA remap
14720MB HIGHMEM available.
886MB LOWMEM available.
min_low_pfn = 1318, max_low_pfn = 226816, highstart_pfn = 229376
Low memory ends at vaddr f7600000
node 0 will remap to vaddr f8000000 - f8000000
node 1 will remap to vaddr f7600000 - f8000000
High memory starts at vaddr f8000000
ACPI: S3 and PAE do not like each other for now, S3 disabled.
found SMP MP-table at 0009c540
On node 0 totalpages: 3670016
  DMA zone: 4096 pages, LIFO batch:1
  Normal zone: 222720 pages, LIFO batch:16
  HighMem zone: 3443200 pages, LIFO batch:16
BUG: wrong zone alignment, it will crash
On node 1 totalpages: 259584
  DMA zone: 0 pages, LIFO batch:1
  Normal zone: 0 pages, LIFO batch:1
  HighMem zone: 259584 pages, LIFO batch:16
DMI 2.3 present.
IBM machine detected. Enabling interrupts during APM calls.
IBM machine detected. Disabling SMBus accesses.
Using APIC driver default
ACPI: RSDP (v000 IBM                                       ) @ 0x000fdfc0
ACPI: RSDT (v001 IBM    SERVIGIL 0x00001000 IBM  0x45444f43) @ 0xeff9c2c0
ACPI: FADT (v001 IBM    SERVIGIL 0x00001000 IBM  0x45444f43) @ 0xeff9c240
ACPI: MADT (v001 IBM    SERVIGIL 0x00001000 IBM  0x45444f43) @ 0xeff9c140
ACPI: SRAT (v001 IBM    SERVIGIL 0x00001000 IBM  0x45444f43) @ 0xeff9c000
ACPI: DSDT (v001 IBM    SERVIGIL 0x00001000 INTL 0x02002025) @ 0x00000000
ACPI: PM-Timer IO Port: 0x508
Switched to APIC driver `summit'.
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
Processor #0 15:2 APIC version 20
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x02] enabled)
Processor #2 15:2 APIC version 20
ACPI: LAPIC (acpi_id[0x04] lapic_id[0x10] enabled)
Processor #16 15:2 APIC version 20
ACPI: LAPIC (acpi_id[0x05] lapic_id[0x12] enabled)
Processor #18 15:2 APIC version 20
ACPI: LAPIC (acpi_id[0x08] lapic_id[0x20] enabled)
Processor #32 15:2 APIC version 20
ACPI: LAPIC (acpi_id[0x09] lapic_id[0x22] enabled)
Processor #34 15:2 APIC version 20
ACPI: LAPIC (acpi_id[0x0c] lapic_id[0x30] enabled)
Processor #48 15:2 APIC version 20
ACPI: LAPIC (acpi_id[0x0d] lapic_id[0x32] enabled)
Processor #50 15:2 APIC version 20
ACPI: LAPIC_NMI (acpi_id[0x00] dfl dfl lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x01] dfl dfl lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x04] dfl dfl lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x05] dfl dfl lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x08] dfl dfl lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x09] dfl dfl lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x0c] dfl dfl lint[0x1])
ACPI: LAPIC_NMI (acpi_id[0x0d] dfl dfl lint[0x1])
ACPI: IOAPIC (id[0x0e] address[0xfec00000] gsi_base[0])
IOAPIC[0]: Assigned apic_id 14
IOAPIC[0]: apic_id 14, version 17, address 0xfec00000, GSI 0-43
ACPI: IOAPIC (id[0x0d] address[0xfec01000] gsi_base[44])
IOAPIC[1]: Assigned apic_id 13
IOAPIC[1]: apic_id 13, version 17, address 0xfec01000, GSI 44-87
ACPI: INT_SRC_OVR (bus 0 bus_irq 8 global_irq 8 low edge)
ACPI: INT_SRC_OVR (bus 0 bus_irq 14 global_irq 14 high dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 low level)
Enabling APIC mode:  Summit.  Using 2 I/O APICs
Using ACPI (MADT) for SMP configuration information
Built 2 zonelists
Kernel command line: root=/dev/sda2 resume=/dev/sda1 elevator=cfq showopts console=ttyS0,115200 console=tty0
Initializing CPU#0
PID hash table entries: 4096 (order 12: 32768 bytes)
Summit chipset: Starting Cyclone Counter.
Detected 1996.847 MHz processor.
Using cyclone for high-res timesource
Console: colour VGA+ 80x25
Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
Initializing highpages for node 0
Initializing highpages for node 1
Memory: 15592976k/15990784k available (2434k kernel code, 123524k reserved, 1203k data, 228k init, 14810692k highmem)
Checking if this processor honours the WP bit even in supervisor mode... Ok.
Calibrating delay loop... 193.53 BogoMIPS
kdb version 4.4 by Keith Owens, Scott Lurndal. Copyright SGI, All Rights Reserved
kdb_cmd[0]: defcmd archkdb "" "First line arch debugging"
kdb_cmd[6]: defcmd archkdbcpu "" "archkdb with only tasks on cpus"
kdb_cmd[12]: defcmd archkdbshort "" "archkdb with less detailed backtrace"
kdb_cmd[18]: defcmd archkdbcommon "" "Common arch debugging"
Security Scaffold v1.0.0 initialized
SELinux:  Initializing.
SELinux:  Starting in permissive mode
Mount-cache hash table entries: 512 (order: 0, 4096 bytes)
CPU: Trace cache: 12K uops, L1 D cache: 8K
CPU: L2 cache: 512K
CPU: L3 cache: 2048K
CPU: Hyper-Threading is disabled
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
CPU0: Intel P4/Xeon Extended MCE MSRs (12) available
CPU0: Thermal monitoring enabled
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Checking 'hlt' instruction... OK.
CPU0: Intel(R) XEON(TM) MP CPU 2.00GHz stepping 02
per-CPU timeslice cutoff: 1463.44 usecs.
task migration cache decay timeout: 2 msecs.
enabled ExtINT on CPU#0
Leaving ESR disabled.
Mapping cpu 0 to node 0
Booting processor 1/2 eip 2000
default_do_nmi entered, reason 37, cpu 0
unknown_nmi_error entered, reason 37, cpu 0
Initializing CPU#1
masked ExtINT on CPU#1
Leaving ESR disabled.
Mapping cpu 1 to node 0
Calibrating delay loop... 199.16 BogoMIPS
CPU: Trace cache: 12K uops, L1 D cache: 8K
CPU: L2 cache: 512K
CPU: L3 cache: 1024K
CPU: Hyper-Threading is disabled
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#1.
CPU1: Intel P4/Xeon Extended MCE MSRs (12) available
CPU1: Thermal monitoring enabled
Uhhuh. NMI received for unknown reason 25 on CPU 0.
Dazed and confused, but trying to continue
Do you have a strange power saving mode enabled?
CPU1: Intel(R) Xeon(TM) MP CPU 2.00GHz stepping 05
Booting processor 2/16 eip 2000
default_do_nmi entered, reason 53, cpu 0
unknown_nmi_error entered, reason 53, cpu 0
Initializing CPU#2
masked ExtINT on CPU#2
Leaving ESR disabled.
Mapping cpu 2 to node 0
Calibrating delay loop... 199.16 BogoMIPS
CPU: Trace cache: 12K uops, L1 D cache: 8K
CPU: L2 cache: 512K
CPU: L3 cache: 2048K
CPU: Hyper-Threading is disabled
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#2.
CPU2: Intel P4/Xeon Extended MCE MSRs (12) available
CPU2: Thermal monitoring enabled
Uhhuh. NMI received for unknown reason 35 on CPU 0.
Dazed and confused, but trying to continue
Do you have a strange power saving mode enabled?
CPU2: Intel(R) XEON(TM) MP CPU 2.00GHz stepping 02
Booting processor 3/18 eip 2000
default_do_nmi entered, reason 53, cpu 0
unknown_nmi_error entered, reason 53, cpu 0
Initializing CPU#3
masked ExtINT on CPU#3
Leaving ESR disabled.
Mapping cpu 3 to node 0
Calibrating delay loop... 199.16 BogoMIPS
CPU: Trace cache: 12K uops, L1 D cache: 8K
CPU: L2 cache: 512K
CPU: L3 cache: 1024K
CPU: Hyper-Threading is disabled
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#3.
CPU3: Intel P4/Xeon Extended MCE MSRs (12) available
CPU3: Thermal monitoring enabled
Uhhuh. NMI received for unknown reason 35 on CPU 0.
Dazed and confused, but trying to continue
Do you have a strange power saving mode enabled?
CPU3: Intel(R) Xeon(TM) MP CPU 2.00GHz stepping 05
Booting processor 4/32 eip 2000
default_do_nmi entered, reason 53, cpu 0
unknown_nmi_error entered, reason 53, cpu 0
Initializing CPU#4
masked ExtINT on CPU#4
Leaving ESR disabled.
Mapping cpu 4 to node 1
Calibrating delay loop... 199.16 BogoMIPS
CPU: Trace cache: 12K uops, L1 D cache: 8K
CPU: L2 cache: 512K
CPU: L3 cache: 2048K
CPU: Hyper-Threading is disabled
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#4.
CPU4: Intel P4/Xeon Extended MCE MSRs (12) available
CPU4: Thermal monitoring enabled
Uhhuh. NMI received for unknown reason 35 on CPU 0.
Dazed and confused, but trying to continue
Do you have a strange power saving mode enabled?
CPU4: Intel(R) XEON(TM) MP CPU 2.00GHz stepping 02
Booting processor 5/34 eip 2000
default_do_nmi entered, reason 53, cpu 0
                                         


