I've been struggling with a kernel hang during boot-up and enumeration of a RapidIO system.
My current system contains an N.A.T. MCH (using the IDT/Tundra Tsi578 switch) and a Vadatech AMC719 card using the Freescale P4080 processor. Other cards will be added to the system, but for now I'm testing with just these two. I'm using Linux kernel version 2.6.34.6. I've set riohdid=0 on the kernel command line, and I'm expecting Linux to fully enumerate and configure the RapidIO fabric. (This may be a bad assumption on my part.)

After lots of tracing, I've determined that the kernel hangs on the first maintenance transaction to the switch. The hang is often followed by a "machine check in kernel mode" exception and a panic. This is very similar to the behavior reported in this mailing list thread from 2010:

http://lists.ozlabs.org/pipermail/linuxppc-dev/2010-October/086235.html

I've read that thread several times and tried most of the suggestions, but they don't appear to apply to my hardware configuration.

Is it possible that something in the switch isn't completely initialized at the time Linux tries to do the maintenance transaction? If so, how do I find it?

Here's the console log from a boot using the supplied kernel:

Freescale XGMAC MDIO Bus: probed
Setting up RapidIO peer-to-peer network /rapidio@ffe0c0000
fsl-of-rio ffe0c0000.rapidio: Of-device full name /rapidio@ffe0c0000
fsl-of-rio ffe0c0000.rapidio: Regs: [mem 0xffe0c0000-0xffe0dffff]
fsl-of-rio ffe0c0000.rapidio: LAW start 0x0000000c20000000, size 0x0000000001000000.
fsl-of-rio ffe0c0000.rapidio: errirq: 16, bellirq: 57, txirq: 60, rxirq 61
fsl-of-rio ffe0c0000.rapidio: RapidIO PHY type: serial
SRIO Port 1 Status: Lane0Sync Lane1Sync Lane2Sync Lane3Sync Aligned
SRIO Port 2 Status: (Note: Freescale driver only supports Port 1)
fsl-of-rio ffe0c0000.rapidio: Hardware port width: 4
fsl-of-rio ffe0c0000.rapidio: Training connection status: Four-lane
fsl-of-rio ffe0c0000.rapidio: RapidIO Common Transport System size: 256
RIO: enumerate master port 0, RIO0 mport
Machine check in kernel mode.
RIO: port1 error
Caused by (from MCSR=a000): Load Error Report
Guarded Load Error Report
Oops: Machine check, sig: 7 [#1]
SMP NR_CPUS=8 amc718_based
last sysfs file:
Modules linked in:
NIP: c001a460 LR: c01ee41c CTR: c001a420
REGS: effc9f10 TRAP: 0204 Not tainted (2.6.34.6-vt3-svn36835)
MSR: 00021002 <ME,CE> CR: 24022024 XER: 00000000
TASK = ebc68000[1] 'swapper' THREAD: ebc62000 CPU: 6
GPR00: f1200000 ebc63d10 ebc68000 00000000 00000000 3fc00000 3fc00000 f1200068
GPR08: 00000004 ebc63d18 f1190c20 eb530000 24022022 d811c00a 00000000 00000000
GPR16: 00000000 7ffe2a00 00000000 00000000 7fff0df0 00000000 00000000 00000000
GPR24: 00000081 000000ff 00000000 ebd89400 00000068 00029002 c05e8914 ebc63d58
NIP [c001a460] fsl_rio_config_read+0x40/0x78
LR [c01ee41c] rio_mport_read_config_32+0x7c/0xac
Call Trace:
[ebc63d50] [c01eed64] rio_get_host_deviceid_lock+0x3c/0x50
[ebc63d70] [c045acd4] rio_enum_peer+0x28/0x3e4
[ebc63dd0] [c045b178] rio_enum_mport+0xe8/0x244
[ebc63e10] [c045a59c] rio_init_mports+0x90/0xe4
[ebc63e30] [c0457a5c] fsl_of_rio_rpn_probe+0x3c/0x50
[ebc63e40] [c034abe4] of_platform_device_probe+0x58/0x98
[ebc63e60] [c02274d8] driver_probe_device+0xa4/0x1b4
[ebc63e80] [c02260cc] bus_for_each_drv+0x6c/0xa8
[ebc63eb0] [c022735c] device_attach+0xa4/0xc8
[ebc63ed0] [c0226afc] bus_probe_device+0x2c/0x44
[ebc63ee0] [c02245f8] device_add+0x460/0x5a8
[ebc63f30] [c034a750] of_device_register+0x34/0x48
[ebc63f40] [c0008d64] of_platform_device_create+0x44/0x74
[ebc63f50] [c0008f90] of_platform_bus_probe+0x130/0x15c
[ebc63f70] [c0565480] declare_of_platform_devices+0x24/0x140
[ebc63f90] [c05651cc] __machine_initcall_amc718_based_declare_of_platform_devices+0x2c/0x3c
[ebc63fa0] [c0001cb8] do_one_initcall+0x3c/0x1d0
[ebc63fd0] [c055e9b0] kernel_init+0x190/0x230
[ebc63ff0] [c000f284] kernel_thread+0x4c/0x68
Instruction dump:
814b000c 54e0ba7e 7cc60378 7c0004ac 90ca0000 2f880001 800b0018 7ce03a14
419e0020 2f880002 419e002c 38600000 <80e70000> 7c2006ac 90e90000 4e800020
---[ end trace 561bb236c800851f ]---
Kernel panic - not syncing: Attempted to kill init!
Call Trace:
Rebooting in 180 seconds..

Here's a partial log with some additional output and a dump of the error registers at the time of failure:

fsl-elo-dma ffe101300.dma: request channel 0 IRQ
fsl-elo-dma ffe101300.dma: request channel 1 IRQ
fsl-elo-dma ffe101300.dma: request channel 2 IRQ
fsl-elo-dma ffe101300.dma: request channel 3 IRQ
Freescale PowerQUICC MII Bus: probed
Freescale XGMAC MDIO Bus: probed
fsl-of-rio ffe0c0000.rapidio: Setting up RapidIO peer-to-peer network /rapidio@ffe0c0000
fsl-of-rio ffe0c0000.rapidio: Of-device full name /rapidio@ffe0c0000
fsl-of-rio ffe0c0000.rapidio: Regs: [mem 0xffe0c0000-0xffe0dffff]
fsl-of-rio ffe0c0000.rapidio: LAW start 0x0000000c20000000, size 0x0000000001000000
fsl-of-rio ffe0c0000.rapidio: get_immrbase() ffe000000
fsl-of-rio ffe0c0000.rapidio: IO c20000000 c20ffffff
alloc irq_desc for 57 on node 0
alloc kstat_irqs on node 0
irq: irq 57 on host /soc@ffe000000/pic@40000 mapped to virtual irq 57
alloc irq_desc for 60 on node 0
alloc kstat_irqs on node 0
irq: irq 60 on host /soc@ffe000000/pic@40000 mapped to virtual irq 60
alloc irq_desc for 61 on node 0
alloc kstat_irqs on node 0
irq: irq 61 on host /soc@ffe000000/pic@40000 mapped to virtual irq 61
fsl-of-rio ffe0c0000.rapidio: errirq: 16, bellirq: 57, txirq: 60, rxirq 61
fsl-of-rio ffe0c0000.rapidio: Host deviceid 0
fsl-of-rio ffe0c0000.rapidio: RapidIO PHY type: serial
fsl-of-rio ffe0c0000.rapidio: SRIO Port 1 Status: Lane0Sync Lane1Sync Lane2Sync Lane3Sync Aligned
fsl-of-rio ffe0c0000.rapidio: SRIO Port 2 Status: (Note: Freescale driver only supports Port 1)
fsl-of-rio ffe0c0000.rapidio: Hardware port width: 4
fsl-of-rio ffe0c0000.rapidio: Training connection status: Four-lane
fsl-of-rio ffe0c0000.rapidio: RapidIO Common Transport System size: 256
RIO: enumerate master port 0, RIO0 mport
fsl_local_config_write: index 0 offset 00000068 data 00000000
fsl_local_config_read: index 0 offset 00000068 (ebc63da8) = 00000000
fsl_local_config_write: index 0 offset 00000060 data 00000000
fsl_local_config_read: index 0 offset 0000013c (ebc63da8) = e0000000
RIO0 mport PGCCSR e0000000
fsl_local_config_read: index 0 offset 0000000c (ebc63d58) = 00000100
fsl_local_config_read: index 0 offset 00000100 (ebc63d58) = 06000001
fsl_local_config_read: index 0 offset 00000158 (ebc63d88) = 00020302
fsl_local_config_read: index 0 offset 0000013c (ebc63da8) = e0000000
RIO0 mport is active PGCCSR e0000000
rio_enum_peer 1
Machine check in kernel mode.
RIO: port1 error
P1 error regs EDCSR 00000005 IECSR 00000000 ESCSR 00020302 LTLEDCSR 00000000
Caused by (from MCSR=a000): Load Error Report
Guarded Load Error Report
Oops: Machine check, sig: 7 [#1]
SMP NR_CPUS=8 amc718_based
last sysfs file:
Modules linked in:
NIP: c001a838 LR: c01f201c CTR: c001a748
REGS: effc9f10 TRAP: 0204 Not tainted (2.6.34.6-MCP-svn1717)
MSR: 00021002 <ME,CE> CR: 24022022 XER: 00000000
TASK = ebc68000[1] 'swapper' THREAD: ebc62000 CPU: 6
GPR00: 00000068 ebc63cf0 ebc68000 ffffffea 00000000 000000ff 00000000 00000068
GPR08: 00000004 ebd80000 3fc00000 f1190c20 24022022 d814c00a 00000000 00000000
GPR16: 00000000 7ffe2a00 00000000 00000000 7fff0df0 00000000 00000000 00000000
GPR24: 00000081 000000ff f1200068 00000000 ebc63d18 00000000 000000ff 00000068
NIP [c001a838] fsl_rio_config_read+0xf0/0x11c
LR [c01f201c] rio_mport_read_config_32+0x7c/0xac
Call Trace:
[ebc63cf0] [7ffe2a00] 0x7ffe2a00 (unreliable)
[ebc63d10] [c01f201c] rio_mport_read_config_32+0x7c/0xac
[ebc63d50] [c01f28d0] rio_get_host_deviceid_lock+0x3c/0x60
[ebc63d70] [c045ec8c] rio_enum_peer+0x34/0x4c0
[ebc63dd0] [c045f228] rio_enum_mport+0x110/0x290
[ebc63e10] [c045e484] rio_init_mports+0x90/0xe4
[ebc63e30] [c045b944] fsl_of_rio_rpn_probe+0x4c/0x60
[ebc63e40] [c034ea48] of_platform_device_probe+0x58/0x98
[ebc63e60] [c022b334] driver_probe_device+0xa4/0x1b4
[ebc63e80] [c0229f28] bus_for_each_drv+0x6c/0xa8
[ebc63eb0] [c022b1b8] device_attach+0xa4/0xc8
[ebc63ed0] [c022a958] bus_probe_device+0x2c/0x44
[ebc63ee0] [c0228454] device_add+0x460/0x5a8
[ebc63f30] [c034e5b4] of_device_register+0x34/0x48
[ebc63f40] [c0008d64] of_platform_device_create+0x44/0x74
[ebc63f50] [c0008f90] of_platform_bus_probe+0x130/0x15c
[ebc63f70] [c056b534] declare_of_platform_devices+0x24/0x140
[ebc63f90] [c056b280] __machine_initcall_amc718_based_declare_of_platform_devices+0x2c/0x3c
[ebc63fa0] [c0001cb8] do_one_initcall+0x3c/0x1d0
[ebc63fd0] [c05649b0] kernel_init+0x190/0x230
[ebc63ff0] [c000f284] kernel_thread+0x4c/0x68
Instruction dump:
7fa6eb78 7fe7fb78 7f49d378 4843d975 2f9b0000 409e0028 935c0000 7f63db78
4bffff58 a35a0000 7c2006ac 4bffffc8 <835a0000> 7c2006ac 4bffffbc 3c60c04e
---[ end trace 561bb236c800851f ]---
Kernel panic - not syncing: Attempted to kill init!
Call Trace:
Rebooting in 180 seconds..

Thanks for any help ...

Mike Proicou
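The register values in the debug output can be decoded by hand; the following Python sketch does it mechanically as a cross-check. The bit masks are an assumption based on the RapidIO Part 6 (LP-Serial) layout of the Port General Control CSR and the Port n Error and Status CSR (partly mirrored in Linux's include/linux/rio_regs.h). `maint_rowtar()` is a hypothetical helper that assumes the `(destid << 22) | (hopcount << 12) | (offset >> 12)` maintenance-window encoding that fsl_rio.c appears to use; the value it yields for the failing read, 0x3fc00000, does show up in the GPR dumps of both oopses. Offset 0x68, the target of that read, is the Host Base Device ID Lock CSR, which is what rio_get_host_deviceid_lock() accesses. Verify every mask against the Tsi578 and P4080 reference manuals before relying on it.

```python
# Decode the RapidIO register values quoted in the boot logs.
# ASSUMPTION: masks follow the RapidIO Part 6 (LP-Serial) register layout;
# the spec numbers bit 0 as the MSB, so spec bit n is 1 << (31 - n).

# Port General Control CSR (EF block + 0x3c, i.e. offset 0x13c in the log)
PGC_BITS = {
    0x80000000: "Host",
    0x40000000: "Master Enable",
    0x20000000: "Discovered",
}

# Port n Error and Status CSR (EF block + 0x58, i.e. offset 0x158 in the log)
ESC_BITS = {
    0x00020000: "Output Error-encountered",
    0x00010000: "Output Error-stopped",
    0x00000200: "Input Error-encountered",
    0x00000100: "Input Error-stopped",
    0x00000004: "Port Error",
    0x00000002: "Port OK",
    0x00000001: "Port Uninitialized",
}


def decode(value, bits):
    """Return the names of the flags in `bits` that are set in `value`."""
    return [name for mask, name in bits.items() if value & mask]


def ef_header(value):
    """Split an extended-features block header into (next_ptr, feature_id)."""
    return value >> 16, value & 0xFFFF


def maint_rowtar(destid, hopcount, offset):
    """HYPOTHETICAL helper: maintenance-window target value, assuming the
    (destid << 22) | (hopcount << 12) | (offset >> 12) encoding."""
    return (destid << 22) | (hopcount << 12) | (offset >> 12)


if __name__ == "__main__":
    asm_info = 0x00000100        # value read at offset 0x0c
    ef_ptr = asm_info & 0xFFFF   # extended-features pointer -> 0x100
    print("EF block at 0x%x, header:" % ef_ptr,
          ["0x%04x" % f for f in ef_header(0x06000001)])
    print("PGCCSR 0xe0000000:", decode(0xE0000000, PGC_BITS))
    print("ESCSR  0x00020302:", decode(0x00020302, ESC_BITS))
    # Failing read: offset 0x68 (Host Base Device ID Lock CSR),
    # destid 0xff, hop count 0.
    print("ROWTAR: 0x%08x" % maint_rowtar(0xFF, 0, 0x68))
```

Running it prints the extended-features header split into next pointer 0x0600 and feature ID 0x0001, the three PGCCSR flags (Host, Master Enable, Discovered), and ROWTAR 0x3fc00000. If the masks are right, ESCSR 0x00020302 decodes to Output Error-encountered, Input Error-encountered, Input Error-stopped and Port OK, meaning the host port was already reporting an error-stopped input before the failing maintenance read.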
_______________________________________________
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev