On Mon, 2011-07-04 at 10:06 +0200, Ronny Meeus wrote:
> On Sat, Jul 2, 2011 at 11:33 PM, Ronny Meeus <[email protected]> wrote:
> > Hello
> >
> > We have a Freescale P4040 (PowerPC) based board running Linux+Xenomai.
> > Here is some information copied from the boot log:
> >
> > [ 0.000000] Using P4080 DS machine description
> > [ 0.000000] Memory CAM mapping: 256/256/256 Mb, residual: 1248Mb
> > [ 0.000000] Linux version 2.6.35.7-hg98224f47aa52-dirty
> > (xxxxx@devws108) (gcc version 4.4.6 (Buildroot 2011.05-hg98224f47aa52)
> > ) #1 SMP Fri Jul 1 08:42:30 CEST 2011
> >
> > [ 0.000000] clocksource: timebase mult[6aaaf09] shift[22] registered
> > [ 0.000000] I-pipe 2.12-01: pipeline enabled.
> > [ 0.000000] Console: colour dummy device 80x25
> > [ 0.181150] pid_max: default: 32768 minimum: 301
> >
> > [ 2.093842] I-pipe: Domain Xenomai registered.
> > [ 2.146016] Xenomai: hal/powerpc started.
> > [ 2.193904] Xenomai: scheduling class idle registered.
> > [ 2.255328] Xenomai: scheduling class rt registered.
> > [ 2.319092] Xenomai: real-time nucleus v2.5.5 (Ghosts) loaded.
> > [ 2.388207] Xenomai: starting native API services.
> > [ 2.445249] Xenomai: starting pSOS+ services.
> > [ 2.497478] highmem bounce pool size: 64 pages
> > [ 2.550932] fuse init (API version 7.14)
> >
> > Although the P4040 has 4 cores, we are currently using only 1 core,
> > as specified in the device tree we are using. The kernel is built
> > with SMP enabled.
> >
> > I start 2 test applications on this board.
> > The first application sends raw Ethernet packets on a link that is
> > looped back, so all packets we send are received (unmodified) back on
> > the same interface.
> > The second application listens on the same Ethernet interface, also
> > via a raw Ethernet socket.
> > Both applications are plain Linux applications, so no Xenomai code is
> > used.
> >
> > One side effect of using raw Ethernet sockets is that all packets sent
> > on one socket are also received by all other raw Ethernet sockets.
> > This means that the listening application receives each packet twice:
> > once when it is sent and a second time when it comes back via the
> > loop. (A side question: can this behavior be disabled somehow? We do
> > not want to receive the packets we send ourselves.)
> >
> > After a very short time (after sending roughly 30000 packets), both
> > applications block completely, and 60 seconds later the console
> > reports that the kernel is locked up:
> >
> > [ 805.307213] BUG: soft lockup - CPU#0 stuck for 61s!
> > [send_eth_socket:1907]
> > [ 805.389519] Modules linked in: reboot_helper dpll_si53xx crave
> > ndps_a_cpld
> > [ 805.471880] NIP: c000cc4c LR: 00000000 CTR: 00000000
> > [ 805.531274] REGS: c1f87040 TRAP: 0000 Not tainted
> > (2.6.35.7-hg98224f47aa52-dirty)
> > [ 805.623992] MSR: 00029002 <EE,ME,CE> CR: 00000000 XER: 00000000
> > [ 805.696972] TASK = ec7116d0[1907] 'send_eth_socket' THREAD: ec6aa000
> > CPU: 0
> > [ 805.778248] GPR00: 00000000 00000000 00000000 00000000 00000000
> > 00000000 00000000 00000000
> > [ 805.878359] GPR08: 00000000 00000000 00000000 00000000 00000000
> > 00000000 00000000 00000000
> > [ 805.978452] GPR16: 00000000 00000000 00000000 00000000 00000000
> > 00000000 00000000 00000000
> > [ 806.078571] GPR24: 00000000 00000000 00000000 00000000 00000000
> > 00000000 00000000 00000000
> > [ 806.180773] NIP [c000cc4c] udelay+0x24/0x30
> > [ 806.230782] LR [00000000] (null)
> > [ 806.269334] Call Trace:
> > [ 806.298521] [efff3b50] [c00071b4] show_stack+0x78/0x18c (unreliable)
> > [ 806.374600] [efff3b90] [c00078c4] show_regs+0x200/0x2ec
> > [ 806.437125] [efff3bc0] [c00658d4] softlockup_tick+0x1dc/0x23c
> > [ 806.505897] [efff3bf0] [c003cc50] run_local_timers+0x1c/0x2c
> > [ 806.573626] [efff3c00] [c003cca4] update_process_times+0x44/0x80
> > [ 806.645528] [efff3c20] [c0059bc4] tick_sched_timer+0xd0/0x128
> > [ 806.714307] [efff3c50] [c004d8f0] __run_hrtimer+0x68/0x14c
> > [ 806.779958] [efff3c70] [c004efa4] hrtimer_interrupt+0x1d8/0x41c
> > [ 806.850812] [efff3cf0] [c000d8d8] timer_interrupt+0x1b4/0x238
> > [ 806.919586] [efff3d10] [c0009ac4] __ipipe_do_timer+0x44/0x54
> > [ 806.987315] [efff3d20] [c006d448] __ipipe_sync_stage+0x1d0/0x27c
> > [ 807.059212] [efff3d60] [c0009728] __ipipe_grab_timer+0x104/0x12c
> > [ 807.131112] [efff3d70] [c00129e0] __ipipe_ret_from_except+0x0/0xc
> > [ 807.204063] --- Exception: 901 at _raw_spin_lock+0x30/0x3c
> > [ 807.204068] LR = tpacket_rcv+0x264/0x570
> > [ 807.320754] [efff3e30] [c0325e48] tpacket_rcv+0xf4/0x570 (unreliable)
> > [ 807.397875] [efff3e80] [c02c43b0] __netif_receive_skb+0x2b4/0x2f0
> > [ 807.470811] [efff3eb0] [c02c4fa0] netif_receive_skb+0x98/0xac
> > [ 807.539583] [efff3ee0] [c0292838] ingress_rx_default_dqrr+0x428/0x4b4
> > [ 807.616693] [efff3f10] [c02ac2a8] qman_poll_dqrr+0x1e0/0x284
> > [ 807.684426] [efff3f50] [c0294088] dpaa_eth_poll+0x34/0xd0
> > [ 807.749031] [efff3f70] [c02c5280] net_rx_action+0xc0/0x1e8
> > [ 807.814683] [efff3fa0] [c0035ab0] __do_softirq+0x138/0x210
> > [ 807.880333] [efff3ff0] [c00115e8] call_do_softirq+0x14/0x24
> > [ 807.947022] [ec6abab0] [c000480c] do_softirq+0xb4/0xec
> > [ 808.008503] --- Exception: ec6abbb0 at 0xec6abb70
> > [ 808.008507] LR = 0xec4e6c50
> > [ 808.102274] [ec6abad0] [c00357cc] irq_exit+0x60/0xb8 (unreliable)
> > [ 808.175227] [ec6abae0] [c0009b5c] __ipipe_do_IRQ+0x88/0xc0
> > [ 808.240872] [ec6abb00] [c006d468] __ipipe_sync_stage+0x1f0/0x27c
> > [ 808.312771] [ec6abb40] [c00095f4] __ipipe_handle_irq+0x1b8/0x1e8
> > [ 808.384669] [ec6abb70] [c00098dc] __ipipe_grab_irq+0x18c/0x1bc
> > [ 808.454482] [ec6abba0] [c00129e0] __ipipe_ret_from_except+0x0/0xc
> > [ 808.527425] --- Exception: 501 at _raw_spin_lock+0x14/0x3c
> > [ 808.527430] LR = tpacket_rcv+0x264/0x570
> > [ 808.644114] [ec6abc60] [c0325e48] tpacket_rcv+0xf4/0x570 (unreliable)
> > [ 808.721232] [ec6abcb0] [c02c6238] dev_hard_start_xmit+0x164/0x414
> > [ 808.794171] [ec6abcf0] [c0325b94] packet_sendmsg+0x8c0/0x984
> > [ 808.861901] [ec6abd70] [c02b32f0] sock_sendmsg+0x90/0xb4
> > [ 808.925465] [ec6abe40] [c02b3ea8] sys_sendto+0xd0/0x114
> > [ 808.987988] [ec6abf10] [c02b522c] sys_socketcall+0x148/0x210
> > [ 809.055718] [ec6abf40] [c0011d0c] ret_from_syscall+0x0/0x3c
> > [ 809.122407] --- Exception: c01 at 0x48051f00
> > [ 809.122411] LR = 0x4808e030
> > [ 809.210966] Instruction dump:
> > [ 809.246401] 7d204850 7f891840 419cfff0 7c421378 4e800020 3d20c04c
> > 800967e0 7c0301d6
> > [ 809.339215] 7d2c42a6 48000008 7c210b78 <7d6c42a6> <7d695850>
> > 7f8b0040 419cfff0 7c421378
> > [ 874.025894] BUG: soft lockup - CPU#0 stuck for 61s!
> > [send_eth_socket:1907]
> > [ 874.108198] Modules linked in: reboot_helper dpll_si53xx crave
> > ndps_a_cpld
> > [ 874.190551] NIP: c000cc48 LR: 00000000 CTR: 00000000
> > [ 874.249937] REGS: c1f87040 TRAP: 0000 Not tainted
> > (2.6.35.7-hg98224f47aa52-dirty)
> > [ 874.342658] MSR: 00029002 <EE,ME,CE> CR: 00000000 XER: 00000000
> > [ 874.415638] TASK = ec7116d0[1907] 'send_eth_socket' THREAD: ec6aa000
> > CPU: 0
> > [ 874.496907] GPR00: 00000000 00000000 00000000 00000000 00000000
> > 00000000 00000000 00000000
> > [ 874.597018] GPR08: 00000000 00000000 00000000 00000000 00000000
> > 00000000 00000000 00000000
> > [ 874.697124] GPR16: 00000000 00000000 00000000 00000000 00000000
> > 00000000 00000000 00000000
> > [ 874.797235] GPR24: 00000000 00000000 00000000 00000000 00000000
> > 00000000 00000000 00000000
> > [ 874.899421] NIP [c000cc40] udelay+0x18/0x30
> > [ 874.949434] LR [00000000] (null)
> > [ 874.987986] Call Trace:
> > [ 875.017170] [efff3b50] [c00071b4] show_stack+0x78/0x18c (unreliable)
> > [ 875.093240] [efff3b90] [c00078c4] show_regs+0x200/0x2ec
> > [ 875.155763] [efff3bc0] [c00658d4] softlockup_tick+0x1dc/0x23c
> > [ 875.224534] [efff3bf0] [c003cc50] run_local_timers+0x1c/0x2c
> > [ 875.292265] [efff3c00] [c003cca4] update_process_times+0x44/0x80
> > [ 875.364164] [efff3c20] [c0059bc4] tick_sched_timer+0xd0/0x128
> > [ 875.432936] [efff3c50] [c004d8f0] __run_hrtimer+0x68/0x14c
> > [ 875.498584] [efff3c70] [c004efa4] hrtimer_interrupt+0x1d8/0x41c
> > [ 875.569437] [efff3cf0] [c000d8d8] timer_interrupt+0x1b4/0x238
> > [ 875.638211] [efff3d10] [c0009ac4] __ipipe_do_timer+0x44/0x54
> > [ 875.705941] [efff3d20] [c006d448] __ipipe_sync_stage+0x1d0/0x27c
> > [ 875.777839] [efff3d60] [c0009728] __ipipe_grab_timer+0x104/0x12c
> > [ 875.849736] [efff3d70] [c00129e0] __ipipe_ret_from_except+0x0/0xc
> > [ 875.922680] --- Exception: 901 at _raw_spin_lock+0x30/0x3c
> > [ 875.922684] LR = tpacket_rcv+0x264/0x570
> > [ 876.039367] [efff3e30] [c0325e48] tpacket_rcv+0xf4/0x570 (unreliable)
> > [ 876.116479] [efff3e80] [c02c43b0] __netif_receive_skb+0x2b4/0x2f0
> > [ 876.189418] [efff3eb0] [c02c4fa0] netif_receive_skb+0x98/0xac
> > [ 876.258189] [efff3ee0] [c0292838] ingress_rx_default_dqrr+0x428/0x4b4
> > [ 876.335297] [efff3f10] [c02ac2a8] qman_poll_dqrr+0x1e0/0x284
> > [ 876.403025] [efff3f50] [c0294088] dpaa_eth_poll+0x34/0xd0
> > [ 876.467632] [efff3f70] [c02c5280] net_rx_action+0xc0/0x1e8
> > [ 876.533280] [efff3fa0] [c0035ab0] __do_softirq+0x138/0x210
> > [ 876.598926] [efff3ff0] [c00115e8] call_do_softirq+0x14/0x24
> > [ 876.665618] [ec6abab0] [c000480c] do_softirq+0xb4/0xec
> > [ 876.727097] --- Exception: ec6abbb0 at 0xec6abb70
> > [ 876.727101] LR = 0xec4e6c50
> > [ 876.820868] [ec6abad0] [c00357cc] irq_exit+0x60/0xb8 (unreliable)
> > [ 876.893814] [ec6abae0] [c0009b5c] __ipipe_do_IRQ+0x88/0xc0
> > [ 876.959459] [ec6abb00] [c006d468] __ipipe_sync_stage+0x1f0/0x27c
> > [ 877.031358] [ec6abb40] [c00095f4] __ipipe_handle_irq+0x1b8/0x1e8
> > [ 877.103256] [ec6abb70] [c00098dc] __ipipe_grab_irq+0x18c/0x1bc
> > [ 877.173069] [ec6abba0] [c00129e0] __ipipe_ret_from_except+0x0/0xc
> > [ 877.246012] --- Exception: 501 at _raw_spin_lock+0x14/0x3c
> > [ 877.246017] LR = tpacket_rcv+0x264/0x570
> > [ 877.362701] [ec6abc60] [c0325e48] tpacket_rcv+0xf4/0x570 (unreliable)
> > [ 877.439819] [ec6abcb0] [c02c6238] dev_hard_start_xmit+0x164/0x414
> > [ 877.512758] [ec6abcf0] [c0325b94] packet_sendmsg+0x8c0/0x984
> > [ 877.580487] [ec6abd70] [c02b32f0] sock_sendmsg+0x90/0xb4
> > [ 877.644052] [ec6abe40] [c02b3ea8] sys_sendto+0xd0/0x114
> > [ 877.706575] [ec6abf10] [c02b522c] sys_socketcall+0x148/0x210
> > [ 877.774306] [ec6abf40] [c0011d0c] ret_from_syscall+0x0/0x3c
> > [ 877.840994] --- Exception: c01 at 0x48051f00
> > [ 877.840998] LR = 0x4808e030
> > [ 877.929553] Instruction dump:
> > [ 877.964988] 419cfff0 7c421378 4e800020 3d20c04c 800967e0 7c0301d6
> > 7d2c42a6 48000008
> > [ 878.057802] 7c210b78 7d6c42a6 7d695850 7f8b0040 419cfff0 7c421378
> > 4e800020 3d20c04a
> >
> > I do not completely understand this dump, but it looks like both the
> > receive direction (running in the context of a softirq) and my
> > transmitting application are blocked on the spinlock used in the
> > tpacket_rcv function:
> >
> > [ 876.039367] [efff3e30] [c0325e48] tpacket_rcv+0xf4/0x570 (unreliable)
> > [ 876.116479] [efff3e80] [c02c43b0] __netif_receive_skb+0x2b4/0x2f0
> > [ 876.189418] [efff3eb0] [c02c4fa0] netif_receive_skb+0x98/0xac
> > [ 876.258189] [efff3ee0] [c0292838] ingress_rx_default_dqrr+0x428/0x4b4
> > [ 876.335297] [efff3f10] [c02ac2a8] qman_poll_dqrr+0x1e0/0x284
> > [ 876.403025] [efff3f50] [c0294088] dpaa_eth_poll+0x34/0xd0
> > [ 876.467632] [efff3f70] [c02c5280] net_rx_action+0xc0/0x1e8
> > [ 876.533280] [efff3fa0] [c0035ab0] __do_softirq+0x138/0x210
> > [ 876.598926] [efff3ff0] [c00115e8] call_do_softirq+0x14/0x24
> > [ 876.665618] [ec6abab0] [c000480c] do_softirq+0xb4/0xec
> >
> > and
> >
> > [ 877.362701] [ec6abc60] [c0325e48] tpacket_rcv+0xf4/0x570 (unreliable)
> > [ 877.439819] [ec6abcb0] [c02c6238] dev_hard_start_xmit+0x164/0x414
> > [ 877.512758] [ec6abcf0] [c0325b94] packet_sendmsg+0x8c0/0x984
> > [ 877.580487] [ec6abd70] [c02b32f0] sock_sendmsg+0x90/0xb4
> > [ 877.644052] [ec6abe40] [c02b3ea8] sys_sendto+0xd0/0x114
> > [ 877.706575] [ec6abf10] [c02b522c] sys_socketcall+0x148/0x210
> > [ 877.774306] [ec6abf40] [c0011d0c] ret_from_syscall+0x0/0x3c
> >
> > Is my analysis correct?
> > If so, could this be related to the IPIPE mechanism we are using
> > (perhaps a known issue)?
> >
> > Any help would be much appreciated.
> >
> > Thanks,
> > Ronny
> >
>
> Hello
>
> I did a new test, this time with an older kernel (Linux 2.6.34.6):
> the same tests were executed, but on a pure Linux build (no IPIPE
> included). The issue can no longer be reproduced in this environment;
> my test builds keep running indefinitely.
>
> My next steps are:
> - Run the same test on 2.6.35.7 without IPIPE. This environment is
>   currently building.
> - Include only IPIPE (no Xenomai) and redo the test.
>

Could you try 2.6.36-ipipe as well, in case 2.6.35.7 without the
pipeline does not exhibit the issue? A number of changes went into the
IRQ replay code during this time frame, and 2.6.35 was in a state of
flux in that respect.

> Best regards
> Ronny
>
> _______________________________________________
> Adeos-main mailing list
> [email protected]
> https://mail.gna.org/listinfo/adeos-main

--
Philippe.
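Regarding the side question in the original report (whether a raw socket can avoid seeing the host's own transmissions), here is a minimal sketch, assuming a Linux AF_PACKET socket read with recvfrom(). Copies of frames transmitted by the host are delivered to packet sockets with sll_pkttype set to PACKET_OUTGOING in the struct sockaddr_ll returned by recvfrom(), so a listener can drop them in userspace. The helper names open_listener(), recv_incoming() and is_own_outgoing() are hypothetical and not taken from the test applications described above.

```c
#include <arpa/inet.h>       /* htons() */
#include <assert.h>
#include <linux/if_ether.h>  /* ETH_P_ALL */
#include <linux/if_packet.h> /* struct sockaddr_ll, PACKET_OUTGOING */
#include <net/if.h>          /* if_nametoindex() */
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: non-zero if 'sll' describes a copy of a frame
 * that this host transmitted itself. */
int is_own_outgoing(const struct sockaddr_ll *sll)
{
    return sll->sll_pkttype == PACKET_OUTGOING;
}

/* Hypothetical helper: opens a raw packet socket that sees all protocols
 * on interface 'ifname'. Returns the descriptor, or -1 on error
 * (unknown interface, or missing CAP_NET_RAW). */
int open_listener(const char *ifname)
{
    struct sockaddr_ll addr;
    unsigned int ifindex = if_nametoindex(ifname);
    int fd;

    if (ifindex == 0)
        return -1;
    fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
    if (fd < 0)
        return -1;

    /* Bind to one interface so frames from other links are not seen. */
    memset(&addr, 0, sizeof(addr));
    addr.sll_family = AF_PACKET;
    addr.sll_protocol = htons(ETH_P_ALL);
    addr.sll_ifindex = (int)ifindex;
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper: blocks until a frame that was NOT sent by this
 * host arrives. Returns the frame length, or -1 on error. */
ssize_t recv_incoming(int fd, void *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    for (;;) {
        struct sockaddr_ll from;
        socklen_t fromlen = sizeof(from);
        ssize_t n = recvfrom(fd, buf, len, 0,
                             (struct sockaddr *)&from, &fromlen);
        if (n < 0)
            return -1;
        if (!is_own_outgoing(&from))
            return n;
        /* Outgoing copy of our own transmission: skip it and keep reading. */
    }
}
```

This only filters in userspace: the kernel still queues the outgoing copies to the socket, so it does not by itself avoid the tpacket_rcv path seen in the traces. Those traces suggest that at least one socket in the test uses a PACKET_RX_RING; as far as I know, the same sll_pkttype field is also filled in the struct sockaddr_ll that the kernel places after each frame header in the ring. Much later kernels (4.20 and newer) add a PACKET_IGNORE_OUTGOING socket option that drops these copies in the kernel; it does not exist on the 2.6.35 kernel discussed here.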
