http://mkl-note.blogspot.com/2009/12/blog-post.html
Thursday, December 24, 2009

Some issues while running Linux SMP on ARM11MPCore

http://lists.infradead.org/pipermail/linux-arm-kernel/2009-December/006650.html

I can reproduce case 1 by repeatedly inserting and removing a simple test module like the one below:

#include <linux/init.h>
#include <linux/module.h>

static int __init MYDRIVER_init(void)
{
	printk("%s: \n", __func__);
	return 0;
}

static void __exit MYDRIVER_exit(void)
{
	printk("%s: \n", __func__);
}

MODULE_AUTHOR("Mac Lin");
MODULE_DESCRIPTION("MYDRIVER");
MODULE_LICENSE("GPL");
module_init(MYDRIVER_init);
module_exit(MYDRIVER_exit);

The command line is:

module=mydriver;modprobe ${module};rmmod ${module}; (... repeated many times; I have it 30 times ...) modprobe ${module};rmmod ${module};

I keep issuing it, say 10 times, without waiting for the previous command to complete. At some point I then get case 1. The following command does not trigger it; it just keeps running:

module=mydriver;while : ; do modprobe ${module};rmmod ${module};done;

After some tracing, I suspected that CONFIG_LOCAL_TIMERS behaves strangely. I disabled it, and the situation changed. It is harder to get case 1, but there are still some issues. For example, it crashes as follows and becomes case 3:

[ 57.090000] MYDRIVER_exit:

Without DCache and CONFIG_LOCAL_TIMERS, I can repeat the above procedure for 216 seconds before it halts as case 4. Case 1 also still occurs. This means that disabling DCache and CONFIG_LOCAL_TIMERS does not avoid these problems; it only mitigates them a little.

BTW, I have done a quick port to linux-2.6.33-rc1, branch master, based on commit f2d9a06. With DCache and CONFIG_LOCAL_TIMERS, I have seen case 1, which means this issue is not fixed yet. Without SMP, I have not seen such an issue yet. So currently all the clues point to SMP.

http://lists.infradead.org/pipermail/linux-arm-kernel/2010-January/006901.html
http://lists.infradead.org/pipermail/linux-arm-kernel/2010-January/006945.html
http://lists.infradead.org/pipermail/linux-arm-kernel/2010-January/006955.html

Neither a kernel without SMP nor an SMP kernel with maxcpus=1 shows the same behavior.

Fix for case 1 and case 2

http://lists.infradead.org/pipermail/linux-arm-kernel/2010-January/007052.html

Thanks to Russell's advice, after some tracing I found that the IER (Interrupt Enable Register) of my serial port is 0 in case 1!

Case 2 is actually the same as case 1. Case 1 comes first; if I stop typing and let it finish its slow printing, it then becomes case 2.

UART_BUG_THRE is detected and enabled on my platform, so serial8250_backup_timeout is used. Many places perform the sequence (get IER, clear IER, restore IER), such as serial8250_console_write, which is called by printk, and serial8250_backup_timeout. serial8250_backup_timeout does not hold the spinlock while it does this, so the two can race and leave a wrong value in IER.

The following patch fixes this issue. Case 3 and case 4 are still often seen, but case 1 and case 2 are not.

diff --git a/kernels/linux-2.6.31.1-X/drivers/serial/8250.c b/kernels/linux-2.6.31.1-X/drivers/serial/8250.c
index 288a0e4..55602c3 100644
--- a/kernels/linux-2.6.31.1-cavm1/drivers/serial/8250.c
+++ b/kernels/linux-2.6.31.1-cavm1/drivers/serial/8250.c
@@ -1752,6 +1758,8 @@ static void serial8250_backup_timeout(unsigned long data)
 	unsigned int iir, ier = 0, lsr;
 	unsigned long flags;
+
+	spin_lock_irqsave(&up->port.lock, flags);
 	/*
 	 * Must disable interrupts or else we risk racing with the interrupt
 	 * based handler.
@@ -1769,10 +1777,8 @@ static void serial8250_backup_timeout(unsigned long data)
 	 * the "Diva" UART used on the management processor on many HP
 	 * ia64 and parisc boxes.
 	 */
-	spin_lock_irqsave(&up->port.lock, flags);
 	lsr = serial_in(up, UART_LSR);
 	up->lsr_saved_flags |= lsr & LSR_SAVE_FLAGS;
-	spin_unlock_irqrestore(&up->port.lock, flags);
 	if ((iir & UART_IIR_NO_INT) && (up->ier & UART_IER_THRI) &&
 	    (!uart_circ_empty(&up->port.info->xmit) || up->port.x_char) &&
 	    (lsr & UART_LSR_THRE)) {
@@ -1780,12 +1786,14 @@ static void serial8250_backup_timeout(unsigned long data)
 		iir |= UART_IIR_THRI;
 	}
 
-	if (!(iir & UART_IIR_NO_INT))
-		serial8250_handle_port(up);
-
 	if (is_real_interrupt(up->port.irq))
 		serial_out(up, UART_IER, ier);
 
+	spin_unlock_irqrestore(&up->port.lock, flags);
+
+	if (!(iir & UART_IIR_NO_INT))
+		serial8250_handle_port(up);
+
 	/* Standard timer interval plus 0.2s to keep the port running */
 	mod_timer(&up->timer,
 		jiffies + poll_timeout(up->port.timeout) + HZ / 5);

SMP issues with 8250.c

http://old.nabble.com/SMP-issues-with-8250.c%E2%80%8F-to27090634.html
http://www.spinics.net/lists/linux-serial/msg02106.html