On 4/11/07, Peter Zijlstra <[EMAIL PROTECTED]> wrote:
> On Wed, 2007-04-11 at 17:53 +0800, Zhao Forrest wrote:
> > I got some new information:
> > Before the soft lockup message appears, we have:
> > [EMAIL PROTECTED] home]# cat /proc/slabinfo |grep buffer_head
> > buffer_head       10927942 10942560    120   32    1 : tunables   32   16    8 : slabdata 341955 341955      6 : globalstat 37602996 11589379 1174373    6    0    1 6918 12166031 1013708 : cpustat 35254590 2350698 13610965 907286
> >
> > Then after the buffer_heads are freed, we have:
> > [EMAIL PROTECTED] home]# cat /proc/slabinfo |grep buffer_head
> > buffer_head         9542  36384    120   32    1 : tunables   32   16    8 : slabdata   1137   1137    245 : globalstat 37602996 11589379 1174373    6    0    1 6983 20507478 1708818 : cpustat 35254625 2350704 16027174 1068367
> >
> > Does this huge number of buffer_heads cause the soft lockup?
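
(For scale: 10,942,560 buffer_head objects at 120 bytes apiece is over
1.2 GB of slab, and going from roughly 10.9 million active objects down
to about 9.5 thousand means on the order of ten million
free_buffer_head() calls in a single teardown.)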


> __blkdev_put() takes the BKL and bd_mutex;
> invalidate_mapping_pages() tries to take the page lock.
>
> But no other locks seem to be held while free_buffer_head() is called.
>
> All these locks are preemptible (CONFIG_PREEMPT_BKL=y?) and should not
> hog the CPU like that. What preemption mode have you got selected?
> (CONFIG_PREEMPT_VOLUNTARY=y?)

These two kernel options are turned on by default in my kernel. Here's a
snippet from my .config:
# CONFIG_PREEMPT_NONE is not set
CONFIG_PREEMPT_VOLUNTARY=y
# CONFIG_PREEMPT is not set
CONFIG_PREEMPT_BKL=y
CONFIG_NUMA=y
CONFIG_K8_NUMA=y


> Does this fix it?
>
> --- fs/buffer.c~        2007-02-01 12:00:34.000000000 +0100
> +++ fs/buffer.c 2007-04-11 12:35:48.000000000 +0200
> @@ -3029,6 +3029,8 @@ out:
>                         struct buffer_head *next = bh->b_this_page;
>                         free_buffer_head(bh);
>                         bh = next;
> +
> +                       cond_resched();
>                 } while (bh != buffers_to_free);
>         }
>         return ret;
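
(The hunk drops a voluntary preemption point into the loop that frees
the buffer heads collected on buffers_to_free; judging by the context
lines it sits at the tail of try_to_free_buffers() in fs/buffer.c,
though I haven't double-checked the exact function. With it, the CPU is
yielded periodically while the huge buffer_head population is torn down
instead of running flat out until the softlockup watchdog fires.)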

So far I have run the test with the patched kernel for 6 rounds and
haven't seen the soft lockup, so I think this patch fixes the problem.
What still confuses me is why we need to invoke cond_resched()
explicitly when CONFIG_PREEMPT_VOLUNTARY and CONFIG_PREEMPT_BKL are both
turned on. My understanding was that these two options should let
scheduling happen even when the CPU is under heavy load...
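
For reference, a rough sketch of how voluntary preemption is wired up in
a 2.6.20-era include/linux/kernel.h (from memory, so treat the exact
details as an approximation). With CONFIG_PREEMPT not set, kernel code
only reschedules at explicit points such as cond_resched();
CONFIG_PREEMPT_VOLUNTARY merely turns the existing might_sleep()
annotations into such points, so a loop that never passes through one of
them keeps the CPU until it finishes:

/*
 * Approximate 2.6.20-era wiring (details from memory).
 * CONFIG_PREEMPT_VOLUNTARY only maps might_resched() onto
 * cond_resched(); it adds no new preemption points of its own.
 */
#ifdef CONFIG_PREEMPT_VOLUNTARY
extern int cond_resched(void);
# define might_resched() cond_resched()
#else
# define might_resched() do { } while (0)
#endif

/*
 * might_sleep() expands to might_resched() (plus a debug check when
 * CONFIG_DEBUG_SPINLOCK_SLEEP is set), so only code paths already
 * annotated as "may sleep" become voluntary preemption points.  The
 * free_buffer_head() loop has no such annotation, hence the explicit
 * cond_resched() in the patch.
 */

CONFIG_PREEMPT_BKL makes lock_kernel() a sleeping lock, so holding the
BKL does not by itself prevent rescheduling, but it does not add any
preemption points either; the softlockup watchdog then fires once a CPU
has gone about ten seconds without scheduling.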

Thanks,
Forrest
