Re: A unresponsive file system can hang all I/O in the system on linux-2.6.23-rc6 (dirty_thresh problem?)

2007-10-02 Thread Randy Dunlap
On Tue, 02 Oct 2007 15:36:01 +0200 Peter Zijlstra wrote: > On Fri, 2007-09-28 at 12:16 -0700, Andrew Morton wrote: > > > (Searches for the lockstat documentation) > > > > Did we forget to do that? > > yeah,... > > /me quickly whips up something Thanks. Just some typos noted below. >

Re: A unresponsive file system can hang all I/O in the system on linux-2.6.23-rc6 (dirty_thresh problem?)

2007-10-02 Thread Peter Zijlstra
On Fri, 2007-09-28 at 12:16 -0700, Andrew Morton wrote: > (Searches for the lockstat documentation) > > Did we forget to do that? yeah,... /me quickly whips up something Signed-off-by: Peter Zijlstra <[EMAIL PROTECTED]> --- Documentation/lockstat.txt | 119

Re: A unresponsive file system can hang all I/O in the system on linux-2.6.23-rc6 (dirty_thresh problem?)

2007-10-01 Thread Chuck Ebbert
On 09/29/2007 07:04 AM, Fengguang Wu wrote: > On Thu, Sep 27, 2007 at 11:32:36PM -0700, Chakri n wrote: >> Hi, >> >> In my testing, a unresponsive file system can hang all I/O in the system. >> This is not seen in 2.4. >> >> I started 20 threads doing I/O on a NFS share. They are just doing 4K >>

Re: A unresponsive file system can hang all I/O in the system on linux-2.6.23-rc6 (dirty_thresh problem?)

2007-09-29 Thread Peter Zijlstra
On Sat, 2007-09-29 at 20:28 +0800, Fengguang Wu wrote: > On Sat, Sep 29, 2007 at 01:48:01PM +0200, Peter Zijlstra wrote: > > On the patch itself, not sure if it would have been enough. As soon as > > there is a single dirty inode on the list one would get caught in the > > same problem as

Re: A unresponsive file system can hang all I/O in the system on linux-2.6.23-rc6 (dirty_thresh problem?)

2007-09-29 Thread Fengguang Wu
On Sat, Sep 29, 2007 at 01:48:01PM +0200, Peter Zijlstra wrote: > > On Sat, 2007-09-29 at 19:04 +0800, Fengguang Wu wrote: > > On Thu, Sep 27, 2007 at 11:32:36PM -0700, Chakri n wrote: > > > Hi, > > > > > > In my testing, a unresponsive file system can hang all I/O in the system. > > > This is

Re: A unresponsive file system can hang all I/O in the system on linux-2.6.23-rc6 (dirty_thresh problem?)

2007-09-29 Thread Peter Zijlstra
On Sat, 2007-09-29 at 19:04 +0800, Fengguang Wu wrote: > On Thu, Sep 27, 2007 at 11:32:36PM -0700, Chakri n wrote: > > Hi, > > > > In my testing, a unresponsive file system can hang all I/O in the system. > > This is not seen in 2.4. > > > > I started 20 threads doing I/O on a NFS share. They

Re: A unresponsive file system can hang all I/O in the system on linux-2.6.23-rc6 (dirty_thresh problem?)

2007-09-29 Thread Fengguang Wu
On Thu, Sep 27, 2007 at 11:32:36PM -0700, Chakri n wrote: > Hi, > > In my testing, a unresponsive file system can hang all I/O in the system. > This is not seen in 2.4. > > I started 20 threads doing I/O on a NFS share. They are just doing 4K > writes in a loop. > > Now I stop NFS server

Re: A unresponsive file system can hang all I/O in the system on linux-2.6.23-rc6 (dirty_thresh problem?)

2007-09-28 Thread Daniel Phillips
On Friday 28 September 2007 06:35, Peter Zijlstra wrote: > ...it would be grand (and dangerous) if we could provide for a > button that would just kill off all outstanding pages against a dead > device. Substitute "resources" for "pages" and you begin to get an idea of how tricky that actually

Re: A unresponsive file system can hang all I/O in the system on linux-2.6.23-rc6 (dirty_thresh problem?)

2007-09-28 Thread Daniel Phillips
On Thursday 27 September 2007 23:50, Andrew Morton wrote: > Actually we perhaps could address this at the VFS level in another > way. Processes which are writing to the dead NFS server will > eventually block in balance_dirty_pages() once they've exceeded the > memory limits and will remain

Re: A unresponsive file system can hang all I/O in the system on linux-2.6.23-rc6 (dirty_thresh problem?)

2007-09-28 Thread Chakri n
No change in behavior even on low-memory systems. I confirmed it by running on a 1 GB machine. Thanks --Chakri On 9/28/07, Chakri n <[EMAIL PROTECTED]> wrote: > Here is the snapshot of vmstats when the problem happened. I believe > this could help a little. > > crash> kmem -V >

Re: A unresponsive file system can hang all I/O in the system on linux-2.6.23-rc6 (dirty_thresh problem?)

2007-09-28 Thread Chakri n
Here is the snapshot of vmstats when the problem happened. I believe this could help a little.
crash> kmem -V
  NR_FREE_PAGES: 680853
  NR_INACTIVE: 95380
  NR_ACTIVE: 26891
  NR_ANON_PAGES: 2507
  NR_FILE_MAPPED: 1832
  NR_FILE_PAGES: 119779

Re: A unresponsive file system can hang all I/O in the system on linux-2.6.23-rc6 (dirty_thresh problem?)

2007-09-28 Thread Andrew Morton
On Fri, 28 Sep 2007 16:32:18 -0400 Trond Myklebust <[EMAIL PROTECTED]> wrote: > On Fri, 2007-09-28 at 13:10 -0700, Andrew Morton wrote: > > On Fri, 28 Sep 2007 15:52:28 -0400 > > Trond Myklebust <[EMAIL PROTECTED]> wrote: > > > > > On Fri, 2007-09-28 at 12:26 -0700, Andrew Morton wrote: > > > >

Re: A unresponsive file system can hang all I/O in the system on linux-2.6.23-rc6 (dirty_thresh problem?)

2007-09-28 Thread Trond Myklebust
On Fri, 2007-09-28 at 13:10 -0700, Andrew Morton wrote: > On Fri, 28 Sep 2007 15:52:28 -0400 > Trond Myklebust <[EMAIL PROTECTED]> wrote: > > > On Fri, 2007-09-28 at 12:26 -0700, Andrew Morton wrote: > > > On Fri, 28 Sep 2007 15:16:11 -0400 Trond Myklebust <[EMAIL PROTECTED]> > > > wrote: > > >

Re: A unresponsive file system can hang all I/O in the system on linux-2.6.23-rc6 (dirty_thresh problem?)

2007-09-28 Thread Daniel Phillips
On Friday 28 September 2007 12:52, Trond Myklebust wrote: > I'm not sure that the hang that is illustrated here is so special. It > is an example of a bog-standard ext3 write, that ends up calling the > NFS client, which is hanging. The fact that it happens to be hanging > on the nfsd process is

Re: A unresponsive file system can hang all I/O in the system on linux-2.6.23-rc6 (dirty_thresh problem?)

2007-09-28 Thread Andrew Morton
On Fri, 28 Sep 2007 15:52:28 -0400 Trond Myklebust <[EMAIL PROTECTED]> wrote: > On Fri, 2007-09-28 at 12:26 -0700, Andrew Morton wrote: > > On Fri, 28 Sep 2007 15:16:11 -0400 Trond Myklebust <[EMAIL PROTECTED]> > > wrote: > > > Looking back, they were getting caught up in > > >

Re: A unresponsive file system can hang all I/O in the system on linux-2.6.23-rc6 (dirty_thresh problem?)

2007-09-28 Thread Trond Myklebust
On Fri, 2007-09-28 at 12:26 -0700, Andrew Morton wrote: > On Fri, 28 Sep 2007 15:16:11 -0400 Trond Myklebust <[EMAIL PROTECTED]> wrote: > > Looking back, they were getting caught up in > > balance_dirty_pages_ratelimited() and friends. See the attached > > example... > > that one is

Re: A unresponsive file system can hang all I/O in the system on linux-2.6.23-rc6 (dirty_thresh problem?)

2007-09-28 Thread Andrew Morton
On Fri, 28 Sep 2007 15:16:11 -0400 Trond Myklebust <[EMAIL PROTECTED]> wrote: > On Fri, 2007-09-28 at 11:49 -0700, Andrew Morton wrote: > > On Fri, 28 Sep 2007 13:00:53 -0400 Trond Myklebust <[EMAIL PROTECTED]> > > wrote: > > > Do these patches also cause the memory reclaimers to steer clear of

Re: A unresponsive file system can hang all I/O in the system on linux-2.6.23-rc6 (dirty_thresh problem?)

2007-09-28 Thread Andrew Morton
On Fri, 28 Sep 2007 20:48:59 +0200 Peter Zijlstra <[EMAIL PROTECTED]> wrote: > > On Fri, 2007-09-28 at 11:49 -0700, Andrew Morton wrote: > > > Do you know where the stalls are occurring? throttle_vm_writeout(), or via > > direct calls to congestion_wait() from page_alloc.c and vmscan.c?

Re: A unresponsive file system can hang all I/O in the system on linux-2.6.23-rc6 (dirty_thresh problem?)

2007-09-28 Thread Trond Myklebust
On Fri, 2007-09-28 at 11:49 -0700, Andrew Morton wrote: > On Fri, 28 Sep 2007 13:00:53 -0400 Trond Myklebust <[EMAIL PROTECTED]> wrote: > > Do these patches also cause the memory reclaimers to steer clear of > > devices that are congested (and stop waiting on a congested device if > > they see

Re: A unresponsive file system can hang all I/O in the system on linux-2.6.23-rc6 (dirty_thresh problem?)

2007-09-28 Thread Peter Zijlstra
On Fri, 2007-09-28 at 11:49 -0700, Andrew Morton wrote: > Do you know where the stalls are occurring? throttle_vm_writeout(), or via > direct calls to congestion_wait() from page_alloc.c and vmscan.c? (running > sysrq-w five or ten times will probably be enough to determine this) would it

Re: A unresponsive file system can hang all I/O in the system on linux-2.6.23-rc6 (dirty_thresh problem?)

2007-09-28 Thread Andrew Morton
On Fri, 28 Sep 2007 13:00:53 -0400 Trond Myklebust <[EMAIL PROTECTED]> wrote: > On Thu, 2007-09-27 at 23:50 -0700, Andrew Morton wrote: > > > Actually we perhaps could address this at the VFS level in another way. > > Processes which are writing to the dead NFS server will eventually block in >

Re: A unresponsive file system can hang all I/O in the system on linux-2.6.23-rc6 (dirty_thresh problem?)

2007-09-28 Thread Andrew Morton
On Fri, 28 Sep 2007 07:28:52 -0600 [EMAIL PROTECTED] (Jonathan Corbet) wrote: > Andrew wrote: > > It's unrelated to the actual value of dirty_thresh: if the machine fills up > > with dirty (or unstable) NFS pages then eventually new writers will block > > until that condition clears. > > > > 2.4

Re: A unresponsive file system can hang all I/O in the system on linux-2.6.23-rc6 (dirty_thresh problem?)

2007-09-28 Thread Trond Myklebust
On Thu, 2007-09-27 at 23:50 -0700, Andrew Morton wrote: > Actually we perhaps could address this at the VFS level in another way. > Processes which are writing to the dead NFS server will eventually block in > balance_dirty_pages() once they've exceeded the memory limits and will > remain

Re: [linux-pm] Re: A unresponsive file system can hang all I/O in the system on linux-2.6.23-rc6 (dirty_thresh problem?)

2007-09-28 Thread Alan Stern
On Fri, 28 Sep 2007, Peter Zijlstra wrote: > On Fri, 2007-09-28 at 07:28 -0600, Jonathan Corbet wrote: > > Is it really NFS-related? I was trying to back up my 2.6.23-rc8 system > > to an external USB drive the other day when something flaked and the > > drive fell off the bus. That, too, was

Re: A unresponsive file system can hang all I/O in the system on linux-2.6.23-rc6 (dirty_thresh problem?)

2007-09-28 Thread Peter Zijlstra
On Fri, 2007-09-28 at 07:28 -0600, Jonathan Corbet wrote: > Andrew wrote: > > It's unrelated to the actual value of dirty_thresh: if the machine fills up > > with dirty (or unstable) NFS pages then eventually new writers will block > > until that condition clears. > > > > 2.4 doesn't have this

Re: A unresponsive file system can hang all I/O in the system on linux-2.6.23-rc6 (dirty_thresh problem?)

2007-09-28 Thread Jonathan Corbet
Andrew wrote: > It's unrelated to the actual value of dirty_thresh: if the machine fills up > with dirty (or unstable) NFS pages then eventually new writers will block > until that condition clears. > > 2.4 doesn't have this problem at low levels of dirty data because 2.4 > VFS/MM doesn't account

Re: A unresponsive file system can hang all I/O in the system on linux-2.6.23-rc6 (dirty_thresh problem?)

2007-09-28 Thread Chakri n
It works on .23-rc8-mm2 without any problems. The "dd" process does not hang any more. Thanks for all the help. Cheers --Chakri On 9/28/07, Peter Zijlstra <[EMAIL PROTECTED]> wrote: > [ and one copy for the list too ] > > On Fri, 2007-09-28 at 02:20 -0700, Chakri n wrote: > > It's 2.6.23-rc6.

Re: A unresponsive file system can hang all I/O in the system on linux-2.6.23-rc6 (dirty_thresh problem?)

2007-09-28 Thread Peter Zijlstra
[ and one copy for the list too ] On Fri, 2007-09-28 at 02:20 -0700, Chakri n wrote: > It's 2.6.23-rc6. Could you try .23-rc8-mm2. It includes the per bdi stuff.

Re: A unresponsive file system can hang all I/O in the system on linux-2.6.23-rc6 (dirty_thresh problem?)

2007-09-28 Thread Chakri n
It's 2.6.23-rc6. Thanks --Chakri On 9/28/07, Peter Zijlstra <[EMAIL PROTECTED]> wrote: > On Fri, 2007-09-28 at 02:01 -0700, Chakri n wrote: > > Thanks for explaining the adaptive logic. > > > > > However other devices will at that moment try to maintain a limit of 0, > > > which ends up being

Re: A unresponsive file system can hang all I/O in the system on linux-2.6.23-rc6 (dirty_thresh problem?)

2007-09-28 Thread Peter Zijlstra
On Fri, 2007-09-28 at 02:01 -0700, Chakri n wrote: > Thanks for explaining the adaptive logic. > > > However other devices will at that moment try to maintain a limit of 0, > > which ends up being similar to a sync mount. > > > > So they'll not get stuck, but they will be slow. > > > > > > Sync

Re: A unresponsive file system can hang all I/O in the system on linux-2.6.23-rc6 (dirty_thresh problem?)

2007-09-28 Thread Chakri n
Thanks for explaining the adaptive logic. > However other devices will at that moment try to maintain a limit of 0, > which ends up being similar to a sync mount. > > So they'll not get stuck, but they will be slow. > > Sync should be ok, when the situation is bad like this and some one hijacked

Re: A unresponsive file system can hang all I/O in the system on linux-2.6.23-rc6 (dirty_thresh problem?)

2007-09-28 Thread Peter Zijlstra
[ please don't top-post! ] On Fri, 2007-09-28 at 01:27 -0700, Chakri n wrote: > On 9/27/07, Peter Zijlstra <[EMAIL PROTECTED]> wrote: > > On Thu, 2007-09-27 at 23:50 -0700, Andrew Morton wrote: > > > > > What we _don't_ want to happen is for other processes which are writing to > > > other,

Re: A unresponsive file system can hang all I/O in the system on linux-2.6.23-rc6 (dirty_thresh problem?)

2007-09-28 Thread Chakri n
Thanks. The BDI dirty limits sound like a good idea. Is there already a patch for this that I could try? I believe it works like this: each BDI will have a limit. If the dirty_thresh exceeds the limit, all the I/O on the block device will be synchronous. So, if I have sda & an NFS mount,

Re: A unresponsive file system can hang all I/O in the system on linux-2.6.23-rc6 (dirty_thresh problem?)

2007-09-28 Thread Peter Zijlstra
On Thu, 2007-09-27 at 23:50 -0700, Andrew Morton wrote: > What we _don't_ want to happen is for other processes which are writing to > other, non-dead devices to get collaterally blocked. We have patches which > might fix that queued for 2.6.24. Peter? Nasty problem, don't do that :-) But

Re: A unresponsive file system can hang all I/O in the system on linux-2.6.23-rc6 (dirty_thresh problem?)

2007-09-28 Thread Andrew Morton
On Thu, 27 Sep 2007 23:32:36 -0700 "Chakri n" <[EMAIL PROTECTED]> wrote: > Hi, > > In my testing, a unresponsive file system can hang all I/O in the system. > This is not seen in 2.4. > > I started 20 threads doing I/O on a NFS share. They are just doing 4K > writes in a loop. > > Now I stop

A unresponsive file system can hang all I/O in the system on linux-2.6.23-rc6 (dirty_thresh problem?)

2007-09-28 Thread Chakri n
Hi, In my testing, a unresponsive file system can hang all I/O in the system. This is not seen in 2.4. I started 20 threads doing I/O on a NFS share. They are just doing 4K writes in a loop. Now I stop NFS server hosting the NFS share and start a "dd" process to write a file on local EXT3 file

Linux 2.6.23-rc6-git3-krf1

2007-09-13 Thread Michal Piotrowski
Hi, There are a few regression fixes in -krf tree http://www.stardust.webpages.pl/files/patches/krf/2.6.23-rc6-git3/2.6.23-rc6-git3-krf1.patch.bz2 http://www.stardust.webpages.pl/files/patches/krf/2.6.23-rc6-git3/2.6.23-rc6-git3-krf1.tar.bz2 Vitaly Bordug:

Linux 2.6.23-rc6

2007-09-10 Thread Linus Torvalds
(2): [IA64] Fix unexpected interrupt vector handling [IA64] Clear pending interrupts at CPU boot up time Kyungmin Park (1): [MIPS] i8259: Add disable method. Laurent Riffard (1): Fix broken pata_via cable detection Linus Torvalds (1): Linux 2.6.23-rc6 Masato Noguchi
