Re: NFS livelock / starvation ?

2007-04-16 Thread Peter Zijlstra
On Mon, 2007-04-16 at 16:24 +0800, Zhou Yingchao wrote: When we run a two nfs client and a nfs server in the following way, we met a livelock / starvation condition. MachineAMachineB Client1 Client2 Server As shown in the figure, we run a client and server on one

[PATCH 00/12] per device dirty throttling -v4

2007-04-17 Thread Peter Zijlstra
The latest version of the per device dirty throttling. Dropped all the congestion_wait() churn, will contemplate a rename patch. Reworked the BDI statistics to use percpu_counter. against 2.6.21-rc6-mm1; the first patch is for easy application. Andrew can of course just drop the patch it

[PATCH 07/12] mm: count dirty pages per BDI

2007-04-17 Thread Peter Zijlstra
Count per BDI dirty pages. Signed-off-by: Peter Zijlstra [EMAIL PROTECTED] --- fs/buffer.c |1 + include/linux/backing-dev.h |1 + mm/page-writeback.c |2 ++ mm/truncate.c |1 + 4 files changed, 5 insertions(+) Index: linux-2.6-mm/fs

[PATCH 11/12] mm: per device dirty threshold

2007-04-17 Thread Peter Zijlstra
. A BDI that has a large dirty limit but does not have any dirty pages outstanding is a waste. What is done is to keep a floating proportion between the BDIs based on writeback completions. This way faster/more active devices get a larger share than slower/idle devices. Signed-off-by: Peter Zijlstra
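The floating proportion mentioned in this changelog can be illustrated with a small sketch. It only shows the idea of splitting one global dirty limit by each device's share of recent writeback completions. The function name, its parameters, and the handling of the no-history case are assumptions made for illustration and are not taken from the patch.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not from the patch): give a backing device a
 * slice of the global dirty limit proportional to its share of recent
 * writeback completions.
 *
 * total_limit      global dirty threshold, in pages
 * bdi_completions  recent writeback completions on this device
 * all_completions  recent writeback completions on all devices
 *
 * Note: the multiplication can overflow for very large inputs; a real
 * implementation would have to guard against that.
 */
static uint64_t bdi_share(uint64_t total_limit,
                          uint64_t bdi_completions,
                          uint64_t all_completions)
{
    /* Assumption: with no history yet, do not restrict the device. */
    if (all_completions == 0)
        return total_limit;

    return total_limit * bdi_completions / all_completions;
}
```

With a global limit of 1000 pages, a device that accounts for 300 of the last 400 completions would get 750 pages and a slower device with the remaining 100 completions would get 250.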

[PATCH 06/12] mm: scalable bdi statistics counters.

2007-04-17 Thread Peter Zijlstra
Provide scalable per backing_dev_info statistics counters. Signed-off-by: Peter Zijlstra [EMAIL PROTECTED] --- include/linux/backing-dev.h | 50 ++-- mm/backing-dev.c| 26 ++ 2 files changed, 74 insertions(+), 2

[PATCH 10/12] mm: expose BDI statistics in sysfs.

2007-04-17 Thread Peter Zijlstra
Expose the per BDI stats in /sys/block/dev/queue/* Signed-off-by: Peter Zijlstra [EMAIL PROTECTED] --- block/ll_rw_blk.c | 32 1 file changed, 32 insertions(+) Index: linux-2.6-mm/block/ll_rw_blk.c

[PATCH 03/12] lib: dampen the percpu_counter FBC_BATCH

2007-04-17 Thread Peter Zijlstra
With the current logic the percpu_counter's accuracy delta is quadratic wrt the number of cpus in the system, reduce this to O(n ln n). Signed-off-by: Peter Zijlstra [EMAIL PROTECTED] --- include/linux/percpu_counter.h |7 ++- 1 file changed, 2 insertions(+), 5 deletions(-) Index: linux

[PATCH 05/12] mm: bdi init hooks

2007-04-17 Thread Peter Zijlstra
provide BDI constructor/destructor hooks Signed-off-by: Peter Zijlstra [EMAIL PROTECTED] --- block/ll_rw_blk.c |2 ++ drivers/block/rd.c |6 ++ drivers/char/mem.c |2 ++ drivers/mtd/mtdcore.c |5 + fs/char_dev.c

[PATCH 12/12] debug: expose BDI statistics in sysfs.

2007-04-17 Thread Peter Zijlstra
Expose the per BDI stats in /sys/block/dev/queue/* Signed-off-by: Peter Zijlstra [EMAIL PROTECTED] --- block/ll_rw_blk.c | 49 + mm/page-writeback.c |2 +- 2 files changed, 50 insertions(+), 1 deletion(-) Index: linux-2.6/block/ll_rw_blk.c

[PATCH 01/12] revert per-backing_dev-dirty-and-writeback-page-accounting

2007-04-17 Thread Peter Zijlstra
For ease of application.. --- block/ll_rw_blk.c | 29 - fs/buffer.c |1 - include/linux/backing-dev.h |2 -- mm/page-writeback.c | 13 ++--- mm/truncate.c |1 - 5 files changed, 2 insertions(+),

[PATCH 02/12] nfs: remove congestion_end()

2007-04-17 Thread Peter Zijlstra
It's redundant, clear_bdi_congested() already wakes the waiters. Signed-off-by: Peter Zijlstra [EMAIL PROTECTED] --- fs/nfs/write.c |4 +--- include/linux/backing-dev.h |1 - mm/backing-dev.c| 13 - 3 files changed, 1 insertion(+), 17 deletions

[PATCH 08/12] mm: count writeback pages per BDI

2007-04-17 Thread Peter Zijlstra
Count per BDI writeback pages. Signed-off-by: Peter Zijlstra [EMAIL PROTECTED] --- include/linux/backing-dev.h |1 + mm/page-writeback.c | 12 ++-- 2 files changed, 11 insertions(+), 2 deletions(-) Index: linux-2.6/mm/page-writeback.c

[PATCH 04/12] lib: percpu_counter_mod64

2007-04-17 Thread Peter Zijlstra
Add percpu_counter_mod64() to allow large modifications. Signed-off-by: Peter Zijlstra [EMAIL PROTECTED] --- include/linux/percpu_counter.h |9 + lib/percpu_counter.c | 28 2 files changed, 37 insertions(+) Index: linux-2.6/include/linux

Re: [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues

2007-04-18 Thread Peter Zijlstra
On Tue, 2007-04-17 at 21:19 -0400, Trond Myklebust wrote: I've split the issues introduced by the 2.6.21-rcX write code up into 4 subproblems. The first patch is just a cleanup in order to ease review. Patch number 2 ensures that we never release the PG_writeback flag until _after_ we've

Re: [PATCH 0/4] 2.6.21-rc7 NFS writes: fix a series of issues

2007-04-18 Thread Peter Zijlstra
On Wed, 2007-04-18 at 10:19 +0200, Peter Zijlstra wrote: On Tue, 2007-04-17 at 21:19 -0400, Trond Myklebust wrote: I've split the issues introduced by the 2.6.21-rcX write code up into 4 subproblems. The first patch is just a cleanup in order to ease review. Patch number 2 ensures

Re: [PATCH 11/12] mm: per device dirty threshold

2007-04-19 Thread Peter Zijlstra
if they are not. Ah, yes, good catch. How about this: --- Since we're adding 3 stat counters, triple the per counter delta as well. Signed-off-by: Peter Zijlstra [EMAIL PROTECTED] --- mm/page-writeback.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) Index: linux-2.6/mm/page-writeback.c

Re: [PATCH 09/12] mm: count unstable pages per BDI

2007-04-19 Thread Peter Zijlstra
On Thu, 2007-04-19 at 19:44 +0200, Miklos Szeredi wrote: Count per BDI unstable pages. I'm wondering, is it really worth having this category separate from per BDI dirty pages? With the exception of the export to sysfs, always the sum of unstable + dirty is used. I guess you are

Re: [PATCH 09/12] mm: count unstable pages per BDI

2007-04-19 Thread Peter Zijlstra
On Thu, 2007-04-19 at 20:12 +0200, Peter Zijlstra wrote: On Thu, 2007-04-19 at 19:44 +0200, Miklos Szeredi wrote: Count per BDI unstable pages. I'm wondering, is it really worth having this category separate from per BDI dirty pages? With the exception of the export to sysfs

Re: [PATCH 09/12] mm: count unstable pages per BDI

2007-04-19 Thread Peter Zijlstra
On Thu, 2007-04-19 at 20:46 +0200, Peter Zijlstra wrote: On Thu, 2007-04-19 at 20:12 +0200, Peter Zijlstra wrote: On Thu, 2007-04-19 at 19:44 +0200, Miklos Szeredi wrote: Count per BDI unstable pages. I'm wondering, is it really worth having this category separate from per

Re: [PATCH 09/12] mm: count unstable pages per BDI

2007-04-19 Thread Peter Zijlstra
On Thu, 2007-04-19 at 21:20 +0200, Miklos Szeredi wrote: Index: linux-2.6/fs/buffer.c === --- linux-2.6.orig/fs/buffer.c 2007-04-19 19:59:26.0 +0200 +++ linux-2.6/fs/buffer.c 2007-04-19 20:35:39.0 +0200

[PATCH 09/10] mm: expose BDI statistics in sysfs.

2007-04-20 Thread Peter Zijlstra
Expose the per BDI stats in /sys/block/dev/queue/* Signed-off-by: Peter Zijlstra [EMAIL PROTECTED] --- block/ll_rw_blk.c | 32 1 file changed, 32 insertions(+) Index: linux-2.6-mm/block/ll_rw_blk.c

[PATCH 00/10] per device dirty throttling -v5

2007-04-20 Thread Peter Zijlstra
The latest version of the per device dirty throttling. against 2.6.21-rc6-mm1; the first patch is for easy application. Andrew can of course just drop the patch it reverts. Merged BDI_DIRTY and BDI_UNSTABLE into BDI_RECLAIMABLE, and multiplied bdi_stat_delta() by the number of counters summed.

[PATCH 01/10] revert per-backing_dev-dirty-and-writeback-page-accounting

2007-04-20 Thread Peter Zijlstra
For ease of application.. --- block/ll_rw_blk.c | 29 - fs/buffer.c |1 - include/linux/backing-dev.h |2 -- mm/page-writeback.c | 13 ++--- mm/truncate.c |1 - 5 files changed, 2 insertions(+),

[PATCH 04/10] lib: percpu_counter_mod64

2007-04-20 Thread Peter Zijlstra
Add percpu_counter_mod64() to allow large modifications. Signed-off-by: Peter Zijlstra [EMAIL PROTECTED] --- include/linux/percpu_counter.h |9 + lib/percpu_counter.c | 28 2 files changed, 37 insertions(+) Index: linux-2.6/include/linux

[PATCH 07/10] mm: count reclaimable pages per BDI

2007-04-20 Thread Peter Zijlstra
Count per BDI reclaimable pages; nr_reclaimable = nr_dirty + nr_unstable. Signed-off-by: Peter Zijlstra [EMAIL PROTECTED] --- fs/buffer.c |2 ++ fs/nfs/write.c |7 +++ include/linux/backing-dev.h |1 + mm/page-writeback.c |4 mm

[PATCH 10/10] mm: per device dirty threshold

2007-04-20 Thread Peter Zijlstra
. A BDI that has a large dirty limit but does not have any dirty pages outstanding is a waste. What is done is to keep a floating proportion between the BDIs based on writeback completions. This way faster/more active devices get a larger share than slower/idle devices. Signed-off-by: Peter Zijlstra

[PATCH 08/10] mm: count writeback pages per BDI

2007-04-20 Thread Peter Zijlstra
Count per BDI writeback pages. Signed-off-by: Peter Zijlstra [EMAIL PROTECTED] --- include/linux/backing-dev.h |1 + mm/page-writeback.c | 12 ++-- 2 files changed, 11 insertions(+), 2 deletions(-) Index: linux-2.6/mm/page-writeback.c

[PATCH 02/10] nfs: remove congestion_end()

2007-04-20 Thread Peter Zijlstra
It's redundant, clear_bdi_congested() already wakes the waiters. Signed-off-by: Peter Zijlstra [EMAIL PROTECTED] --- fs/nfs/write.c |4 +--- include/linux/backing-dev.h |1 - mm/backing-dev.c| 13 - 3 files changed, 1 insertion(+), 17 deletions

[PATCH 03/10] lib: dampen the percpu_counter FBC_BATCH

2007-04-20 Thread Peter Zijlstra
With the current logic the percpu_counter's accuracy delta is quadratic wrt the number of cpus in the system, reduce this to O(n ln n). Signed-off-by: Peter Zijlstra [EMAIL PROTECTED] --- include/linux/percpu_counter.h |7 ++- 1 file changed, 2 insertions(+), 5 deletions(-) Index: linux

[PATCH 06/10] mm: scalable bdi statistics counters.

2007-04-20 Thread Peter Zijlstra
Provide scalable per backing_dev_info statistics counters. Signed-off-by: Peter Zijlstra [EMAIL PROTECTED] --- include/linux/backing-dev.h | 50 ++-- mm/backing-dev.c| 26 ++ 2 files changed, 74 insertions(+), 2

[PATCH 05/10] mm: bdi init hooks

2007-04-20 Thread Peter Zijlstra
provide BDI constructor/destructor hooks Signed-off-by: Peter Zijlstra [EMAIL PROTECTED] --- block/ll_rw_blk.c |2 ++ drivers/block/rd.c |6 ++ drivers/char/mem.c |2 ++ drivers/mtd/mtdcore.c |5 + fs/char_dev.c

Re: [PATCH 03/10] lib: dampen the percpu_counter FBC_BATCH

2007-04-21 Thread Peter Zijlstra
On Sat, 2007-04-21 at 02:55 -0700, Andrew Morton wrote: On Fri, 20 Apr 2007 17:51:57 +0200 Peter Zijlstra [EMAIL PROTECTED] wrote: With the current logic the percpu_counter's accuracy delta is quadratic wrt the number of cpus in the system, reduce this to O(n ln n). Signed-off-by: Peter

Re: [PATCH 04/10] lib: percpu_counter_mod64

2007-04-21 Thread Peter Zijlstra
On Sat, 2007-04-21 at 02:55 -0700, Andrew Morton wrote: On Fri, 20 Apr 2007 17:51:58 +0200 Peter Zijlstra [EMAIL PROTECTED] wrote: Add percpu_counter_mod64() to allow large modifications. Signed-off-by: Peter Zijlstra [EMAIL PROTECTED] --- include/linux/percpu_counter.h |9

Re: [PATCH 07/10] mm: count reclaimable pages per BDI

2007-04-21 Thread Peter Zijlstra
On Sat, 2007-04-21 at 02:55 -0700, Andrew Morton wrote: On Fri, 20 Apr 2007 17:52:01 +0200 Peter Zijlstra [EMAIL PROTECTED] wrote: Count per BDI reclaimable pages; nr_reclaimable = nr_dirty + nr_unstable. hm. Aggregating dirty and unstable at inc/dec time is a bit kludgy. If later on we

Re: [PATCH 08/10] mm: count writeback pages per BDI

2007-04-21 Thread Peter Zijlstra
On Sat, 2007-04-21 at 02:55 -0700, Andrew Morton wrote: On Fri, 20 Apr 2007 17:52:02 +0200 Peter Zijlstra [EMAIL PROTECTED] wrote: Count per BDI writeback pages. Signed-off-by: Peter Zijlstra [EMAIL PROTECTED] --- include/linux/backing-dev.h |1 + mm/page-writeback.c

Re: [PATCH 09/10] mm: expose BDI statistics in sysfs.

2007-04-21 Thread Peter Zijlstra
On Sat, 2007-04-21 at 02:55 -0700, Andrew Morton wrote: On Fri, 20 Apr 2007 17:52:03 +0200 Peter Zijlstra [EMAIL PROTECTED] wrote: Expose the per BDI stats in /sys/block/dev/queue/* Signed-off-by: Peter Zijlstra [EMAIL PROTECTED] --- block/ll_rw_blk.c | 32

Re: [PATCH 10/10] mm: per device dirty threshold

2007-04-21 Thread Peter Zijlstra
On Sat, 2007-04-21 at 02:55 -0700, Andrew Morton wrote: On Fri, 20 Apr 2007 17:52:04 +0200 Peter Zijlstra [EMAIL PROTECTED] wrote: Scale writeback cache per backing device, proportional to its writeout speed. By decoupling the BDI dirty thresholds a number of problems we currently

Re: [PATCH 10/10] mm: per device dirty threshold

2007-04-21 Thread Peter Zijlstra
+/* + * maximal error of a stat counter. + */ +static inline unsigned long bdi_stat_delta(void) +{ +#ifdef CONFIG_SMP + return NR_CPUS * FBC_BATCH; This is enormously wrong for CONFIG_NR_CPUS=1024 on a 2-way. Right, I knew about that but, uhm. I wanted to make that
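The bound discussed here (number of CPUs times the batch size) follows from how a batched per-CPU counter works: each CPU accumulates a private delta and only folds it into the shared count once the delta reaches the batch size. The sketch below is a single-threaded user-space model of that behaviour, not the kernel's percpu_counter code. All names prefixed with sketch_ or SKETCH_ are hypothetical, and the CPU count and batch size are arbitrary values picked for the example.

```c
#include <stdint.h>

#define SKETCH_NR_CPUS 4   /* illustrative CPU count, not a kernel value */
#define SKETCH_BATCH   8   /* illustrative batch size, stands in for FBC_BATCH */

/* Hypothetical model of a batched per-CPU counter. */
struct sketch_counter {
    int64_t count;                  /* shared, approximate value */
    int32_t local[SKETCH_NR_CPUS];  /* per-CPU deltas not yet folded in */
};

/* Add 'amount' on behalf of 'cpu'; fold into the shared count only
 * once the local delta reaches the batch size in either direction. */
static void sketch_counter_mod(struct sketch_counter *c, int cpu,
                               int32_t amount)
{
    int32_t v = c->local[cpu] + amount;

    if (v >= SKETCH_BATCH || v <= -SKETCH_BATCH) {
        c->count += v;
        c->local[cpu] = 0;
    } else {
        c->local[cpu] = v;
    }
}

/* Exact value: shared count plus every unfolded per-CPU delta. */
static int64_t sketch_counter_sum(const struct sketch_counter *c)
{
    int64_t sum = c->count;

    for (int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
        sum += c->local[cpu];
    return sum;
}

/* Upper bound on |exact - approximate|: every CPU holds a delta whose
 * magnitude is strictly below the batch size. */
static int64_t sketch_stat_delta(void)
{
    return (int64_t)SKETCH_NR_CPUS * SKETCH_BATCH;
}
```

In this model, if each of the 4 CPUs adds 7 single increments, the shared count still reads 0 while the exact sum is 28, which stays below the bound of 32. This also shows why a bound computed from the compile-time maximum CPU count becomes very loose on a machine with only a few CPUs, which is the objection raised in the quoted review.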

[RFC][PATCH] reiserfs vs BKL

2007-04-21 Thread Peter Zijlstra
with remaining BKL users (notably tty). Compile tested only, since I didn't dare boot it. NOT-Signed-off-by: Peter Zijlstra [EMAIL PROTECTED] --- fs/reiserfs/inode.c|4 ++-- fs/reiserfs/journal.c | 12 ++-- fs/reiserfs/super.c|1 + fs/reiserfs/xattr.c

Re: [RFC][PATCH] reiserfs vs BKL

2007-04-21 Thread Peter Zijlstra
On Sat, 2007-04-21 at 12:14 -0400, Jeff Mahoney wrote: Peter Zijlstra wrote: Replace all the lock_kernel() instances with reiserfs_write_lock(sb), and make that use an actual per super-block mutex instead of lock_kernel(). This should

Re: [PATCH 04/10] lib: percpu_counter_mod64

2007-04-21 Thread Peter Zijlstra
On Sat, 2007-04-21 at 12:21 -0700, Andrew Morton wrote: On Sat, 21 Apr 2007 13:02:26 +0200 Peter Zijlstra [EMAIL PROTECTED] wrote: + cpu = get_cpu(); + pcount = per_cpu_ptr(fbc->counters, cpu); + count = *pcount + amount; + if (count >= FBC_BATCH || count

Re: [PATCH 10/10] mm: per device dirty threshold

2007-04-21 Thread Peter Zijlstra
On Sat, 2007-04-21 at 14:15 +0200, Peter Zijlstra wrote: +/* + * maximal error of a stat counter. + */ +static inline unsigned long bdi_stat_delta(void) +{ +#ifdef CONFIG_SMP + return NR_CPUS * FBC_BATCH; This is enormously wrong for CONFIG_NR_CPUS=1024

Re: Loud pop coming from hard drive on reboot

2007-04-21 Thread Peter Zijlstra
On Wed, 2007-04-18 at 17:27 -0400, Chuck Ebbert wrote: Bartlomiej Zolnierkiewicz wrote: On Wednesday 18 April 2007, Chuck Ebbert wrote: Mark Lord wrote: Mark Lord wrote: With the patch applied, I don't see *any* new activity in those S.M.A.R.T. attributes over multiple hibernates

Re: [PATCH 08/10] mm: count writeback pages per BDI

2007-04-22 Thread Peter Zijlstra
On Sun, 2007-04-22 at 00:19 -0700, Andrew Morton wrote: It could be that we never call test_clear_page_writeback() against !bdi_cap_writeback_dirty() pages anyway. I can't think why we would, but the relationships there aren't very clear. Does don't account for dirty memory imply doesn't

Re: [PATCH 10/10] mm: per device dirty threshold

2007-04-23 Thread Peter Zijlstra
On Sat, 2007-04-21 at 22:25 +0200, Miklos Szeredi wrote: The other deadlock, in throttle_vm_writeout() is still to be solved. Let's go back to the original changelog: Author: marcelo.tosatti marcelo.tosatti Date: Tue Mar 8 17:25:19 2005 + [PATCH] vm: pageout

Re: [PATCH 10/10] mm: per device dirty threshold

2007-04-23 Thread Peter Zijlstra
On Mon, 2007-04-23 at 08:48 -0700, Christoph Lameter wrote: On Sat, 21 Apr 2007, Peter Zijlstra wrote: This is enormously wrong for CONFIG_NR_CPUS=1024 on a 2-way. Right, I knew about that but, uhm. I wanted to make that num_online_cpus(), and install a hotplug notifier to fold

Re: [PATCH 10/10] mm: per device dirty threshold

2007-04-24 Thread Peter Zijlstra
On Tue, 2007-04-24 at 12:58 +1000, Neil Brown wrote: On Friday April 20, [EMAIL PROTECTED] wrote: Scale writeback cache per backing device, proportional to its writeout speed. So it works like this: We account for writeout in full pages. When a page has the Writeback flag cleared,

Re: [PATCH 10/10] mm: per device dirty threshold

2007-04-24 Thread Peter Zijlstra
On Tue, 2007-04-24 at 10:19 +0200, Miklos Szeredi wrote: This is probably a reasonable thing to do but it doesn't feel like the right place. I think get_dirty_limits should return the raw threshold, and balance_dirty_pages should do both tests - the bdi-local test and the

Re: [PATCH 10/10] mm: per device dirty threshold

2007-04-24 Thread Peter Zijlstra
On Tue, 2007-04-24 at 11:14 +0200, Miklos Szeredi wrote: I'm still not quite sure what purpose the above soft limiting serves. It seems to just give advantage to writers, which managed to accumulate lots of dirty pages, and then can convert that into even more dirtyings. The

Re: [PATCH 10/10] mm: per device dirty threshold

2007-04-24 Thread Peter Zijlstra
On Tue, 2007-04-24 at 03:00 -0700, Andrew Morton wrote: On Tue, 24 Apr 2007 11:47:20 +0200 Miklos Szeredi [EMAIL PROTECTED] wrote: Ahh, now I see; I had totally blocked out these few lines: pages_written += write_chunk - wbc.nr_to_write; if

Re: [PATCH 10/10] mm: per device dirty threshold

2007-04-24 Thread Peter Zijlstra
On Tue, 2007-04-24 at 12:19 +0200, Miklos Szeredi wrote: Ahh, now I see; I had totally blocked out these few lines: pages_written += write_chunk - wbc.nr_to_write; if (pages_written >= write_chunk)

Re: [PATCH 3/7] revoke: core code

2007-03-09 Thread Peter Zijlstra
On Fri, 2007-03-09 at 10:15 +0200, Pekka J Enberg wrote: +static int revoke_vma(struct vm_area_struct *vma, struct zap_details *details) +{ + unsigned long restart_addr, start_addr, end_addr; + int need_break; + + start_addr = vma->vm_start; + end_addr = vma->vm_end; + +

Re: [RFC][PATCH 0/3] swsusp: Stop using page flags

2007-03-11 Thread Peter Zijlstra
On Sun, 2007-03-11 at 11:17 +0100, Rafael J. Wysocki wrote: Hi, The following three patches make swsusp use its own data structures for memory management instead of special page flags. Thus the page flags used so far by swsusp (PG_nosave, PG_nosave_free) can be used for other purposes and I

Re: lockdep question (was Re: IPoIB caused a kernel: BUG: soft lockup detected on CPU#0!)

2007-03-11 Thread Peter Zijlstra
On Sun, 2007-03-11 at 15:50 +0200, Michael S. Tsirkin wrote: Quoting Roland Dreier [EMAIL PROTECTED]: Subject: Re: IPoIB caused a kernel: BUG: soft lockup detected on CPU#0! Feb 27 17:47:52 sw169 kernel: [8053aaf1] _spin_lock_irqsave+0x15/0x24 Feb 27 17:47:52 sw169 kernel:

Re: [PATCH][RSDL-mm 0/7] RSDL cpu scheduler for 2.6.21-rc3-mm2

2007-03-12 Thread Peter Zijlstra
On Mon, 2007-03-12 at 08:26 -0700, Linus Torvalds wrote: So good fairness really should involve some notion of work done for others. It's just not very easy to do.. A solution that is already in demand is a class based scheduler, where the thread doing work for a client (temp.) joins the

Re: [PATCH][RSDL-mm 0/7] RSDL cpu scheduler for 2.6.21-rc3-mm2

2007-03-12 Thread Peter Zijlstra
On Mon, 2007-03-12 at 21:11 +0100, Mike Galbraith wrote: How would you go about ensuring that there won't be any cycles wasted? SCHED_IDLE or otherwise nice 19 Killing the known corner case starvation scenarios is wonderful, but let's not just pretend that interactive tasks don't have any

Re: [PATCH 0/3] VM throttling: avoid blocking occasional writers

2007-03-14 Thread Peter Zijlstra
* (previous cycle speed) + this cycle's events. Signed-off-by: Peter Zijlstra [EMAIL PROTECTED] --- block/ll_rw_blk.c |3 + include/linux/backing-dev.h |7 +++ include/linux/writeback.h | 10 kernel/sysctl.c | 10 +++- mm/page-writeback.c | 102

Re: NPTL patch for linux 2.4.28

2007-03-15 Thread Peter Zijlstra
On Thu, 2007-03-15 at 03:14 +0530, Syed Ahemed wrote: Getting RHEL's source ( http://lkml.org/lkml/2005/3/21/380 ) was an idea i thought about but then a download of the RHEL source from the following location was denied . http://download.fedora.redhat.com/pub/fedora/linux/core/1/SRPMS/ and

Re: [PATCH] mm/filemap.c: unconditionally call mark_page_accessed

2007-03-15 Thread Peter Zijlstra
On Wed, 2007-03-14 at 15:58 -0400, Ashif Harji wrote: This patch unconditionally calls mark_page_accessed to prevent pages, especially for small files, from being evicted from the page cache despite frequent access. Since we're hackling over the use-once stuff again... /me brings up:

Re: [PATCH 0/3] FUTEX : new PRIVATE futexes, SMP and NUMA improvements

2007-03-16 Thread Peter Zijlstra
On Thu, 2007-03-15 at 20:10 +0100, Eric Dumazet wrote: Hi I'm pleased to present these patches which improve linux futex performance and scalability, on both UP, SMP and NUMA configs. I had this idea last year but I was not understood, probably because I gave not enough explanations.

Re: [2.6.20] BUG: workqueue leaked lock

2007-03-16 Thread Peter Zijlstra
On Thu, 2007-03-15 at 11:06 -0800, Andrew Morton wrote: On Tue, 13 Mar 2007 17:50:14 +0100 Folkert van Heusden [EMAIL PROTECTED] wrote: ... [ 1756.728209] BUG: workqueue leaked lock or atomic: nfsd4/0x/3577 [ 1756.728271] last function: laundromat_main+0x0/0x69 [nfsd] [

Re: [PATCH 0/3] FUTEX : new PRIVATE futexes, SMP and NUMA improvements

2007-03-16 Thread Peter Zijlstra
On Fri, 2007-03-16 at 10:30 +0100, Eric Dumazet wrote: On Friday 16 March 2007 09:05, Peter Zijlstra wrote: On Thu, 2007-03-15 at 20:10 +0100, Eric Dumazet wrote: Hi I'm pleased to present these patches which improve linux futex performance and scalability, on both UP, SMP and NUMA

Re: [PATCH 2.6.21-rc3-mm2 3/4] futex_requeue_pi optimization

2007-03-16 Thread Peter Zijlstra
On Tue, 2007-03-13 at 10:52 +0100, [EMAIL PROTECTED] wrote: plain text document attachment (futex-requeue-pi.diff) This patch provides the futex_requeue_pi functionality. This provides an optimization, already used for (normal) futexes, to be used for PI-futexes. This optimization is

Re: [PATCH 0/3] FUTEX : new PRIVATE futexes, SMP and NUMA improvements

2007-03-16 Thread Peter Zijlstra
On Fri, 2007-03-16 at 11:30 +0100, Eric Dumazet wrote: On Friday 16 March 2007 11:10, Peter Zijlstra wrote: http://programming.kicks-ass.net/kernel-patches/futex-vma-cache/vma_cache.p atch Oh thanks But if it has to walk the vmas (and take mmap_sem), you already loose

Re: [GIT] NFS client updates for 2.6.20

2007-02-13 Thread Peter Zijlstra
-by: Peter Zijlstra [EMAIL PROTECTED] Cc: Trond Myklebust [EMAIL PROTECTED] --- fs/nfs/write.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) Index: linux-2.6-git/fs/nfs/write.c === --- linux-2.6-git.orig/fs/nfs/write.c 2007-01

[PATCH] lockdep: annotate BLKPG_DEL_PARTITION

2007-02-16 Thread Peter Zijlstra
+0x252/0x265 [c047f04e] sys_ioctl+0x49/0x63 [c0404070] syscall_call+0x7/0xb Annotate BLKPG_DEL_PARTITION's bd_mutex locking and add a little comment clarifying the bd_mutex locking, because I confused myself and initially thought the lock order was wrong too. Signed-off-by: Peter Zijlstra [EMAIL

Re: [PATCH 2.6.20 1/1] fbdev,mm: hecuba/E-Ink fbdev driver

2007-02-17 Thread Peter Zijlstra
On Sat, 2007-02-17 at 11:42 +0100, Jaya Kumar wrote: Hi James, Geert, lkml and mm, Hi Jaya, This patch adds support for the Hecuba/E-Ink display with deferred IO. The changes from the previous version are to switch to using a mutex and lock_page. I welcome your feedback and advice. This

[RFC][PATCH 2/6] mm: count dirty pages per BDI

2007-03-19 Thread Peter Zijlstra
Count per BDI dirty pages. Signed-off-by: Peter Zijlstra [EMAIL PROTECTED] --- fs/buffer.c |1 + include/linux/backing-dev.h |1 + mm/page-writeback.c |2 ++ mm/truncate.c |1 + 4 files changed, 5 insertions(+) Index: linux-2.6/fs/buffer.c

[RFC][PATCH 3/6] mm: count writeback pages per BDI

2007-03-19 Thread Peter Zijlstra
Count per BDI writeback pages. Signed-off-by: Peter Zijlstra [EMAIL PROTECTED] --- include/linux/backing-dev.h |1 + mm/page-writeback.c |8 ++-- 2 files changed, 7 insertions(+), 2 deletions(-) Index: linux-2.6/mm/page-writeback.c

[RFC][PATCH 1/6] mm: scalable bdi statistics counters.

2007-03-19 Thread Peter Zijlstra
Provide scalable per backing_dev_info statistics counters modeled on the ZVC code. Signed-off-by: Peter Zijlstra [EMAIL PROTECTED] --- block/ll_rw_blk.c |1 drivers/block/rd.c |2 drivers/char/mem.c |2 fs/char_dev.c |1 include/linux

[RFC][PATCH 5/6] mm: per device dirty threshold

2007-03-19 Thread Peter Zijlstra
' in writeout events. Each writeout increases time and adds to a per bdi counter. This counter is halved when a period expires. So per bdi speed is: 0.5 * (previous cycle speed) + this cycle's events. Signed-off-by: Peter Zijlstra [EMAIL PROTECTED] --- include/linux/backing-dev.h |8 ++ mm
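The decaying average described in this changelog can be written down in a few lines. The sketch below models only the arithmetic stated above: events are added to a per-device counter, and the counter is halved when a period expires. The struct and function names are hypothetical and chosen for illustration; how periods are delimited in the actual patch is not shown here.

```c
#include <stdint.h>

/* Hypothetical per-device writeout tracker (names are illustrative). */
struct writeout_speed {
    uint64_t events;  /* decayed count of writeout completions */
};

/* Record one completed writeout in the current period. */
static void writeout_event(struct writeout_speed *ws)
{
    ws->events++;
}

/* Called when a period expires: halve the accumulated history, so that
 * after the next period the counter holds
 *   0.5 * (previous cycle speed) + this cycle's events. */
static void writeout_period_expired(struct writeout_speed *ws)
{
    ws->events >>= 1;
}
```

For example, a device that completed 100 writeouts in one period and 20 in the next ends the second period with a speed of 0.5 * 100 + 20 = 70, so a device that goes idle loses half of its weight with every period that passes.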

[RFC][PATCH 4/6] mm: count unstable pages per BDI

2007-03-19 Thread Peter Zijlstra
Count per BDI unstable pages. Signed-off-by: Peter Zijlstra [EMAIL PROTECTED] --- fs/nfs/write.c |4 include/linux/backing-dev.h |1 + 2 files changed, 5 insertions(+) Index: linux-2.6/fs/nfs/write.c

[RFC][PATCH 0/6] per device dirty throttling

2007-03-19 Thread Peter Zijlstra
This patch-set implements per device dirty page throttling. Which should solve the problem we currently have with one device hogging the dirty limit. Preliminary testing shows good results: mem=128M time (dd if=/dev/zero of=/mnt/dev/zero bs=4096 count=$((1024*1024/4)); sync) 1GB to disk real

[RFC][PATCH 6/6] mm: expose BDI statistics in sysfs.

2007-03-19 Thread Peter Zijlstra
Expose the per BDI stats in /sys/block/dev/queue/* Signed-off-by: Peter Zijlstra [EMAIL PROTECTED] --- block/ll_rw_blk.c | 51 +++ 1 file changed, 51 insertions(+) Index: linux-2.6/block/ll_rw_blk.c

Re: [RFC][PATCH 0/6] per device dirty throttling

2007-03-19 Thread Peter Zijlstra
Sorry for duplicates, I was fooled by an MTA hanging on to them for a few hours. I counted them lost in cyberspace.

[RFC][PATCH 7/6] assorted fixes

2007-03-19 Thread Peter Zijlstra
unbounded. It goes *BANG* when using NFS,... need to look into that. Signed-off-by: Peter Zijlstra [EMAIL PROTECTED] --- include/linux/backing-dev.h | 12 mm/page-writeback.c | 37 ++--- 2 files changed, 34 insertions(+), 15 deletions(-) Index

Re: [RFC][PATCH 0/6] per device dirty throttling

2007-03-20 Thread Peter Zijlstra
On Tue, 2007-03-20 at 18:47 +1100, David Chinner wrote: On Mon, Mar 19, 2007 at 04:57:37PM +0100, Peter Zijlstra wrote: This patch-set implements per device dirty page throttling. Which should solve the problem we currently have with one device hogging the dirty limit. Preliminary

Re: [RFC][PATCH 0/6] per device dirty throttling

2007-03-20 Thread Peter Zijlstra
On Tue, 2007-03-20 at 20:38 +1100, David Chinner wrote: On Tue, Mar 20, 2007 at 09:08:24AM +0100, Peter Zijlstra wrote: On Tue, 2007-03-20 at 18:47 +1100, David Chinner wrote: So overall we've lost about 15-20% of the theoretical aggregate performance, but we haven't starved any

Re: [RFC][PATCH 0/6] per device dirty throttling

2007-03-20 Thread Peter Zijlstra
without triggering NMI/softlockup msgs. Signed-off-by: Peter Zijlstra [EMAIL PROTECTED] --- fs/fuse/inode.c |1 + fs/nfs/client.c |1 + mm/page-writeback.c | 25 +++-- 3 files changed, 21 insertions(+), 6 deletions(-) Index: linux-2.6/fs/nfs/client.c

Re: [PATCH 2.6.21-rc3-mm2 3/4] futex_requeue_pi optimization

2007-03-20 Thread Peter Zijlstra
On Tue, 2007-03-20 at 16:32 +0100, Pierre Peiffer wrote: Peter Zijlstra a écrit : +static void *get_futex_address(union futex_key *key) +{ + void *uaddr; + + if (key->both.offset & 1) { + /* shared mapping */ + uaddr = (void*)((key->shared.pgoff << PAGE_SHIFT

Re: RSDL v0.31

2007-03-21 Thread Peter Zijlstra
On Wed, 2007-03-21 at 15:57 +0100, Mike Galbraith wrote: 'f' is a progglet which sleeps a bit and burns a bit, duration depending on argument given. 'sh' is a shell 100% hog. In this scenario, the argument was set such that 'f' used right at 50% cpu. All are started at the same time, and I

Re: [PATCH] lockdep: lockdep_depth vs. debug_locks Re: [2.6.20] BUG: workqueue leaked lock

2007-03-22 Thread Peter Zijlstra
state was there. Reported-by: Folkert van Heusden [EMAIL PROTECTED] Inspired-by: Oleg Nesterov [EMAIL PROTECTED] Signed-off-by: Jarek Poplawski [EMAIL PROTECTED] This looks sane, thanks for figuring this out. Acked-by: Peter Zijlstra [EMAIL PROTECTED] --- diff -Nurp 2.6.21-rc4-git4

Re: [PATCH] lockdep: debug_show_all_locks debug_show_held_locks vs. debug_locks

2007-03-22 Thread Peter Zijlstra
direct use of this fields isn't recommended either.) Reported-by: Folkert van Heusden [EMAIL PROTECTED] Inspired-by: Oleg Nesterov [EMAIL PROTECTED] Signed-off-by: Jarek Poplawski [EMAIL PROTECTED] Acked-by: Peter Zijlstra [EMAIL PROTECTED] --- diff -Nurp 2.6.21-rc4-git4-/kernel

Re: [patch 1/3] fix illogical behavior in balance_dirty_pages()

2007-03-25 Thread Peter Zijlstra
On Sat, 2007-03-24 at 22:55 +0100, Miklos Szeredi wrote: This is a slightly different take on the fix for the deadlock in fuse with dirty balancing. David Chinner convinced me, that per-bdi counters are too expensive, and that it's not worth trying to account the number of pages under

Re: [patch 1/3] fix illogical behavior in balance_dirty_pages()

2007-03-25 Thread Peter Zijlstra
On Sun, 2007-03-25 at 13:34 +0200, Miklos Szeredi wrote: Please have a look at this: http://lkml.org/lkml/2007/3/19/220 + if (bdi_nr_reclaimable + bdi_stat(bdi, BDI_WRITEBACK) = + bdi_thresh) + break;

Re: [patch 2/3] only allow nonlinear vmas for ram backed filesystems

2007-03-25 Thread Peter Zijlstra
code cost. All known users of nonlinear mappings actually use tmpfs, so this shouldn't have any negative effect. Signed-off-by: Miklos Szeredi [EMAIL PROTECTED] Acked-by: Peter Zijlstra [EMAIL PROTECTED] --- Index: linux-2.6.21-rc4-mm1/mm/fremap.c

Re: [patch 1/3] split mmap

2007-03-25 Thread Peter Zijlstra
the actual mapping Signed-off-by: Miklos Szeredi [EMAIL PROTECTED] Acked-by: Peter Zijlstra [EMAIL PROTECTED] --- Index: linux/mm/mmap.c === --- linux.orig/mm/mmap.c 2007-03-24 21:00:40.0 +0100 +++ linux/mm

Re: [patch 3/3] update ctime and mtime for mmaped write

2007-03-25 Thread Peter Zijlstra
On Sat, 2007-03-24 at 23:11 +0100, Miklos Szeredi wrote: From: Miklos Szeredi [EMAIL PROTECTED] Changes: v3: o rename is_page_modified to test_clear_page_modified v2: o set AS_CMTIME flag in clear_page_dirty_for_io() too o don't clear AS_CMTIME in file_update_time() o check the dirty

Re: [patch 2/3] only allow nonlinear vmas for ram backed filesystems

2007-03-25 Thread Peter Zijlstra
On Sun, 2007-03-25 at 16:00 -0800, Andrew Morton wrote: On Sat, 24 Mar 2007 23:09:19 +0100 Miklos Szeredi [EMAIL PROTECTED] wrote: Dirty page accounting/limiting doesn't work for nonlinear mappings, Doesn't it? iirc the problem is that we don't correctly re-clean the ptes while starting

Re: [patch 1/3] fix illogical behavior in balance_dirty_pages()

2007-03-26 Thread Peter Zijlstra
On Mon, 2007-03-26 at 02:08 -0800, Andrew Morton wrote: On Mon, 26 Mar 2007 11:32:47 +0200 Miklos Szeredi [EMAIL PROTECTED] wrote: Stopping writers which have idle queues is completely unproductive, and that is basically what the current algorithm does. This is because the kernel permits

Re: [PATCH 10/12] mm: page_alloc_wait

2007-04-06 Thread Peter Zijlstra
On Thu, 2007-04-05 at 15:57 -0700, Andrew Morton wrote: On Thu, 05 Apr 2007 19:42:19 +0200 [EMAIL PROTECTED] wrote: Introduce a mechanism to wait on free memory. Currently congestion_wait() is abused to do this. Such a very small explanation for such a terrifying change. Yes, I suck

Re: [PATCH 11/12] mm: accurate pageout congestion wait

2007-04-06 Thread Peter Zijlstra
On Thu, 2007-04-05 at 16:17 -0700, Andrew Morton wrote: On Thu, 05 Apr 2007 19:42:20 +0200 [EMAIL PROTECTED] wrote: Only do the congestion wait when we actually encountered congestion. The name congestion_wait() was accurate back in 2002, but it isn't accurate any more, and you got

Re: [PATCH 12/12] mm: per BDI congestion feedback

2007-04-06 Thread Peter Zijlstra
On Thu, 2007-04-05 at 16:24 -0700, Andrew Morton wrote: On Thu, 05 Apr 2007 19:42:21 +0200 [EMAIL PROTECTED] wrote: Now that we have per BDI dirty throttling it makes sense to also have per BDI congestion feedback; why wait on another device if the current one is not congested.

Re: [PATCH 02/12] mm: scalable bdi statistics counters.

2007-04-06 Thread Peter Zijlstra
On Thu, 2007-04-05 at 15:37 -0700, Andrew Morton wrote: On Thu, 05 Apr 2007 19:42:11 +0200 [EMAIL PROTECTED] wrote: Provide scalable per backing_dev_info statistics counters modeled on the ZVC code. Signed-off-by: Peter Zijlstra [EMAIL PROTECTED] --- block/ll_rw_blk.c

Shared futexes (was [PATCH] FUTEX : new PRIVATE futexes)

2007-04-06 Thread Peter Zijlstra
Hi, some thoughts on shared futexes; Could we get rid of the mmap_sem on the shared futexes in the following manner: - do a page table walk to find the pte; - get a page using pfn_to_page (skipping VM_PFNMAP) - get the futex key from page->mapping->host and page->index and offset from addr

Re: [PATCH] FUTEX : new PRIVATE futexes

2007-04-06 Thread Peter Zijlstra
kernel with it :) Andrew, could we get this in mm as well ? This version is against 2.6.21-rc5-mm4 (so supports 64bit futexes) In this third version I dropped the NUMA optims and process private hash table, to let new API come in and be tested. Good work, Thanks! Acked-by: Peter Zijlstra

Re: Shared futexes (was [PATCH] FUTEX : new PRIVATE futexes)

2007-04-06 Thread Peter Zijlstra
On Fri, 2007-04-06 at 14:02 +0100, Hugh Dickins wrote: On Fri, 6 Apr 2007, Peter Zijlstra wrote: some thoughts on shared futexes; Could we get rid of the mmap_sem on the shared futexes in the following manner: - do a page table walk to find the pte; (walk meaning descent down

Re: Shared futexes (was [PATCH] FUTEX : new PRIVATE futexes)

2007-04-06 Thread Peter Zijlstra
On Fri, 2007-04-06 at 23:15 +1000, Nick Piggin wrote: Hugh Dickins wrote: On Fri, 6 Apr 2007, Peter Zijlstra wrote: some thoughts on shared futexes; Could we get rid of the mmap_sem on the shared futexes in the following manner: I'd imagine shared futexes would be much less common

Re: [PATCH 0/8] RSS controller based on process containers (v2)

2007-04-09 Thread Peter Zijlstra
*ugh* /me no like. The basic premises seems to be that we can track page owners perfectly (although this patch set does not yet do so), through get/release operations (on _mapcount). This is simply not true for unmapped pagecache pages. Those receive no 'release' event; (the usage by

Re: Why kmem_cache_free occupy CPU for more than 10 seconds?

2007-04-11 Thread Peter Zijlstra
On Wed, 2007-04-11 at 02:53 -0700, Paul Jackson wrote: I'm confused - which end of this stack is up? cpuset_exit doesn't call do_exit, rather it's the other way around. But put_files_struct doesn't call do_exit, rather do_exit calls __exit_files calls put_files_struct. I'm guessing its
