Re: Proposal for proper durable fsync() and fdatasync()

2008-02-26 Thread Nick Piggin
On Tuesday 26 February 2008 18:59, Jamie Lokier wrote: Andrew Morton wrote: On Tue, 26 Feb 2008 07:26:50 + Jamie Lokier [EMAIL PROTECTED] wrote: (It would be nicer if sync_file_range() took a vector of ranges for better elevator scheduling, but let's ignore that :-) Two

Re: [PATCH] [0/18] Implement some low hanging BKL removal fruit in fs/*

2008-01-27 Thread Nick Piggin
access to metadata. Switch to a per-superblock mutex. Signed-off-by: Nick Piggin [EMAIL PROTECTED] Index: linux-2.6/fs/minix/bitmap.c === --- linux-2.6.orig/fs/minix/bitmap.c +++ linux-2.6/fs/minix/bitmap.c @@ -69,11 +69,11 @@ void

Re: [PATCH][RFC] fast file mapping for loop

2008-01-09 Thread Nick Piggin
On Wednesday 09 January 2008 19:52, Jens Axboe wrote: So how does it work? Instead of punting IO to a thread and passing it through the page cache, we instead attempt to send the IO directly to the filesystem block that it maps to. You told Christoph that just using direct-IO from kernel

Re: [PATCH 36/42] VFS: export drop_pagecache_sb

2007-12-13 Thread Nick Piggin
On Friday 14 December 2007 02:24, Erez Zadok wrote: In message [EMAIL PROTECTED], Nick Piggin writes: On Monday 10 December 2007 13:42, Erez Zadok wrote: Needed to maintain cache coherency after branch management. Hmm, I'd much prefer to be able to sleep in invalidate_mapping_pages

Re: [patch] mm: fix XIP file writes

2007-12-11 Thread Nick Piggin
to reproduce this problem on 2.6.23 and service levels. As the testcase did not trigger, I looked into the 2.6.23 code. This problem was introduced by commit 4a9e5ef1f4f15205e477817a5cefc34bd3f65f55 (mm: write iovec cleanup from Nick Piggin) during 2.6.24-rc: snip--- - copied

Re: [PATCH 36/42] VFS: export drop_pagecache_sb

2007-12-11 Thread Nick Piggin
On Monday 10 December 2007 13:42, Erez Zadok wrote: Needed to maintain cache coherency after branch management. Hmm, I'd much prefer to be able to sleep in invalidate_mapping_pages before this function gets exported. As it is, it can cause massive latencies on preemption and the inode_lock so

Re: [patch] ext2: xip check fix

2007-12-06 Thread Nick Piggin
On Thu, Dec 06, 2007 at 09:43:27AM +0100, Carsten Otte wrote: Nick Piggin wrote: Xip does only work, if both do match PAGE_SIZE because it doesn't support multiple calls to direct_access in the get_xip_page address space operation. Thus we check both here, actually this was changed from

Re: [patch] ext2: xip check fix

2007-12-06 Thread Nick Piggin
On Thu, Dec 06, 2007 at 10:59:02AM +0100, Carsten Otte wrote: Nick Piggin wrote: After my patch, we can do XIP in a hardsect size PAGE_SIZE block device -- this seems to be a fine thing to do at least for the ramdisk code. Would this situation be problematic for existing drivers, and if so

Re: [patch] ext2: xip check fix

2007-12-06 Thread Nick Piggin
On Thu, Dec 06, 2007 at 10:17:39PM -0600, Rob Landley wrote: On Thursday 06 December 2007 21:22:25 Jared Hulbert wrote: I haven't looked at it yet. I do appreciate it, I think it might broaden the user-base of this feature which is up to now s390 only due to the fact that the flash

Re: [patch] rewrite rd

2007-12-04 Thread Nick Piggin
On Tue, Dec 04, 2007 at 01:55:17AM -0600, Rob Landley wrote: On Monday 03 December 2007 22:26:28 Nick Piggin wrote: There is one slight downside -- direct block device access and filesystem metadata access goes through an extra copy and gets stored in RAM twice. However, this downside

Re: [patch] rewrite rd

2007-12-04 Thread Nick Piggin
On Tue, Dec 04, 2007 at 10:54:51AM +0100, Christian Borntraeger wrote: Am Dienstag, 4. Dezember 2007 schrieb Nick Piggin: [...] There is one slight downside -- direct block device access and filesystem metadata access goes through an extra copy and gets stored in RAM twice. However

[patch] rd: support XIP

2007-12-04 Thread Nick Piggin
On Tue, Dec 04, 2007 at 11:10:09AM +0100, Nick Piggin wrote: This is just an idea, I don't know if it is worth the trouble, but have you thought about implementing direct_access for brd? That would allow execute-in-place (xip) on brd eliminating the extra copy. Actually that's a pretty

[patch] ext2: xip check fix

2007-12-04 Thread Nick Piggin
Am I missing something here? I wonder how s390 works without this change? -- ext2 should not worry about checking sb->s_blocksize for XIP before the sb's blocksize actually gets set. Signed-off-by: Nick Piggin [EMAIL PROTECTED] --- Index: linux-2.6/fs/ext2/super.c

Re: [patch] rd: support XIP

2007-12-04 Thread Nick Piggin
On Tue, Dec 04, 2007 at 03:26:20AM -0800, Andrew Morton wrote: On Tue, 4 Dec 2007 12:21:00 +0100 Nick Piggin [EMAIL PROTECTED] wrote: +* +* Cannot support XIP and highmem, because our ->direct_access +* routine for XIP must return memory that is always addressable

[patch] mm: fix XIP file writes

2007-12-04 Thread Nick Piggin
On Tue, Dec 04, 2007 at 12:35:49PM +0100, Nick Piggin wrote: On Tue, Dec 04, 2007 at 03:26:20AM -0800, Andrew Morton wrote: On Tue, 4 Dec 2007 12:21:00 +0100 Nick Piggin [EMAIL PROTECTED] wrote: + * + * Cannot support XIP and highmem, because our ->direct_access + * routine

[patch] rd: support XIP (updated)

2007-12-04 Thread Nick Piggin
On Tue, Dec 04, 2007 at 12:06:23PM +, Duane Griffin wrote: On 04/12/2007, Nick Piggin [EMAIL PROTECTED] wrote: + gfp_flags = GFP_NOIO | __GFP_ZERO; +#ifndef CONFIG_BLK_DEV_XIP + gfp_flags |= __GFP_HIGHMEM; +#endif page = alloc_page(GFP_NOIO | __GFP_HIGHMEM

[patch] rewrite rd

2007-12-03 Thread Nick Piggin
it is no longer part of the ramdisk code). - Boot / load time flexible ramdisk size, which could easily be extended to a per-ramdisk runtime changeable size (eg. with an ioctl). Signed-off-by: Nick Piggin [EMAIL PROTECTED] --- MAINTAINERS|5 drivers/block/Kconfig | 12 - drivers

Re: [patch] rewrite rd

2007-12-03 Thread Nick Piggin
On Mon, Dec 03, 2007 at 10:29:03PM -0800, Andrew Morton wrote: On Tue, 4 Dec 2007 05:26:28 +0100 Nick Piggin [EMAIL PROTECTED] wrote: There is one slight downside -- direct block device access and filesystem metadata access goes through an extra copy and gets stored in RAM twice. However

Re: [patch] rewrite rd

2007-12-03 Thread Nick Piggin
On Tue, Dec 04, 2007 at 08:01:31AM +0100, Nick Piggin wrote: Thanks for the review, I'll post an incremental patch in a sec. Index: linux-2.6/drivers/block/brd.c === --- linux-2.6.orig/drivers/block/brd.c +++ linux-2.6/drivers

[rfc][patch 2/2] inotify: remove debug code

2007-12-02 Thread Nick Piggin
problems anyway. So remove it for now. Signed-off-by: Nick Piggin [EMAIL PROTECTED] --- Index: linux-2.6/fs/dcache.c === --- linux-2.6.orig/fs/dcache.c +++ linux-2.6/fs/dcache.c @@ -1408,9 +1408,6 @@ void d_delete(struct dentry * dentry

[rfc][patch 1/2] inotify: fix race

2007-12-02 Thread Nick Piggin
. Locking is taken care of, because both set_dentry_child_flags and inotify_d_instantiate hold dcache_lock and child->d_locks. Signed-off-by: Nick Piggin [EMAIL PROTECTED] --- Index: linux-2.6/fs/inotify.c === --- linux-2.6.orig/fs

Re: Should PAGE_CACHE_SIZE be discarded?

2007-11-15 Thread Nick Piggin
On Thu, Nov 15, 2007 at 02:46:46PM +, David Howells wrote: Benny Halevy [EMAIL PROTECTED] wrote: I think that what Nick was trying to say is that PAGE_CACHE_SIZE should always be used properly as the size of the memory struct Page covers (while PAGE_SIZE is the hardware page size and

Re: [rfc][patch 3/5] afs: new aops

2007-11-15 Thread Nick Piggin
On Thu, Nov 15, 2007 at 12:15:41PM +, David Howells wrote: Nick Piggin [EMAIL PROTECTED] wrote: So you're saying a struct page controls an area of PAGE_CACHE_SIZE, not an area of PAGE_SIZE? No, a pagecache page is PAGE_CACHE_SIZE. That doesn't answer my question. I didn't ask

Re: [rfc][patch 3/5] afs: new aops

2007-11-14 Thread Nick Piggin
On Wed, Nov 14, 2007 at 12:18:43PM +, David Howells wrote: Nick Piggin [EMAIL PROTECTED] wrote: The problem is that the code called assumes that the struct page * argument points to a single page, not an array of pages as would presumably be the case if PAGE_CACHE_SIZE > PAGE_SIZE

Re: Should PAGE_CACHE_SIZE be discarded?

2007-11-14 Thread Nick Piggin
On Wed, Nov 14, 2007 at 01:56:53PM +, David Howells wrote: Are we ever going to have PAGE_CACHE_SIZE != PAGE_SIZE? If not, why not discard PAGE_CACHE_SIZE as it's then redundant. Christoph Lameter has patches exactly to make PAGE_CACHE_SIZE larger than PAGE_SIZE, and they seem to work

Re: [rfc][patch 3/5] afs: new aops

2007-11-14 Thread Nick Piggin
On Wed, Nov 14, 2007 at 03:57:46PM +, David Howells wrote: Nick Piggin [EMAIL PROTECTED] wrote: In core code, the PAGE_CACHE_SIZE is for page cache struct pages. Single struct pages (not page arrays). Take a look at generic mapping read or something. So you're saying a struct page

Re: Should PAGE_CACHE_SIZE be discarded?

2007-11-14 Thread Nick Piggin
On Wed, Nov 14, 2007 at 03:59:39PM +, David Howells wrote: Nick Piggin [EMAIL PROTECTED] wrote: Christoph Lameter has patches exactly to make PAGE_CACHE_SIZE larger than PAGE_SIZE, and they seem to work without much effort. I happen to hate the patches ;) but that doesn't change

Re: [PATCH 3/3] nfs: use ->mmap_prepare() to avoid an AB-BA deadlock

2007-11-14 Thread Nick Piggin
On Wed, Nov 14, 2007 at 09:01:39PM +0100, Peter Zijlstra wrote: Normal locking order is: i_mutex mmap_sem However NFS's ->mmap hook, which is called under mmap_sem, can take i_mutex. Avoid this potential deadlock by doing the work that requires i_mutex from the new ->mmap_prepare().

Re: [PATCH 3/3] nfs: use ->mmap_prepare() to avoid an AB-BA deadlock

2007-11-14 Thread Nick Piggin
On Wed, Nov 14, 2007 at 05:18:50PM -0500, Trond Myklebust wrote: On Wed, 2007-11-14 at 22:50 +0100, Peter Zijlstra wrote: Right, but I guess what Nick asked is, if pages could be stale to start with, how is that avoided in the future. The way I understand it, this re-validate is just a

Re: [rfc][patch 3/5] afs: new aops

2007-11-13 Thread Nick Piggin
On Tue, Nov 13, 2007 at 10:56:25AM +, David Howells wrote: Nick Piggin [EMAIL PROTECTED] wrote: It takes a pagecache page, yes. If you follow convention, you use PAGE_CACHE_SIZE for that guy. You don't have to allow PAGE_CACHE_SIZE != PAGE_SIZE, and if all the rest of your code

Re: [rfc][patch 3/5] afs: new aops

2007-11-12 Thread Nick Piggin
On Mon, Nov 12, 2007 at 03:29:14PM +, David Howells wrote: Nick Piggin [EMAIL PROTECTED] wrote: - ASSERTCMP(start + len, <=, PAGE_SIZE); + ASSERTCMP(len, <=, PAGE_CACHE_SIZE); Do you guarantee this will work if PAGE_CACHE_SIZE != PAGE_SIZE? If not, you can't make this particular

Re: [rfc][patch 3/5] afs: new aops

2007-11-12 Thread Nick Piggin
On Tue, Nov 13, 2007 at 12:30:05AM +, David Howells wrote: Nick Piggin [EMAIL PROTECTED] wrote: PAGE_CACHE_SIZE should be used to address the pagecache. Perhaps, but the function being called from there takes pages not page cache slots. If I have to allow for PAGE_CACHE_SIZE

[rfc][patches] remove ->prepare_write

2007-11-11 Thread Nick Piggin
Hi, These are a set of patches to convert the last few filesystems to use the new deadlock-free write aops, and remove the core code to handle the legacy write path. I don't really have setups to sufficiently test these filesystems. So I would really appreciate if filesystem maintainers can pick

[rfc][patch 1/5] ecryptfs new aops

2007-11-11 Thread Nick Piggin
Convert ecryptfs to new aops. Signed-off-by: Nick Piggin [EMAIL PROTECTED] --- Index: linux-2.6/fs/ecryptfs/mmap.c === --- linux-2.6.orig/fs/ecryptfs/mmap.c +++ linux-2.6/fs/ecryptfs/mmap.c @@ -263,31 +263,38 @@ out: return

[rfc][patch 2/5] cifs: new aops

2007-11-11 Thread Nick Piggin
to writeback mode in the case that the full page was dirtied. Signed-off-by: Nick Piggin [EMAIL PROTECTED] --- Index: linux-2.6/fs/cifs/file.c === --- linux-2.6.orig/fs/cifs/file.c +++ linux-2.6/fs/cifs/file.c @@ -103,7 +103,7

[rfc][patch 3/5] afs: new aops

2007-11-11 Thread Nick Piggin
Convert afs to new aops. Cannot assume writes will fully complete, so this conversion goes the easy way and always brings the page uptodate before the write. Signed-off-by: Nick Piggin [EMAIL PROTECTED] --- Index: linux-2.6/fs/afs/file.c

[rfc][patch 4/5] rd: rewrite rd

2007-11-11 Thread Nick Piggin
reclaim buffer heads. The fact that it now goes through all the regular vm/fs paths makes it much more useful for testing, too. Signed-off-by: Nick Piggin [EMAIL PROTECTED] --- Index: linux-2.6/drivers/block/Kconfig === --- linux-2.6

[rfc][patch 5/5] remove prepare_write

2007-11-11 Thread Nick Piggin
Index: linux-2.6/drivers/block/loop.c === --- linux-2.6.orig/drivers/block/loop.c +++ linux-2.6/drivers/block/loop.c @@ -40,8 +40,7 @@ * Heinz Mauelshagen [EMAIL PROTECTED], Feb 2002 * * Support for falling back on the write

Re: [patch] fs: restore nobh

2007-10-25 Thread Nick Piggin
On Thu, Oct 25, 2007 at 09:07:36PM +0200, Jan Kara wrote: Hi, This is overdue, sorry. Got a little complicated, and I've been away from my filesystem test setup so I didn't want to send it (lucky, coz I found a bug after more substantial testing). Anyway, RFC? Hmm, maybe one

Re: [PATCH 00/31] Remove iget() and read_inode() [try #4]

2007-10-12 Thread Nick Piggin
On Friday 12 October 2007 19:07, David Howells wrote: Hi Linus, Here's a set of patches that remove all calls to iget() and all read_inode() functions. They should be removed for two reasons: firstly they don't lend themselves to good error handling, and secondly their presence is a

[patch] fs: restore nobh

2007-10-08 Thread Nick Piggin
Hi, This is overdue, sorry. Got a little complicated, and I've been away from my filesystem test setup so I didn't want to send it (lucky, coz I found a bug after more substantial testing). Anyway, RFC? --- Implement nobh in new aops. This is a bit tricky. FWIW, nobh_truncate is now implemented

Re: [PATCH]fix VM_CAN_NONLINEAR check in sys_remap_file_pages

2007-10-08 Thread Nick Piggin
for the time being, but as a trivial fix, I think this probably should go into 2.6.23. Thanks for spotting this problem Acked-by: Nick Piggin [EMAIL PROTECTED] I hope Nick or Miklos is clearer on what the risks are. (Apologies for all the nots and nons here, I'm embarrassed after just criticizing

Re: [PATCH]fix VM_CAN_NONLINEAR check in sys_remap_file_pages

2007-10-08 Thread Nick Piggin
On Tuesday 09 October 2007 03:04, Andrew Morton wrote: On Mon, 8 Oct 2007 19:45:08 +0800 Yan Zheng [EMAIL PROTECTED] wrote: Hi all The test for VM_CAN_NONLINEAR always fails Signed-off-by: Yan Zheng[EMAIL PROTECTED] diff -ur linux-2.6.23-rc9/mm/fremap.c linux/mm/fremap.c ---

Re: [PATCH]fix VM_CAN_NONLINEAR check in sys_remap_file_pages

2007-10-08 Thread Nick Piggin
On Tuesday 09 October 2007 03:51, Andrew Morton wrote: On Mon, 8 Oct 2007 10:28:43 -0700 I'll now add remap_file_pages soon. Maybe those other 2 tests aren't strong enough (?). Or maybe they don't return a non-0 exit status even when they fail... (I'll check.) Perhaps Yan Zheng can

Re: [15/17] SLUB: Support virtual fallback via SLAB_VFALLBACK

2007-10-02 Thread Nick Piggin
On Tuesday 02 October 2007 07:01, Christoph Lameter wrote: On Sat, 29 Sep 2007, Peter Zijlstra wrote: On Fri, 2007-09-28 at 11:20 -0700, Christoph Lameter wrote: Really? That means we can no longer even allocate stacks for forking. I think I'm running with 4k stacks... 4k stacks will

Re: [15/17] SLUB: Support virtual fallback via SLAB_VFALLBACK

2007-09-30 Thread Nick Piggin
On Sunday 30 September 2007 05:20, Andrew Morton wrote: On Sat, 29 Sep 2007 06:19:33 +1000 Nick Piggin [EMAIL PROTECTED] wrote: On Saturday 29 September 2007 19:27, Andrew Morton wrote: On Sat, 29 Sep 2007 11:14:02 +0200 Peter Zijlstra [EMAIL PROTECTED] wrote: oom-killings

Re: [15/17] SLUB: Support virtual fallback via SLAB_VFALLBACK

2007-09-30 Thread Nick Piggin
On Monday 01 October 2007 06:12, Andrew Morton wrote: On Sun, 30 Sep 2007 05:09:28 +1000 Nick Piggin [EMAIL PROTECTED] wrote: On Sunday 30 September 2007 05:20, Andrew Morton wrote: We can't run out of unfragmented memory for an order-2 GFP_KERNEL allocation in this workload. We go

Re: [15/17] SLUB: Support virtual fallback via SLAB_VFALLBACK

2007-09-29 Thread Nick Piggin
On Saturday 29 September 2007 19:27, Andrew Morton wrote: On Sat, 29 Sep 2007 11:14:02 +0200 Peter Zijlstra [EMAIL PROTECTED] wrote: oom-killings, or page allocation failures? The latter, one hopes. Linux version 2.6.23-rc4-mm1-dirty ([EMAIL PROTECTED]) (gcc version 4.1.2 (Ubuntu

Re: [15/17] SLUB: Support virtual fallback via SLAB_VFALLBACK

2007-09-29 Thread Nick Piggin
On Saturday 29 September 2007 04:41, Christoph Lameter wrote: On Fri, 28 Sep 2007, Peter Zijlstra wrote: memory got massively fragmented, as anti-frag gets easily defeated. setting min_free_kbytes to 12M does seem to solve it - it forces 2 max order blocks to stay available, so we don't

Re: [15/17] SLUB: Support virtual fallback via SLAB_VFALLBACK

2007-09-28 Thread Nick Piggin
On Wednesday 19 September 2007 13:36, Christoph Lameter wrote: SLAB_VFALLBACK can be specified for selected slab caches. If fallback is available then the conservative settings for higher order allocations are overridden. We then request an order that can accommodate at minimum 100 objects. The

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-28 Thread Nick Piggin
On Thursday 20 September 2007 11:38, David Chinner wrote: On Wed, Sep 19, 2007 at 04:04:30PM +0200, Andrea Arcangeli wrote: Plus of course you don't like fsblock because it requires work to adapt a fs to it, I can't argue about that. No, I don't like fsblock because it is inherently a

Re: [15/17] SLUB: Support virtual fallback via SLAB_VFALLBACK

2007-09-28 Thread Nick Piggin
On Saturday 29 September 2007 03:33, Christoph Lameter wrote: On Fri, 28 Sep 2007, Nick Piggin wrote: On Wednesday 19 September 2007 13:36, Christoph Lameter wrote: SLAB_VFALLBACK can be specified for selected slab caches. If fallback is available then the conservative settings for higher

Re: [13/17] Virtual compound page freeing in interrupt context

2007-09-20 Thread Nick Piggin
On Wednesday 19 September 2007 13:36, Christoph Lameter wrote: If we are in an interrupt context then simply defer the free via a workqueue. In an interrupt context it is not possible to use vmalloc_addr() to determine the vmalloc address. So add a variant that does that too. Removing a

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-19 Thread Nick Piggin
On Wednesday 19 September 2007 04:30, Linus Torvalds wrote: On Tue, 18 Sep 2007, Nick Piggin wrote: ROFL! Yeah of course, how could I have forgotten about our trusty OOM killer as the solution to the fragmentation problem? It would only have been funnier if you had said to reboot every so

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-18 Thread Nick Piggin
On Tuesday 18 September 2007 08:00, Christoph Lameter wrote: On Sun, 16 Sep 2007, Nick Piggin wrote: I don't know how it would prevent fragmentation from building up anyway. It's commonly the case that potentially unmovable objects are allowed to fill up all of ram (dentries, inodes, etc

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-18 Thread Nick Piggin
On Tuesday 18 September 2007 08:21, Christoph Lameter wrote: On Sun, 16 Sep 2007, Nick Piggin wrote: So if you argue that vmap is a downside, then please tell me how you consider the -ENOMEM of your approach to be better? That is again pretty undifferentiated. Are we talking about

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-18 Thread Nick Piggin
On Tuesday 18 September 2007 08:05, Christoph Lameter wrote: On Sun, 16 Sep 2007, Nick Piggin wrote: fsblock doesn't need any of those hacks, of course. Nor does mine for the low orders that we are considering. For order MAX_ORDER this is unavoidable since the page allocator cannot

Re: 2.6.22.6: kernel BUG at fs/locks.c:171

2007-09-17 Thread Nick Piggin
On Saturday 15 September 2007 20:22, Soeren Sonnenburg wrote: On Sat, 2007-09-15 at 09:47 +, Soeren Sonnenburg wrote: Memtest did not find anything after 16 passes so I finally stopped it applied your patch and used CONFIG_DEBUG_SLAB=y CONFIG_DEBUG_SLAB_LEAK=y and booted into

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-17 Thread Nick Piggin
On Saturday 15 September 2007 03:52, Christoph Lameter wrote: On Fri, 14 Sep 2007, Nick Piggin wrote: [*] ok, this isn't quite true because if you can actually put a hard limit on unmovable allocations then anti-frag will fundamentally help -- get back to me on that when you get

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-17 Thread Nick Piggin
On Saturday 15 September 2007 04:08, Christoph Lameter wrote: On Fri, 14 Sep 2007, Nick Piggin wrote: However fsblock can do everything that higher order pagecache can do in terms of avoiding vmap and giving contiguous memory to block devices by opportunistically allocating higher orders

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-17 Thread Nick Piggin
On Monday 17 September 2007 04:13, Mel Gorman wrote: On (15/09/07 14:14), Goswin von Brederlow didst pronounce: I keep coming back to the fact that movable objects should be moved out of the way for unmovable ones. Anything else just allows fragmentation to build up. This is easily

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-17 Thread Nick Piggin
On Monday 17 September 2007 14:07, David Chinner wrote: On Fri, Sep 14, 2007 at 06:48:55AM +1000, Nick Piggin wrote: OK, the vunmap batching code wipes your TLB flushing and IPIs off the table. Diffstat below, but the TLB portions are here (besides that _everything_ is probably lower due

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-14 Thread Nick Piggin
On Thursday 13 September 2007 09:06, Christoph Lameter wrote: On Wed, 12 Sep 2007, Nick Piggin wrote: So lumpy reclaim does not change my formula nor significantly help against a fragmentation attack. AFAIKS. Lumpy reclaim improves the situation significantly because the overwhelming

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-14 Thread Nick Piggin
On Thursday 13 September 2007 12:01, Nick Piggin wrote: On Thursday 13 September 2007 23:03, David Chinner wrote: Then just do operations on directories with lots of files in them (tens of thousands). Every directory operation will require at least one vmap in this situation - e.g

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-14 Thread Nick Piggin
On Thursday 13 September 2007 09:17, Christoph Lameter wrote: On Wed, 12 Sep 2007, Nick Piggin wrote: I will still argue that my approach is the better technical solution for large block support than yours, I don't think we made progress on that. And I'm quite sure we agreed at the VM

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-13 Thread Nick Piggin
On Thursday 13 September 2007 11:49, David Chinner wrote: On Wed, Sep 12, 2007 at 01:27:33AM +1000, Nick Piggin wrote: I just gave 4 things which combined might easily reduce xfs vmap overhead by several orders of magnitude, all without changing much code at all. Patches would be greatly

Re: 2.6.22.6: kernel BUG at fs/locks.c:171

2007-09-13 Thread Nick Piggin
On Thursday 13 September 2007 19:20, Soeren Sonnenburg wrote: Dear all, I've just seen this in dmesg on an AMD K7 / kernel 2.6.22.6 machine (config attached). Any ideas / which further information needed ? Thanks for the report. Is it reproducible? It seems like the locks_free_lock call

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-13 Thread Nick Piggin
On Thursday 13 September 2007 23:03, David Chinner wrote: On Thu, Sep 13, 2007 at 03:23:21AM +1000, Nick Piggin wrote: Well, it may not be easy to _fix_, but it's easy to try a few improvements ;) How do I make an image and run a workload that will coerce XFS into doing a significant

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-12 Thread Nick Piggin
On Wednesday 12 September 2007 11:49, David Chinner wrote: On Tue, Sep 11, 2007 at 04:00:17PM +1000, Nick Piggin wrote: OTOH, I'm not sure how much buy-in there was from the filesystems guys. Particularly Christoph H and XFS (which is strange because they already do vmapping in places

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-12 Thread Nick Piggin
On Wednesday 12 September 2007 11:49, David Chinner wrote: On Tue, Sep 11, 2007 at 04:00:17PM +1000, Nick Piggin wrote: OTOH, I'm not sure how much buy-in there was from the filesystems guys. Particularly Christoph H and XFS (which is strange because they already do vmapping in places

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-12 Thread Nick Piggin
On Wednesday 12 September 2007 10:00, Christoph Lameter wrote: On Tue, 11 Sep 2007, Nick Piggin wrote: Yes. I think we differ on our interpretations of okay. In my interpretation, it is not OK to use this patch as a way to solve VM or FS or IO scalability issues, especially not while

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-11 Thread Nick Piggin
On Tuesday 11 September 2007 16:03, Christoph Lameter wrote: 5. VM scalability Large block sizes mean less state keeping for the information being transferred. For a 1TB file one needs to handle 256 million page structs in the VM if one uses 4k page size. A 64k page size reduces

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-11 Thread Nick Piggin
On Tuesday 11 September 2007 22:12, Jörn Engel wrote: On Tue, 11 September 2007 04:52:19 +1000, Nick Piggin wrote: On Tuesday 11 September 2007 16:03, Christoph Lameter wrote: 5. VM scalability Large block sizes mean less state keeping for the information being transferred

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-11 Thread Nick Piggin
On Wednesday 12 September 2007 04:31, Mel Gorman wrote: On Tue, 2007-09-11 at 18:47 +0200, Andrea Arcangeli wrote: Hi Mel, Hi, On Tue, Sep 11, 2007 at 04:36:07PM +0100, Mel Gorman wrote: that increasing the pagesize like what Andrea suggested would lead to internal fragmentation

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-11 Thread Nick Piggin
On Wednesday 12 September 2007 04:25, Maxim Levitsky wrote: Hi, I think that fundamental problem is no fragmentation/large pages/... The problem is the VM itself. The vm doesn't use virtual memory, that's all, that's the problem. Although this will be probably linux 3.0, I think that the

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-11 Thread Nick Piggin
On Wednesday 12 September 2007 06:01, Christoph Lameter wrote: On Tue, 11 Sep 2007, Nick Piggin wrote: There is a limitation in the VM. Fragmentation. You keep saying this is a solved issue and just assuming you'll be able to fix any cases that come up as they happen. I still don't get

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-11 Thread Nick Piggin
On Wednesday 12 September 2007 06:11, Christoph Lameter wrote: On Tue, 11 Sep 2007, Nick Piggin wrote: It would be interesting to craft an attack. If you knew roughly the layout and size of your dentry slab for example... maybe you could stat a whole lot of files, then open one and keep

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-11 Thread Nick Piggin
On Wednesday 12 September 2007 06:01, Christoph Lameter wrote: On Tue, 11 Sep 2007, Nick Piggin wrote: There is a limitation in the VM. Fragmentation. You keep saying this is a solved issue and just assuming you'll be able to fix any cases that come up as they happen. I still don't get

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-11 Thread Nick Piggin
On Wednesday 12 September 2007 06:53, Mel Gorman wrote: On (11/09/07 11:44), Nick Piggin didst pronounce: However, this discussion belongs more with the non-existent-remove-slab patch. Based on what we've seen since the summits, we need a thorough analysis with benchmarks before making

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-11 Thread Nick Piggin
On Wednesday 12 September 2007 07:41, Christoph Lameter wrote: On Tue, 11 Sep 2007, Nick Piggin wrote: I think I would have as good a shot as any to write a fragmentation exploit, yes. I think I've given you enough info to do the same, so I'd like to hear a reason why it is not a problem

Re: [00/41] Large Blocksize Support V7 (adds memmap support)

2007-09-11 Thread Nick Piggin
On Wednesday 12 September 2007 07:48, Christoph Lameter wrote: On Tue, 11 Sep 2007, Nick Piggin wrote: But that's not my place to say, and I'm actually not arguing that high order pagecache does not have uses (especially as a practical, shorter-term solution which is unintrusive

Re: [07/36] Use page_cache_xxx in mm/filemap_xip.c

2007-08-28 Thread Nick Piggin
Christoph Hellwig wrote: On Tue, Aug 28, 2007 at 09:49:38PM +0200, Jörn Engel wrote: On Tue, 28 August 2007 12:05:58 -0700, [EMAIL PROTECTED] wrote: - index = *ppos >> PAGE_CACHE_SHIFT; - offset = *ppos & ~PAGE_CACHE_MASK; + index = page_cache_index(mapping, *ppos); +

[rfc] block-based page writeout

2007-08-27 Thread Nick Piggin
Hi, I've always liked the idea of being able to do writeout directly based on block number, rather than the valiant but doomed-to-be-suboptimal heuristics that our current dirty writeout system does, as it is running above the pagecache and doesn't know about poor file layouts, or interaction

[patch] 2.6.23-rc3: fsblock

2007-08-24 Thread Nick Piggin
Hi, I'm still plugging away at fsblock slowly. Haven't really got around to finishing up any big new features, but there has been a lot of bug fixing and little API changes since last release. I still think fsblock has merit, and even if a more extent-based approach ends up working better for

Re: [patch][rfc] fs: fix nobh error handling

2007-08-08 Thread Nick Piggin
On Wed, Aug 08, 2007 at 07:39:42AM -0700, Mingming Cao wrote: On Wed, 2007-08-08 at 08:07 -0500, Dave Kleikamp wrote: For jfs's sake, I don't really care if it ever uses nobh again. I originally started using it because I figured the movement was away from buffer heads and jfs seemed

[patch][rfc] fs: fix nobh error handling

2007-08-06 Thread Nick Piggin
that it can actually be written out correctly and be subject to the normal IO error handling paths. As an upshot, we save 1K of kernel stack on ia64 or powerpc 64K page systems. Signed-off-by: Nick Piggin [EMAIL PROTECTED] -- Index: linux-2.6/fs/buffer.c

Re: [PATCH RFC] extent mapped page cache

2007-07-26 Thread Nick Piggin
On Thu, Jul 26, 2007 at 09:05:15AM -0400, Chris Mason wrote: On Thu, 26 Jul 2007 04:36:39 +0200 Nick Piggin [EMAIL PROTECTED] wrote: [ are state trees a good idea? ] One thing it gains us is finding the start of the cluster. Even if called by kswapd, the state tree allows writepage

Re: [PATCH RFC] extent mapped page cache

2007-07-25 Thread Nick Piggin
On Wed, Jul 25, 2007 at 10:10:07PM -0400, Chris Mason wrote: On Thu, 26 Jul 2007 03:37:28 +0200 Nick Piggin [EMAIL PROTECTED] wrote: One advantage to the state tree is that it separates the state from the memory being described, allowing a simple kmap style interface that covers

Re: [PATCH RFC] extent mapped page cache

2007-07-25 Thread Nick Piggin
On Wed, Jul 25, 2007 at 08:18:53AM -0400, Chris Mason wrote: On Wed, 25 Jul 2007 04:32:17 +0200 Nick Piggin [EMAIL PROTECTED] wrote: Having another tree to store block state I think is a good idea as I said in the fsblock thread with Dave, but I haven't clicked as to why it is a big

[patch 1/4] ext2 convert to new aops fix

2007-07-24 Thread Nick Piggin
. Probably would hurt anyone in practice unless they are using 4GB directories, but must fix. Signed-off-by: Nick Piggin [EMAIL PROTECTED] Index: linux-2.6/fs/ext2/dir.c === --- linux-2.6.orig/fs/ext2/dir.c +++ linux-2.6/fs/ext2/dir.c

[patch 2/4] minix convert to new aops fix

2007-07-24 Thread Nick Piggin
Signed-off-by: Nick Piggin [EMAIL PROTECTED] Index: linux-2.6/fs/minix/dir.c === --- linux-2.6.orig/fs/minix/dir.c +++ linux-2.6/fs/minix/dir.c @@ -311,7 +311,7 @@ int minix_delete_entry(struct minix_dir_ struct

[patch 3/4] sysv convert to new aops fix

2007-07-24 Thread Nick Piggin
Signed-off-by: Nick Piggin [EMAIL PROTECTED] Index: linux-2.6/fs/sysv/dir.c === --- linux-2.6.orig/fs/sysv/dir.c +++ linux-2.6/fs/sysv/dir.c @@ -219,7 +219,7 @@ int sysv_add_link(struct dentry *dentry, return -EINVAL

[patch 4/4] ufs convert to new aops fix

2007-07-24 Thread Nick Piggin
Signed-off-by: Nick Piggin [EMAIL PROTECTED] Index: linux-2.6/fs/ufs/dir.c === --- linux-2.6.orig/fs/ufs/dir.c +++ linux-2.6/fs/ufs/dir.c @@ -89,7 +89,7 @@ ino_t ufs_inode_by_name(struct inode *di void ufs_set_link(struct inode

Re: [PATCH RFC] extent mapped page cache

2007-07-24 Thread Nick Piggin
On Tue, Jul 24, 2007 at 07:25:09PM -0400, Chris Mason wrote: On Tue, 24 Jul 2007 23:25:43 +0200 Peter Zijlstra [EMAIL PROTECTED] wrote: The tree is a critical part of the patch, but it is also the easiest to rip out and replace. Basically the code stores a range by inserting an object at

Re: block_page_mkwrite? (Re: fault vs invalidate race (Re: -mm merge plans for 2.6.23))

2007-07-11 Thread Nick Piggin
David Chinner wrote: On Thu, Jul 12, 2007 at 10:54:57AM +1000, Nick Piggin wrote: Andrew Morton wrote: The fault-vs-invalidate race fix. I have belatedly learned that these need more work, so their state is uncertain. The more work may turn out being too much for you (although

Re: [RFC] fsblock

2007-07-09 Thread Nick Piggin
On Mon, Jul 09, 2007 at 10:14:06AM -0700, Christoph Lameter wrote: On Sun, 24 Jun 2007, Nick Piggin wrote: Firstly, what is the buffer layer? The buffer layer isn't really a buffer layer as in the buffer cache of unix: the block device cache is unified with the pagecache (in terms

Re: [RFC] fsblock

2007-07-09 Thread Nick Piggin
On Mon, Jul 09, 2007 at 05:59:47PM -0700, Christoph Lameter wrote: On Tue, 10 Jul 2007, Nick Piggin wrote: Hmmm I did not notice that yet but then I have not done much work there. Notice what? The bad code for the buffer heads. Oh. Well my first mail in this thread listed

Re: vm/fs meetup details

2007-07-05 Thread Nick Piggin
On Thu, Jul 05, 2007 at 01:54:06PM -0400, Rik van Riel wrote: Nick Piggin wrote: Hi, The vm/fs meetup will be held September 4th from 10am till 4pm (with the option of going longer), at the University of Cambridge. I am interested. A few potential topics: OK, I'll put you on the list

Re: vm/fs meetup details

2007-07-05 Thread Nick Piggin
On Thu, Jul 05, 2007 at 05:40:57PM -0400, Rik van Riel wrote: David Chinner wrote: On Thu, Jul 05, 2007 at 01:40:08PM -0700, Zach Brown wrote: - repair driven design, we know what it is (Val told us), but how does it apply to the things we are currently working on? should we do more of it?

vm/fs meetup details

2007-07-04 Thread Nick Piggin
Hi, The vm/fs meetup will be held September 4th from 10am till 4pm (with the option of going longer), at the University of Cambridge. Anton Altaparmakov has arranged a conference room for us with whiteboard and projector, so many thanks to him. I will send out the location and plans for
