Re: Proposal for proper durable fsync() and fdatasync()

2008-02-26 Jamie Lokier
Andrew Morton wrote: On Tue, 26 Feb 2008 07:26:50 + Jamie Lokier [EMAIL PROTECTED] wrote: (It would be nicer if sync_file_range() took a vector of ranges for better elevator scheduling, but let's ignore that :-) Two passes: Pass 1: shove each of the segments into the queue with
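Andrew's two-pass scheme can be sketched in C. The snippet is cut off after "with", so the flags here are an assumption: SYNC_FILE_RANGE_WRITE queues each segment in pass 1, and SYNC_FILE_RANGE_WAIT_AFTER waits for them in pass 2. The helper sync_ranges_two_pass and the struct file_range type are hypothetical. As the rest of the thread points out, this covers page write-out only. It does not flush file metadata or the drive's write cache.

```c
#define _GNU_SOURCE
#include <fcntl.h>      /* sync_file_range(), SYNC_FILE_RANGE_* (Linux-specific) */
#include <stddef.h>
#include <sys/types.h>

/* One byte range of a file (hypothetical helper type for this sketch). */
struct file_range {
    off_t offset;
    off_t nbytes;
};

/*
 * Hypothetical helper: start write-out for every range first, then wait
 * for all of them, so the elevator sees the whole batch before anything
 * blocks. Returns 0 on success, -1 (errno set) on the first failure.
 * NOTE: only page write-out; no metadata or drive-cache flush.
 */
int sync_ranges_two_pass(int fd, const struct file_range *r, size_t n)
{
    /* Pass 1: queue dirty pages in each range for write-out, no waiting. */
    for (size_t i = 0; i < n; i++) {
        if (sync_file_range(fd, r[i].offset, r[i].nbytes,
                            SYNC_FILE_RANGE_WRITE) != 0)
            return -1;
    }
    /* Pass 2: wait for the write-out started in pass 1 to complete. */
    for (size_t i = 0; i < n; i++) {
        if (sync_file_range(fd, r[i].offset, r[i].nbytes,
                            SYNC_FILE_RANGE_WAIT_AFTER) != 0)
            return -1;
    }
    return 0;
}
```

A vectored sync_file_range() would let the kernel do both passes internally; this loop is the userspace approximation.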

Re: Proposal for proper durable fsync() and fdatasync()

2008-02-26 Nick Piggin
On Tuesday 26 February 2008 18:59, Jamie Lokier wrote: Andrew Morton wrote: On Tue, 26 Feb 2008 07:26:50 + Jamie Lokier [EMAIL PROTECTED] wrote: (It would be nicer if sync_file_range() took a vector of ranges for better elevator scheduling, but let's ignore that :-) Two

Re: Proposal for proper durable fsync() and fdatasync()

2008-02-26 Jamie Lokier
Jeff Garzik wrote: [snip huge long proposal] Rather than invent new APIs, we should fix the existing ones to _really_ flush data to physical media. Btw, one reason for the length is the current block request API isn't sufficient even to make fsync() durable with _no_ new APIs. It offers

Re: [PATCH 00/37] Permit filesystem local caching

2008-02-26 Daniel Phillips
I need to respond to this in pieces... first the bit that is bugging me: * two new page flags I need to keep track of two bits of per-cached-page information: (1) This page is known by the cache, and that the cache must be informed if the page is going to go away. I still do not

Re: [PATCH 22/28] mm: add support for non block device backed swap files

2008-02-26 Miklos Szeredi
Starting review in the middle, because this is the part I'm most familiar with. New address_space_operations methods are added: int swapfile(struct address_space *, int); Separate ->swapon() and ->swapoff() methods would be so much cleaner IMO. Also is there a reason why 'struct file *' cannot
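To make the API-design point concrete, here is a sketch contrasting a single multiplexed method with separate ->swapon()/->swapoff() methods. Everything in it is hypothetical userspace C: struct mapping stands in for struct address_space, and neither ops struct is the kernel's actual address_space_operations.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct address_space. */
struct mapping { int swap_active; };

/* Style in the patch: one method, an int selects on or off. */
struct aops_combined {
    int (*swapfile)(struct mapping *m, int enable);
};

/* Style suggested in review: one method per operation. */
struct aops_split {
    int (*swapon)(struct mapping *m);
    int (*swapoff)(struct mapping *m);
};

static int demo_swapon(struct mapping *m)  { m->swap_active = 1; return 0; }
static int demo_swapoff(struct mapping *m) { m->swap_active = 0; return 0; }

/* The combined method must branch on its flag argument. */
static int demo_swapfile(struct mapping *m, int enable)
{
    return enable ? demo_swapon(m) : demo_swapoff(m);
}

static const struct aops_combined combined = { .swapfile = demo_swapfile };
static const struct aops_split split = { .swapon  = demo_swapon,
                                         .swapoff = demo_swapoff };
```

With the combined form every implementation repeats the on/off branch, and a call like swapfile(m, 0) does not say what 0 means; the split form puts that meaning in the method name.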

Re: [PATCH 22/28] mm: add support for non block device backed swap files

2008-02-26 Peter Zijlstra
On Tue, 2008-02-26 at 13:45 +0100, Miklos Szeredi wrote: Starting review in the middle, because this is the part I'm most familiar with. New address_space_operations methods are added: int swapfile(struct address_space *, int); Separate ->swapon() and ->swapoff() methods would be so

Re: Proposal for proper durable fsync() and fdatasync()

2008-02-26 Jörn Engel
On Tue, 26 February 2008 20:16:11 +1100, Nick Piggin wrote: Yeah, sync_file_range has slightly unusual semantics and introduces the new concept, writeout, to userspace (does writeout include in drive cache? the kernel doesn't think so, but the only way to make sync_file_range safe is if you

Re: [PATCH 00/37] Permit filesystem local caching

2008-02-26 David Howells
Daniel Phillips [EMAIL PROTECTED] wrote: I need to respond to this in pieces... first the bit that is bugging me: * two new page flags I need to keep track of two bits of per-cached-page information: (1) This page is known by the cache, and that the cache must be informed if

Re: Proposal for proper durable fsync() and fdatasync()

2008-02-26 Jamie Lokier
Jörn Engel wrote: On Tue, 26 February 2008 20:16:11 +1100, Nick Piggin wrote: Yeah, sync_file_range has slightly unusual semantics and introduces the new concept, writeout, to userspace (does writeout include in drive cache? the kernel doesn't think so, but the only way to make

Re: Proposal for proper durable fsync() and fdatasync()

2008-02-26 Ric Wheeler
Jeff Garzik wrote: Jamie Lokier wrote: By durable, I mean that fsync() should actually commit writes to physical stable storage, Yes, it should. I was surprised that fsync() doesn't do this already. There was a lot of effort put into block I/O write barriers during 2.5, so that

Re: Proposal for proper durable fsync() and fdatasync()

2008-02-26 Jamie Lokier
Jörn Engel wrote: On Tue, 26 February 2008 20:16:11 +1100, Nick Piggin wrote: Yeah, sync_file_range has slightly unusual semantics and introduces the new concept, writeout, to userspace (does writeout include in drive cache? the kernel doesn't think so, but the only way to make

Re: Proposal for proper durable fsync() and fdatasync()

2008-02-26 Jamie Lokier
Ric Wheeler wrote: I was surprised that fsync() doesn't do this already. There was a lot of effort put into block I/O write barriers during 2.5, so that journalling filesystems can force correct write ordering, using disk flush cache commands. After all that effort, I was very surprised to

Re: Proposal for proper durable fsync() and fdatasync()

2008-02-26 Andrew Morton
On Tue, 26 Feb 2008 15:07:45 + Jamie Lokier [EMAIL PROTECTED] wrote: SYNC_FILE_RANGE_WRITE scans all pages in the range, looking for dirty pages which aren't already queued for write-out. It marks those with a write-out flag, and starts write I/Os at some unspecified time in the near
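The behaviour Andrew describes can be modelled with two per-page flags. This is a toy: toy_page and toy_start_writeout are hypothetical and are not the kernel's struct page or writeback code, and the actual I/O is left out. It shows why SYNC_FILE_RANGE_WRITE alone promises nothing about completion: a page that is already queued for write-out but has been dirtied again is skipped, so a caller that needs the data written must also wait (SYNC_FILE_RANGE_WAIT_BEFORE and/or SYNC_FILE_RANGE_WAIT_AFTER).

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy per-page state (not the kernel's struct page). */
struct toy_page {
    bool dirty;      /* modified in memory, not yet queued */
    bool writeback;  /* queued for, or under, write-out */
};

/*
 * Toy model of SYNC_FILE_RANGE_WRITE: scan pages [first, last] and, for
 * each dirty page not already under write-out, clear dirty and set the
 * write-out flag. The real I/O would start "at some unspecified time in
 * the near future"; the model omits it. Returns pages newly queued.
 */
size_t toy_start_writeout(struct toy_page *pages, size_t first, size_t last)
{
    size_t queued = 0;
    for (size_t i = first; i <= last; i++) {
        if (pages[i].dirty && !pages[i].writeback) {
            pages[i].dirty = false;
            pages[i].writeback = true;
            queued++;
        }
    }
    return queued;
}
```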

Re: Proposal for proper durable fsync() and fdatasync()

2008-02-26 Jeff Garzik
Nick Piggin wrote: Anyway, the idea of making fsync/fdatasync etc. safe by default is a good idea IMO, and is a bad bug that we don't do that :( Agreed... it's also disappointing that [unless I'm mistaken] you have to hack each filesystem to support barriers. It seems far easier to make

Re: [RFC] ext3 freeze feature ver 0.2

2008-02-26 Eric Sandeen
Takashi Sato wrote: o Elevate XFS ioctl numbers (XFS_IOC_FREEZE and XFS_IOC_THAW) to the VFS As Andreas Dilger and Christoph Hellwig advised me, I have elevated them to include/linux/fs.h as below. #define FIFREEZE _IOWR('X', 119, int) #define FITHAW _IOWR('X',

Re: Proposal for proper durable fsync() and fdatasync()

2008-02-26 Jamie Lokier
Jeff Garzik wrote: Nick Piggin wrote: Anyway, the idea of making fsync/fdatasync etc. safe by default is a good idea IMO, and is a bad bug that we don't do that :( Agreed... it's also disappointing that [unless I'm mistaken] you have to hack each filesystem to support barriers. It

Re: Proposal for proper durable fsync() and fdatasync()

2008-02-26 Jörn Engel
On Tue, 26 February 2008 15:28:10 +, Jamie Lokier wrote: One interesting aspect of this comes with COW filesystems like btrfs or logfs. Writing out data pages is not sufficient, because those will get lost unless their referencing metadata is written as well. So either we have to
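Jörn's point about copy-on-write filesystems can be illustrated with a toy model: new data goes to a fresh block and becomes reachable only once the referencing metadata (here a single root pointer) is also written. All names are hypothetical and the model ignores any real on-disk layout; it only shows why writing out data pages alone is not enough for fsync() on a btrfs- or logfs-style filesystem.

```c
#include <string.h>

#define NBLOCKS 8

/* Toy copy-on-write store: blocks plus one durable root pointer. */
struct toy_cow {
    char blocks[NBLOCKS][16];  /* "on-disk" blocks */
    int  root;                 /* durable pointer to the live data block */
    int  next_free;            /* next unused block */
};

/* COW data write: new data goes to a fresh block; the old one is untouched. */
int toy_write_data(struct toy_cow *fs, const char *data)
{
    if (fs->next_free >= NBLOCKS)
        return -1;
    int b = fs->next_free++;
    strncpy(fs->blocks[b], data, sizeof fs->blocks[b] - 1);
    fs->blocks[b][sizeof fs->blocks[b] - 1] = '\0';
    return b;
}

/* Metadata write: point the root at the new block. */
void toy_commit_root(struct toy_cow *fs, int block)
{
    fs->root = block;
}

/* After a crash, only what the root points to is visible. */
const char *toy_read_after_crash(const struct toy_cow *fs)
{
    return fs->blocks[fs->root];
}
```

Until toy_commit_root() runs, the new block is written but unreachable, so a crash at that point shows the old contents.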

Re: [RFC] ext3 freeze feature ver 0.2

2008-02-26 Andreas Dilger
On Feb 26, 2008 08:39 -0800, Eric Sandeen wrote: Takashi Sato wrote: o Elevate XFS ioctl numbers (XFS_IOC_FREEZE and XFS_IOC_THAW) to the VFS As Andreas Dilger and Christoph Hellwig advised me, I have elevated them to include/linux/fs.h as below. #define FIFREEZE

Re: Proposal for proper durable fsync() and fdatasync()

2008-02-26 Jamie Lokier
Jörn Engel wrote: On Tue, 26 February 2008 15:28:10 +, Jamie Lokier wrote: One interesting aspect of this comes with COW filesystems like btrfs or logfs. Writing out data pages is not sufficient, because those will get lost unless their referencing metadata is written as well.

Re: Proposal for proper durable fsync() and fdatasync()

2008-02-26 Jörn Engel
On Tue, 26 February 2008 17:29:13 +, Jamie Lokier wrote: You're right. Though, doesn't normal page writeback enqueue the COW metadata changes? If not, how do they get written in a timely fashion? It does. But this is not sufficient to guarantee that the pages in question have been

Re: Proposal for proper durable fsync() and fdatasync()

2008-02-26 Jeff Garzik
Jamie Lokier wrote: Jeff Garzik wrote: Nick Piggin wrote: Anyway, the idea of making fsync/fdatasync etc. safe by default is a good idea IMO, and is a bad bug that we don't do that :( Agreed... it's also disappointing that [unless I'm mistaken] you have to hack each filesystem to support

Re: [PATCH 00/37] Permit filesystem local caching

2008-02-26 Daniel Phillips
On Tuesday 26 February 2008 06:33, David Howells wrote: Suppose one were to take a mundane approach to the persistent cache problem instead of layering filesystems. What you would do then is change NFS's ->write_page and variants to fiddle the persistent cache It is a requirement laid