Re: [zfs-discuss] rename(2), atomicity, crashes and fsync()

2009-03-20 Thread Joerg Schilling
Peter Schuller peter.schul...@infidyne.com wrote: fsync() is, indeed, expensive. Lots of calls to fsync() that are not necessary for correct application operation EXCEPT as a workaround for lame filesystem re-ordering are a sure way to kill performance. IMO the fundamental problem is

Re: [zfs-discuss] rename(2), atomicity, crashes and fsync()

2009-03-19 Thread Bob Friesenhahn
On Thu, 19 Mar 2009, Miles Nordin wrote: And the guarantees ARE minimal---just: http://www.google.com/search?q=POSIX+%22crash+consistency%22 and you'll find even people against Ts'o's who want to change ext4 still agree POSIX is on Ts'o's side. Clearly I am guilty of inflated expectations.

Re: [zfs-discuss] rename(2), atomicity, crashes and fsync()

2009-03-19 Thread Miles Nordin
bf == Bob Friesenhahn bfrie...@simple.dallas.tx.us writes: bf If ZFS does try to order its disk updates in chronological bf order without prioritizing metadata updates over data, then bf the risk is minimized. AIUI it doesn't exactly order them, just puts them into 5-second chunks.

Re: [zfs-discuss] rename(2), atomicity, crashes and fsync()

2009-03-19 Thread Peter Schuller
fsync() is, indeed, expensive. Lots of calls to fsync() that are not necessary for correct application operation EXCEPT as a workaround for lame filesystem re-ordering are a sure way to kill performance. IMO the fundamental problem is that the only way to achieve a write barrier is fsync()

Re: [zfs-discuss] rename(2), atomicity, crashes and fsync()

2009-03-19 Thread Peter Schuller
Uh, I should probably clarify some things (I was too quick to hit send): IMO the fundamental problem is that the only way to achieve a write barrier is fsync() (disregarding direct I/O etc). Again I would just like an fbarrier() as I've mentioned on the list previously. It seems Of course if

Re: [zfs-discuss] rename(2), atomicity, crashes and fsync()

2009-03-18 Thread Casper . Dik
Recently there's been discussion [1] in the Linux community about how filesystems should deal with rename(2), particularly in the case of a crash. ext4 was found to truncate files after a crash, that had been written with open(foo.tmp), write(), close() and then rename(foo.tmp, foo). This is

Re: [zfs-discuss] rename(2), atomicity, crashes and fsync()

2009-03-18 Thread Joerg Schilling
James Andrewartha jam...@daa.com.au wrote: Recently there's been discussion [1] in the Linux community about how filesystems should deal with rename(2), particularly in the case of a crash. ext4 was found to truncate files after a crash, that had been written with open(foo.tmp), write(),

Re: [zfs-discuss] rename(2), atomicity, crashes and fsync()

2009-03-18 Thread Moore, Joe
Joerg Schilling wrote: James Andrewartha jam...@daa.com.au wrote: Recently there's been discussion [1] in the Linux community about how filesystems should deal with rename(2), particularly in the case of a crash. ext4 was found to truncate files after a crash, that had been written with

Re: [zfs-discuss] rename(2), atomicity, crashes and fsync()

2009-03-18 Thread Casper . Dik
AFAIUI, the ZFS transaction group maintains write ordering, at least as far as write()s to the file would be in the ZIL ahead of the rename() metadata updates. So I think the atomicity is maintained without requiring the application to call fsync() before closing the file. If the TXG is

Re: [zfs-discuss] rename(2), atomicity, crashes and fsync()

2009-03-18 Thread Bob Friesenhahn
On Wed, 18 Mar 2009, Joerg Schilling wrote: The problem in this case is not whether rename() is atomic but whether the file that replaces the old file in an atomic rename() operation is in a stable state on the disk before calling rename(). This topic is quite disturbing to me ... The

Re: [zfs-discuss] rename(2), atomicity, crashes and fsync()

2009-03-18 Thread David Dyer-Bennet
On Wed, March 18, 2009 05:08, Joerg Schilling wrote: The problem in this case is not whether rename() is atomic but whether the file that replaces the old file in an atomic rename() operation is in a stable state on the disk before calling rename(). Good, I was hoping somebody saw it that

Re: [zfs-discuss] rename(2), atomicity, crashes and fsync()

2009-03-18 Thread Richard Elling
Bob Friesenhahn wrote: As it happens, current versions of my own application should be safe from this Linux filesystem bug, but older versions are not. There is even a way to request fsync() on every file close, but that could be quite expensive so it is not the default. Pragmatically, it

Re: [zfs-discuss] rename(2), atomicity, crashes and fsync()

2009-03-18 Thread Nicolas Williams
On Wed, Mar 18, 2009 at 11:15:48AM -0400, Moore, Joe wrote: Posix doesn't require the OS to sync() the file contents on close for local files like it does for NFS access? How odd. Why should it? If POSIX is agnostic as to system crashes / power failures, then why should it say anything about

Re: [zfs-discuss] rename(2), atomicity, crashes and fsync()

2009-03-18 Thread Nicolas Williams
On Wed, Mar 18, 2009 at 11:43:09AM -0500, Bob Friesenhahn wrote: In summary, I don't agree with you that the misbehavior is correct, but I do agree that copious expensive fsync()s should be assured to work around the problem. fsync() is, indeed, expensive. Lots of calls to fsync() that are

Re: [zfs-discuss] rename(2), atomicity, crashes and fsync()

2009-03-18 Thread Bob Friesenhahn
On Wed, 18 Mar 2009, Richard Elling wrote: Bob Friesenhahn wrote: As it happens, current versions of my own application should be safe from this Linux filesystem bug, but older versions are not. There is even a way to request fsync() on every file close, but that could be quite expensive so

Re: [zfs-discuss] rename(2), atomicity, crashes and fsync()

2009-03-18 Thread Miles Nordin
ja == James Andrewartha jam...@daa.com.au writes: ja other people are arguing that POSIX says rename(2) is atomic, Their statement is true but it's NOT an argument against Ts'o, who is 100% right: the applications using that calling sequence for crash consistency are not portable under

Re: [zfs-discuss] rename(2), atomicity, crashes and fsync()

2009-03-18 Thread Miles Nordin
c == Miles Nordin car...@ivy.net writes: c fbarrier() on second thought that couldn't help this problem. The goal is to associate writing to the directory (rename) with writing to the file referenced by that inode/handle (write/fsync/``fbarrier''), and in POSIX these two things are pretty

Re: [zfs-discuss] rename(2), atomicity, crashes and fsync()

2009-03-18 Thread Casper . Dik
On Wed, Mar 18, 2009 at 11:43:09AM -0500, Bob Friesenhahn wrote: In summary, I don't agree with you that the misbehavior is correct, but I do agree that copious expensive fsync()s should be assured to work around the problem. fsync() is, indeed, expensive. Lots of calls to fsync() that are

Re: [zfs-discuss] rename(2), atomicity, crashes and fsync()

2009-03-18 Thread David Dyer-Bennet
On Wed, March 18, 2009 11:43, Bob Friesenhahn wrote: On Wed, 18 Mar 2009, Joerg Schilling wrote: The problem in this case is not whether rename() is atomic but whether the file that replaces the old file in an atomic rename() operation is in a stable state on the disk before calling

Re: [zfs-discuss] rename(2), atomicity, crashes and fsync()

2009-03-18 Thread David Dyer-Bennet
On Wed, March 18, 2009 11:59, Richard Elling wrote: Bob Friesenhahn wrote: As it happens, current versions of my own application should be safe from this Linux filesystem bug, but older versions are not. There is even a way to request fsync() on every file close, but that could be quite

Re: [zfs-discuss] rename(2), atomicity, crashes and fsync()

2009-03-18 Thread David Magda
On Mar 18, 2009, at 12:43, Bob Friesenhahn wrote: POSIX does not care about disks or filesystems. The only correct behavior is for operations to be applied in the order that they are requested of the operating system. This is a core function of any operating system. It is therefore ok

Re: [zfs-discuss] rename(2), atomicity, crashes and fsync()

2009-03-18 Thread James Litchfield
POSIX has a Synchronized I/O Data (and File) Integrity Completion definition (line 115434 of the Issue 7 (POSIX.1-2008) specification). What it says is that writes for a byte range in a file must complete before any pending reads for that byte range are satisfied. It does not say that if you

Re: [zfs-discuss] rename(2), atomicity, crashes and fsync()

2009-03-18 Thread Miles Nordin
dm == David Magda dma...@ee.ryerson.ca writes: dm is this what POSIX actually specifies? i doubt it. If it did, it would basically mandate a log-structured / COW filesystem, which, although not a _bad_ idea, is way too far from a settled debate to be enshrining in a mandatory ``standard''

[zfs-discuss] rename(2), atomicity, crashes and fsync()

2009-03-17 Thread James Andrewartha
Hi all, Recently there's been discussion [1] in the Linux community about how filesystems should deal with rename(2), particularly in the case of a crash. ext4 was found to truncate files after a crash, that had been written with open(foo.tmp), write(), close() and then rename(foo.tmp, foo). This
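For reference, the calling sequence under discussion can be sketched in C with the fsync() that the thread argues about placed between write() and rename(). This is only a minimal sketch, not code from any poster: the helper names replace_file() and write_all() are hypothetical, the 0644 mode is an assumption, and it does not fsync() the containing directory, which some filesystems may also need before the rename itself is durable.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Write len bytes from buf to fd, retrying on short writes and EINTR. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/*
 * Hypothetical helper: replace `path` with new contents by writing them
 * to `tmp_path` first. Returns 0 on success, -1 on error (errno set).
 */
int replace_file(const char *tmp_path, const char *path,
                 const char *buf, size_t len)
{
    int fd = open(tmp_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* fsync() asks for the new data to be on stable storage before the
     * rename makes it visible under the final name. */
    if (write_all(fd, buf, len) < 0 || fsync(fd) < 0) {
        int saved = errno;
        close(fd);
        unlink(tmp_path);
        errno = saved;
        return -1;
    }
    if (close(fd) < 0) {
        int saved = errno;
        unlink(tmp_path);
        errno = saved;
        return -1;
    }

    /* rename(2) replaces `path` atomically with respect to other
     * processes; what survives a crash is the question in this thread. */
    return rename(tmp_path, path);
}
```

Dropping the fsync() call gives the open(foo.tmp), write(), close(), rename(foo.tmp, foo) sequence exactly as described above, which is the variant reported to leave truncated files on ext4 after a crash.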