Re: fsync()
On Jul 3, 2007, at 2:31 PM, Russell Coker wrote:

> On Wednesday 04 July 2007 01:38, Jeff Johnson <[EMAIL PROTECTED]> wrote:
>> The primary goal of package management
>> is to install files reliably, not push the progress bars faster.
>
> Of course we could have a mode of operation for the initial system install
> that doesn't do the fsync(). The only time a progress bar matters for RPM
> installation is the initial install.

I'd rather have an enabler on the progress bar than a disabler on fsync ;-)

73 de Jeff
__
RPM Package Manager http://rpm5.org
Developer Communication List rpm-devel@rpm5.org
Re: fsync()
On Wednesday 04 July 2007 01:38, Jeff Johnson <[EMAIL PROTECTED]> wrote:
> The primary goal of package management
> is to install files reliably, not push the progress bars faster.

Of course we could have a mode of operation for the initial system install that doesn't do the fsync(). The only time a progress bar matters for RPM installation is the initial install.

--
[EMAIL PROTECTED]
http://etbe.coker.com.au/ My Blog
http://www.coker.com.au/sponsorship.html Sponsoring Free Software development
Re: fsync()
On Jul 3, 2007, at 1:50 PM, Michael Schroeder wrote:

> On Tue, Jul 03, 2007 at 12:49:14PM -0400, Jeff Johnson wrote:
>> On Jul 3, 2007, at 12:16 PM, Michael Schroeder wrote:
>>> No, as one needs a patch to Berkeleydb to make it support
>>> individual syncing.
>>
>> What's the patch? Overloading the sync vector or something deeper in
>> db/*?
>
> Something deeper. (Hmm, didn't you have some good connections to
> the berkeley db folks? Maybe you can push the patch upstream...)

Hehe. Yah, Keith Bostic put me up at his house in 1984 ;-) I'll try ...

Yah, I remember this patch now. Hmmm, not syncing the mpool is likely asking for trouble though. I have seen rpmdb lossage if --rebuilddb is run with bad data in __db* files.

73 de Jeff
Re: fsync()
On Tue, Jul 03, 2007 at 12:49:14PM -0400, Jeff Johnson wrote:
> On Jul 3, 2007, at 12:16 PM, Michael Schroeder wrote:
>> No, as one needs a patch to Berkeleydb to make it support
>> individual syncing.
>
> What's the patch? Overloading the sync vector or something deeper in
> db/*?

Something deeper. (Hmm, didn't you have some good connections to the berkeley db folks? Maybe you can push the patch upstream...)

--- db/db/db.c.orig 2004-11-11 15:58:46.0 +
+++ db/db/db.c 2005-12-15 16:17:45.0 +
@@ -591,6 +591,8 @@ __db_dbenv_mpool(dbp, fname, flags)
             (F_ISSET(dbp, DB_AM_NOT_DURABLE) ? DB_TXN_NOT_DURABLE : 0),
             0, dbp->pgsize)) != 0)
                 return (ret);
+        if (LF_ISSET(DB_NOFSYNC) && mpf->mfp)
+                F_SET(mpf->mfp, MP_NOFSYNC);
         return (0);
 }
--- db/db/db_iface.c.orig 2004-10-16 01:31:54.0 +
+++ db/db/db_iface.c 2005-12-15 16:17:45.0 +
@@ -1068,7 +1068,7 @@ __db_open_arg(dbp, txn, fname, dname, ty
 #define OKFLAGS \
         (DB_AUTO_COMMIT | DB_CREATE | DB_DIRTY_READ | DB_EXCL | \
         DB_FCNTL_LOCKING | DB_NO_AUTO_COMMIT | DB_NOMMAP | DB_RDONLY | \
-        DB_RDWRMASTER | DB_THREAD | DB_TRUNCATE | DB_WRITEOPEN)
+        DB_RDWRMASTER | DB_THREAD | DB_TRUNCATE | DB_WRITEOPEN | DB_NOFSYNC)
         if ((ret = __db_fchk(dbenv, "DB->open", flags, OKFLAGS)) != 0)
                 return (ret);
         if (LF_ISSET(DB_EXCL) && !LF_ISSET(DB_CREATE))
--- db/dbinc/db.in.orig 2004-10-16 01:31:54.0 +
+++ db/dbinc/db.in 2005-12-15 16:17:45.0 +
@@ -260,6 +260,7 @@ struct __db_dbt {
 #define DB_FCNTL_LOCKING 0x0002000 /* UNDOC: fcntl(2) locking. */
 #define DB_RDWRMASTER 0x0004000 /* UNDOC: allow subdb master open R/W */
 #define DB_WRITEOPEN 0x0008000 /* UNDOC: open with write lock. */
+#define DB_NOFSYNC 0x001 /* UNDOC: don't fsync */
 /*
  * Flags private to DB_ENV->txn_begin.
--- db/dbinc/mp.h.orig 2004-10-16 01:31:54.0 +
+++ db/dbinc/mp.h 2005-12-15 16:25:56.0 +
@@ -309,6 +309,7 @@ struct __mpoolfile {
 #define MP_FAKE_UOC 0x080 /* Unlink_on_close field: fake flag. */
 #define MP_NOT_DURABLE 0x100 /* File is not durable. */
 #define MP_TEMP 0x200 /* Backing file is a temporary. */
+#define MP_NOFSYNC 0x400 /* Don't fsync */
         u_int32_t flags;
 };
--- db/mp/mp_sync.c.orig 2004-11-11 15:58:48.0 +
+++ db/mp/mp_sync.c 2005-12-15 16:23:57.0 +
@@ -553,7 +553,7 @@ done: /*
         if (ret == 0 && (op == DB_SYNC_CACHE || op == DB_SYNC_FILE)) {
                 if (dbmfp == NULL)
                         ret = __memp_sync_files(dbenv, dbmp);
-                else
+                else if (!dbmfp->mfp || !F_ISSET(dbmfp->mfp, MP_NOFSYNC))
                         ret = __os_fsync(dbenv, dbmfp->fhp);
         }
@@ -600,7 +600,7 @@ int __memp_sync_files(dbenv, dbmp)
                 MUTEX_THREAD_LOCK(dbenv, dbmp->mutexp);
                 for (dbmfp = TAILQ_FIRST(&dbmp->dbmfq);
                     dbmfp != NULL; dbmfp = TAILQ_NEXT(dbmfp, q)) {
-                        if (dbmfp->mfp != mfp || F_ISSET(dbmfp, MP_READONLY))
+                        if (dbmfp->mfp != mfp || F_ISSET(dbmfp, MP_READONLY | MP_NOFSYNC))
                                 continue;
                         ret = __os_fsync(dbenv, dbmfp->fhp);
                         break;
@@ -662,6 +662,9 @@ __memp_mf_sync(dbmp, mfp)
         dbenv = dbmp->dbenv;
+        if (F_ISSET(mfp, MP_NOFSYNC))
+                return 0;
+
         /*
          * Expects caller to be holding the region lock: we're using the path
          * name and __memp_nameop might try and rename the file.

Cheers,
  Michael.

--
Michael Schroeder [EMAIL PROTECTED]
SUSE LINUX Products GmbH, GF Markus Rex, HRB 16746 AG Nuernberg
main(_){while(_=~getchar())putchar(~_-1/(~(_|32)/13*2-11)*13);}
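The core idea of the patch above, a per-file flag that makes the sync path skip fsync(), can be sketched in isolation. This is only an illustration and does not use Berkeley DB: `struct poolfile`, `POOLFILE_NOFSYNC` and `poolfile_sync()` are hypothetical names, and only the 0x400 bit value is borrowed from the MP_NOFSYNC define in the patch.

```c
#include <unistd.h>

/* Hypothetical flag; the value mirrors MP_NOFSYNC (0x400) from the patch. */
#define POOLFILE_NOFSYNC 0x400u

/* Hypothetical stand-in for a memory-pool file handle. */
struct poolfile {
    int fd;              /* backing file descriptor */
    unsigned int flags;  /* bit flags, e.g. POOLFILE_NOFSYNC */
};

/*
 * Sync one pool file unless it is marked no-fsync.
 * Returns 1 if fsync() ran and succeeded, 0 if it was skipped,
 * and -1 if fsync() failed.
 */
int poolfile_sync(const struct poolfile *pf)
{
    if (pf->flags & POOLFILE_NOFSYNC)
        return 0;                       /* e.g. a rebuildable index */
    return fsync(pf->fd) == 0 ? 1 : -1; /* e.g. the Packages database */
}
```

In the scheme Michael describes elsewhere in the thread, only files that can be regenerated (the index databases) would carry the flag, while Packages would still be synced.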
Re: fsync()
On Jul 3, 2007, at 12:16 PM, Michael Schroeder wrote:

> No, as one needs a patch to Berkeleydb to make it support
> individual syncing.

What's the patch? Overloading the sync vector or something deeper in db/*?

73 de Jeff
Re: fsync()
On Tue, Jul 03, 2007 at 11:38:15AM -0400, Jeff Johnson wrote:
> We also disagree on the importance of attempting to sync
> data to disk in spite of modest cost. I believe that rpm (and rpm5.org)
> should sync to disk where appropriate rather than disabling rpmdb
> sync's (and fsync for file contents here) as SuSE

Just to clarify things: SUSE is just disabling the syncing of the index databases, the Packages database is still fsync()ed. Index databases can be easily regenerated by 'rpm --rebuilddb', but if Packages is corrupt you have hosed your system.

> and perhaps rpm.org are choosing to do.

No, as one needs a patch to Berkeleydb to make it support individual syncing.

Cheers,
  Michael.

--
Michael Schroeder [EMAIL PROTECTED]
SUSE LINUX Products GmbH, GF Markus Rex, HRB 16746 AG Nuernberg
main(_){while(_=~getchar())putchar(~_-1/(~(_|32)/13*2-11)*13);}
Re: fsync()
On Jul 3, 2007, at 10:55 AM, Michael Schroeder wrote:

> On Tue, Jul 03, 2007 at 10:39:10AM -0400, Jeff Johnson wrote:
>> But this isn't every program, rpm is an installer, and installers are
>> expected to try harder.
>>
>> Slowing down is likely unmeasurable,
>
> Not from my experience...

We also disagree on the importance of attempting to sync data to disk in spite of modest cost. I believe that rpm (and rpm5.org) should sync to disk where appropriate rather than disabling rpmdb sync's (and fsync for file contents here) as SuSE and perhaps rpm.org are choosing to do.

The primary goal of package management is to install files reliably, not push the progress bars faster.

>> and can certainly be conditioned
>> on file system type or configuration.
>
> I wouldn't mind a rpm macro check so that people can turn it on or off
> according to their needs.

Will do. Since no one is meaningfully configuring rpm, the default (according to principle of least surprise) will be to always fsync.

73 de Jeff
Re: fsync()
On Tue, Jul 03, 2007 at 10:39:10AM -0400, Jeff Johnson wrote:
> But this isn't every program, rpm is an installer, and installers are
> expected to try harder.
>
> Slowing down is likely unmeasurable,

Not from my experience...

> and can certainly be conditioned
> on file system type or configuration.

I wouldn't mind a rpm macro check so that people can turn it on or off according to their needs.

Thanks,
  Michael.

--
Michael Schroeder [EMAIL PROTECTED]
SUSE LINUX Products GmbH, GF Markus Rex, HRB 16746 AG Nuernberg
main(_){while(_=~getchar())putchar(~_-1/(~(_|32)/13*2-11)*13);}
Re: fsync()
On Jul 3, 2007, at 6:19 AM, Michael Schroeder wrote:

> On Tue, Jul 03, 2007 at 08:10:04PM +1000, Russell Coker wrote:
>> Do you consider systems that lose data, have internal databases that don't
>> match their own state, and generally cause down-time for people to be
>> correct?
>
> You might as well argue that *every* program that writes to disk
> should do fsync() calls. If you crash the system while rpm is
> running the system is broken nevertheless. Your fsyncs just
> make the critical window a bit smaller while slowing down rpm.

But this isn't every program, rpm is an installer, and installers are expected to try harder.

Slowing down is likely unmeasurable, and can certainly be conditioned on file system type or configuration.

73 de Jeff
Re: fsync()
On Tue, Jul 03, 2007 at 08:10:04PM +1000, Russell Coker wrote:
> Do you consider systems that lose data, have internal databases that don't
> match their own state, and generally cause down-time for people to be
> correct?

You might as well argue that *every* program that writes to disk should do fsync() calls. If you crash the system while rpm is running the system is broken nevertheless. Your fsyncs just make the critical window a bit smaller while slowing down rpm.

Cheers,
  Michael.

--
Michael Schroeder [EMAIL PROTECTED]
SUSE LINUX Products GmbH, GF Markus Rex, HRB 16746 AG Nuernberg
main(_){while(_=~getchar())putchar(~_-1/(~(_|32)/13*2-11)*13);}
Re: fsync()
On Tuesday 03 July 2007 00:02, Michael Schroeder <[EMAIL PROTECTED]> wrote:
> On Sun, Jul 01, 2007 at 08:02:40PM +1000, Russell Coker wrote:
> > It's really a pity that people can't just write correct programs and that
> > it needs to cost someone time and money and then more time fighting with
> > the developers to get the fix applied.
>
> You have a very strange definition of "correct"...

1A) Write file A
1B) fsync() file A
2A) Write file B
2B) fsync() file B

If a program does the above and an unexpected system reset occurs at any stage then you will either get nothing, file A corrupted, file A written and no change to file B, file A written and file B corrupted, or both files written. Therefore if file A must be written before file B (IE file B is the RPM database that says that file A was correctly written) then the system will always be in a correct state (if the file was not installed then the RPM database will know about this). My definition of "correct" has this guarantee.

1) Write file A
2) Write file B

If a program does the above and an unexpected system reset occurs then you could have no change, both files corrupted, both files written, one file corrupted (either file), or one file written and one file corrupted. This means that you may have the system in an undefined state and have the RPM database not tell you (unless you check the file checksums) and is not what I consider correct behaviour.

Do you consider systems that lose data, have internal databases that don't match their own state, and generally cause down-time for people to be correct?

PS I first encountered this problem on a SLES10-SP1 system.

--
[EMAIL PROTECTED]
http://etbe.coker.com.au/ My Blog
http://www.coker.com.au/sponsorship.html Sponsoring Free Software development
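The 1A/1B/2A/2B sequence in the message above could be sketched in C roughly as follows. This is a minimal illustration, not rpm code: `write_file_durably()` and `install_then_record()` are hypothetical names, the "record" file merely stands in for the RPM database, and interrupted writes (EINTR) are simply treated as errors.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Write buf to path and fsync() it before close(). 0 on success, -1 on error. */
static int write_file_durably(const char *path, const char *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    size_t off = 0;
    while (off < len) {                 /* handle short writes */
        ssize_t n = write(fd, buf + off, len - off);
        if (n < 0) {
            close(fd);
            return -1;
        }
        off += (size_t)n;
    }
    if (fsync(fd) != 0) {               /* steps 1B / 2B */
        close(fd);
        return -1;
    }
    return close(fd);
}

/*
 * Steps 1A+1B, then 2A+2B: the record (file B) is only written after the
 * payload (file A) has been flushed, so B never claims more than A holds.
 */
int install_then_record(const char *payload_path, const char *record_path,
                        const char *data)
{
    if (write_file_durably(payload_path, data, strlen(data)) != 0)
        return -1;
    return write_file_durably(record_path, "installed\n", 10);
}
```

If the first call fails, the record is never written, which matches the guarantee described above: the database may lag behind the files, but it should not run ahead of them.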
Re: fsync()
On Sun, Jul 01, 2007 at 08:02:40PM +1000, Russell Coker wrote:
> It's really a pity that people can't just write correct programs and that it
> needs to cost someone time and money and then more time fighting with the
> developers to get the fix applied.

You have a very strange definition of "correct"...

Michael.

--
Michael Schroeder [EMAIL PROTECTED]
SUSE LINUX Products GmbH, GF Markus Rex, HRB 16746 AG Nuernberg
main(_){while(_=~getchar())putchar(~_-1/(~(_|32)/13*2-11)*13);}
Re: fsync()
Russell Coker wrote:
> On Saturday 30 June 2007 19:52, Andy Green <[EMAIL PROTECTED]> wrote:
>> Russell Coker wrote:
>>> On Saturday 30 June 2007 00:35, Andy Green <[EMAIL PROTECTED]> wrote:
>>>> I don't think fsync() for individual files is really a fair answer,
>>> Why not?
>> $ man fsync
>> ...
>> DESCRIPTION
>>        fsync() transfers ("flushes") all modified in-core data of (i.e.,
>>        modified buffer cache pages for) the file referred to by the file
>>        descriptor fd to the disk device (or other permanent storage
>>        device) where that file resides. The call blocks until the device
>>        reports that the transfer has completed. It also flushes metadata
>>        information associated with the file (see stat(2)).
>> ...
>>
>> You're proposing that doing an fsync() after every unpacked file is
>> righteous for all cases? RPM will slow down dramatically for no real
>> benefit. If power is lost partway through an archive unpack the package
>> is still in an inconsistent partial state on the drive despite that the
>> atomic unit of inconsistency is supposedly now one file. One way or
>> another you end up with half a kernel package or whatever.
>
> It is not required that you call fsync() on each file separately. You can
> write data to a number of files (900 file descriptors is safe on all
> platforms) and then loop through calling fsync() (or fdatasync()) on each

Russell you care about having it way more than I care about not having it, Jeff seems to agree it is worth having, so I guess we will have it.

You can always find a bad time to hit reset and get a partial package on your drive, but I certainly agree it is better if rpm exiting is a trustworthy signal that it is after the delicate moment. Why sync() somehow doesn't deliver that and fsync() does... 99% of C code is not "correct"... *shrug*.

-Andy
Re: fsync()
On Saturday 30 June 2007 23:09, Andy Green <[EMAIL PROTECTED]> wrote:
> >> sync() is not the way to get some files committed to disk.
> >
> > Sure it is. A sync() at the end is aimed at closing the window between
>
> One more point on fsync() vs sync()... sync() will mop up whatever the
> %pre/%post scripts have been up to, eg, ldconfig or whatever and commit
> it. fsync() just on archive unpacked files will miss that.

The correct solution is to get the important programs to sync their own files via fsync(). A recent change to ldconfig made it use fsync() for correct operation (again due to me being at ground-zero when a machine was damaged).

It's really a pity that people can't just write correct programs and that it needs to cost someone time and money and then more time fighting with the developers to get the fix applied.

--
[EMAIL PROTECTED]
http://etbe.coker.com.au/ My Blog
http://www.coker.com.au/sponsorship.html Sponsoring Free Software development
Re: fsync()
On Saturday 30 June 2007 19:52, Andy Green <[EMAIL PROTECTED]> wrote:
> Russell Coker wrote:
> > On Saturday 30 June 2007 00:35, Andy Green <[EMAIL PROTECTED]> wrote:
> >> I don't think fsync() for individual files is really a fair answer,
> >
> > Why not?
>
> $ man fsync
> ...
> DESCRIPTION
>        fsync() transfers ("flushes") all modified in-core data of (i.e.,
>        modified buffer cache pages for) the file referred to by the file
>        descriptor fd to the disk device (or other permanent storage
>        device) where that file resides. The call blocks until the device
>        reports that the transfer has completed. It also flushes metadata
>        information associated with the file (see stat(2)).
> ...
>
> You're proposing that doing an fsync() after every unpacked file is
> righteous for all cases? RPM will slow down dramatically for no real
> benefit. If power is lost partway through an archive unpack the package
> is still in an inconsistent partial state on the drive despite that the
> atomic unit of inconsistency is supposedly now one file. One way or
> another you end up with half a kernel package or whatever.

It is not required that you call fsync() on each file separately. You can write data to a number of files (900 file descriptors is safe on all platforms) and then loop through calling fsync() (or fdatasync()) on each one. By the time you have written data to file 900 there is a reasonable chance that some of the data from file 1 has made it to disk. When you call fsync() on file 1 the filesystem driver may decide to sync the data for multiple files (there is nothing preventing a filesystem driver from writing more data to disk than is required).

The real benefit of fsync() is that you don't get messed up machines. I started this discussion because I had a SUSE machine get corrupted files due to installing an RPM shortly before a reset. The RPM system didn't indicate any problem and none of the other people working on the project had the skills needed to diagnose the problem.

> >> it's
> >> fine if it just uses the normal filesystem APIs per-file. But after the
> >> transaction is complete, and you walk away thinking you did complete an
> >> rpm transaction, there is a case for adding a sync() to make sure
> >> everything you think you have done is truly committed to physical
> >> storage (maybe it does it already, I dunno). On the one hand this is a
> >> relatively low probability issue for a desktop box but on the other hand
> >> it is pretty cheap.
> >
> > The time taken for a sync() system call can be very large when you have a
> > system under high write load. Under some older versions of Linux the
> > time taken for sync() appeared to be unbounded (it apparently kept
> > looping through the list of data to write while more data was being added
> > to the list), a brief test suggests that recent versions of Linux may
> > have solved this.
>
> Well then why mention this as an issue.

Why mention what exactly? Why mention the need for fsync()? Because the lack of it results in damaged systems. Why mention the relative merits of sync() and fsync()? Because someone else advocated the use of sync().

> > sync() is not the way to get some files committed to disk.
>
> Sure it is. A sync() at the end is aimed at closing the window between
> rpm completing a transaction (and feeding back that it is completed),

From sync(2):
BUGS
       According to the standard specification (e.g., POSIX.1-2001), sync()
       schedules the writes, but may return before the actual writing is
       done. However, since version 1.3.20 Linux does actually wait. (This
       still does not guarantee data integrity: modern disks have large
       caches.)

> and the completed actions not being on physical storage. With a single
> sync() at the end you don't come back to the prompt from rpm until the
> transaction is completed not only in cache but at the physical storage.

Pity that sync() is not guaranteed to work that way. Note that we want RPM to work on OSs other than Linux too.

> (In the case of HDDs neither fsync() nor sync() guarantee that the data
> is committed from the HDD private cache to the nonvolatile storage, but
> that should normally happen very shortly afterwards).

In a correctly configured drive sub-system you will not have any write-back cache that is volatile and which will be considered as being stable for the purpose of fsync(). Lots of cheap hard drives do the wrong thing in this regard; if you have a SAS or SCSI drive or a correctly configured SATA or IDE drive then the right thing should be done.

> Anyway I just mentioned it has value for embedded flash devices. I
> don't know if it or fsync() has much value for PCs.

I've just had a SERVER become unusable because of this problem. I think it's more important for a server than for an embedded system. Embedded systems are rarely upgraded and generally only have RPMs installed in a factory.
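The batching approach described earlier in this message (write a number of files, then loop through calling fsync() on each) might look something like the sketch below. The function name, the `file-N` naming scheme and the BATCH_MAX constant are assumptions made for illustration; 900 is simply the figure quoted in the message and has not been checked against any platform's descriptor limit.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define BATCH_MAX 900   /* figure quoted in the thread; an assumption here */

/*
 * Write `count` small files under `dir`, keeping every descriptor open,
 * then fsync() and close() them in a second pass.
 * Returns 0 on success, -1 if any step failed.
 */
int write_batch_then_sync(const char *dir, int count, const char *data)
{
    int fds[BATCH_MAX];
    char path[4096];
    size_t len = strlen(data);
    int opened = 0;
    int rc = 0;

    if (count < 0 || count > BATCH_MAX)
        return -1;

    /* Phase 1: write everything; the kernel may start write-back meanwhile. */
    while (opened < count) {
        snprintf(path, sizeof(path), "%s/file-%d", dir, opened);
        int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) {
            rc = -1;
            break;
        }
        fds[opened++] = fd;
        if (write(fd, data, len) != (ssize_t)len) {
            rc = -1;
            break;
        }
    }

    /* Phase 2: flush and close every descriptor that was opened. */
    for (int i = 0; i < opened; i++) {
        if (fsync(fds[i]) != 0)
            rc = -1;
        if (close(fds[i]) != 0)
            rc = -1;
    }
    return rc;
}
```

Compared with an fsync() straight after each write, this gives the filesystem a chance to write data back in the background before the first fsync() has to block.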
Re: fsync()
On Saturday 30 June 2007 15:01, "Wichmann, Mats D" <[EMAIL PROTECTED]> wrote:
> >> OK/ fsync before close, todo++. Thanks for the info, xfs is a mystery
> >> to me.
> >
> > If you want to know any specific things about XFS then ask me.
> > I'm currently working for SGI in a group that is associated
> > with the XFS group. I regularly talk to the XFS experts and
> > can get any specific questions answered.
>
> This is ridiculous.
>
> Nobody should need to know anything about the internal
> details of a specific filesystem's implementation when
> they're writing a userspace application. If the filesystem
> has chosen to do silly optimizations which don't work
> right when using published apis and methods, it's the
> filesystem that is broken and not even remotely the
> application's fault.

It's because of silly statements such as the above that I hesitate to mention that I'm using XFS in such discussions.

ALL filesystems implement write-back caching for performance unless mounted with the "sync" option (which on Linux is documented as only working for ext2, ext3, and UFS). Also ext3 has the commit=X option where X will be the maximum number of seconds that the data will not be sync'd for (default 5 seconds), of course if during a 5 second period you dirty enough pages to need a minute to write them all back... Even 5 seconds is enough time to install a package on a fast machine and then have the machine rebooted thus resulting in an inconsistent state.

You have to deal with the fact that there are many filesystems out there other than Ext2/3 and UFS, filesystems that are designed to give maximum performance and which actually implement the OS APIs for the purpose that they were intended.

Here's the relevant section from write(2):
NOTES
       A successful return from write() does not make any guarantee that data
       has been committed to disk. In fact, on some buggy implementations, it
       does not even guarantee that space has successfully been reserved for
       the data. The only way to be sure is to call fsync(2) after you are
       done writing all your data.

--
[EMAIL PROTECTED]
http://etbe.coker.com.au/ My Blog
http://www.coker.com.au/sponsorship.html Sponsoring Free Software development
Re: fsync()
Andy Green wrote:
>> sync() is not the way to get some files committed to disk.
>
> Sure it is. A sync() at the end is aimed at closing the window between

One more point on fsync() vs sync()... sync() will mop up whatever the %pre/%post scripts have been up to, eg, ldconfig or whatever and commit it. fsync() just on archive unpacked files will miss that.

-Andy
Re: fsync()
Russell Coker wrote:
> On Saturday 30 June 2007 00:35, Andy Green <[EMAIL PROTECTED]> wrote:
>> I don't think fsync() for individual files is really a fair answer,
>
> Why not?

$ man fsync
...
DESCRIPTION
       fsync() transfers ("flushes") all modified in-core data of (i.e.,
       modified buffer cache pages for) the file referred to by the file
       descriptor fd to the disk device (or other permanent storage
       device) where that file resides. The call blocks until the device
       reports that the transfer has completed. It also flushes metadata
       information associated with the file (see stat(2)).
...

You're proposing that doing an fsync() after every unpacked file is righteous for all cases? RPM will slow down dramatically for no real benefit. If power is lost partway through an archive unpack the package is still in an inconsistent partial state on the drive despite that the atomic unit of inconsistency is supposedly now one file. One way or another you end up with half a kernel package or whatever.

>> it's
>> fine if it just uses the normal filesystem APIs per-file. But after the
>> transaction is complete, and you walk away thinking you did complete an
>> rpm transaction, there is a case for adding a sync() to make sure
>> everything you think you have done is truly committed to physical
>> storage (maybe it does it already, I dunno). On the one hand this is a
>> relatively low probability issue for a desktop box but on the other hand
>> it is pretty cheap.
>
> The time taken for a sync() system call can be very large when you have a
> system under high write load. Under some older versions of Linux the time
> taken for sync() appeared to be unbounded (it apparently kept looping through
> the list of data to write while more data was being added to the list), a
> brief test suggests that recent versions of Linux may have solved this.

Well then why mention this as an issue.

> sync() is not the way to get some files committed to disk.

Sure it is. A sync() at the end is aimed at closing the window between rpm completing a transaction (and feeding back that it is completed), and the completed actions not being on physical storage. With a single sync() at the end you don't come back to the prompt from rpm until the transaction is completed not only in cache but at the physical storage.

(In the case of HDDs neither fsync() nor sync() guarantee that the data is committed from the HDD private cache to the nonvolatile storage, but that should normally happen very shortly afterwards).

Anyway I just mentioned it has value for embedded flash devices. I don't know if it or fsync() has much value for PCs.

-Andy
RE: fsync()
[EMAIL PROTECTED] wrote:
> On Saturday 30 June 2007 12:02, Jeff Johnson <[EMAIL PROTECTED]> wrote:
>> OK/ fsync before close, todo++. Thanks for the info, xfs is a mystery
>> to me.
>
> If you want to know any specific things about XFS then ask me.
> I'm currently working for SGI in a group that is associated
> with the XFS group. I regularly talk to the XFS experts and
> can get any specific questions answered.

This is ridiculous.

Nobody should need to know anything about the internal details of a specific filesystem's implementation when they're writing a userspace application. If the filesystem has chosen to do silly optimizations which don't work right when using published apis and methods, it's the filesystem that is broken and not even remotely the application's fault.
Re: fsync()
On Saturday 30 June 2007 12:02, Jeff Johnson <[EMAIL PROTECTED]> wrote:
> OK/ fsync before close, todo++. Thanks for the info, xfs is a mystery
> to me.

If you want to know any specific things about XFS then ask me. I'm currently working for SGI in a group that is associated with the XFS group. I regularly talk to the XFS experts and can get any specific questions answered.

--
[EMAIL PROTECTED]
http://etbe.coker.com.au/ My Blog
http://www.coker.com.au/sponsorship.html Sponsoring Free Software development
Re: fsync()
On Jun 29, 2007, at 6:04 PM, Russell Coker wrote:

> On Saturday 30 June 2007 02:04, Jeff Johnson <[EMAIL PROTECTED]> wrote:
>> Should I add O_SYNC when opening files on delayed write file systems?
>> Doable, but annoying mapping the path back to a file system type to
>> infer functionality.
>
> O_SYNC is not the correct solution. XFS likes to delay block allocation to
> get contiguous files. O_SYNC on XFS would either result in re-allocating
> file blocks (terrible for write performance) or discontiguous files. Write
> performance will always be expected to be better from a fsync() before
> close() than from O_SYNC.

OK/ fsync before close, todo++. Thanks for the info, xfs is a mystery to me.

73 de Jeff
Re: fsync()
On Saturday 30 June 2007 02:04, Jeff Johnson <[EMAIL PROTECTED]> wrote:
> Should I add O_SYNC when opening files on delayed write file systems?
> Doable, but annoying mapping the path back to a file system type to
> infer functionality.

O_SYNC is not the correct solution. XFS likes to delay block allocation to get contiguous files. O_SYNC on XFS would either result in re-allocating file blocks (terrible for write performance) or discontiguous files. Write performance will always be expected to be better from a fsync() before close() than from O_SYNC.

--
[EMAIL PROTECTED]
http://etbe.coker.com.au/ My Blog
http://www.coker.com.au/sponsorship.html Sponsoring Free Software development
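To make the two options in this exchange concrete, here is a small sketch that writes a file either with O_SYNC on open() or with a single fsync() before close(). `write_chunks()` and its `use_o_sync` parameter are hypothetical and exist only for this comparison; the sketch makes no attempt to measure the performance difference discussed above.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Write `nchunks` copies of `chunk` to `path`.
 * use_o_sync != 0: open with O_SYNC, so each write() waits for the data.
 * use_o_sync == 0: plain open, then one fsync() before close().
 * Returns 0 on success, -1 on error.
 */
int write_chunks(const char *path, const char *chunk, size_t len,
                 int nchunks, int use_o_sync)
{
    int flags = O_WRONLY | O_CREAT | O_TRUNC;
    if (use_o_sync)
        flags |= O_SYNC;

    int fd = open(path, flags, 0644);
    if (fd < 0)
        return -1;

    for (int i = 0; i < nchunks; i++) {
        if (write(fd, chunk, len) != (ssize_t)len) {
            close(fd);
            return -1;
        }
    }

    /* The fsync()-before-close() variant flushes once, when the final
     * file length is already known to the filesystem. */
    if (!use_o_sync && fsync(fd) != 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

Both variants aim to have the data on disk by the time the function returns; the difference is that the second leaves the filesystem free to delay allocation until the whole file has been written.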
Re: fsync()
On Saturday 30 June 2007 00:35, Andy Green <[EMAIL PROTECTED]> wrote:
> I don't think fsync() for individual files is really a fair answer,

Why not?

> it's
> fine if it just uses the normal filesystem APIs per-file. But after the
> transaction is complete, and you walk away thinking you did complete an
> rpm transaction, there is a case for adding a sync() to make sure
> everything you think you have done is truly committed to physical
> storage (maybe it does it already, I dunno). On the one hand this is a
> relatively low probability issue for a desktop box but on the other hand
> it is pretty cheap.

The time taken for a sync() system call can be very large when you have a system under high write load. Under some older versions of Linux the time taken for sync() appeared to be unbounded (it apparently kept looping through the list of data to write while more data was being added to the list), a brief test suggests that recent versions of Linux may have solved this.

sync() is not the way to get some files committed to disk.

--
[EMAIL PROTECTED]
http://etbe.coker.com.au/ My Blog
http://www.coker.com.au/sponsorship.html Sponsoring Free Software development
Re: fsync()
On Jun 29, 2007, at 9:51 AM, Russell Coker wrote:

>> But adding an explicit fsync() is trivial as soon as I can get a
>> reproducer.
>
> You want to be able to repeatably trigger a race-condition before you fix it?

Nah, I just want to verify that I planted the sync/flush in the right place.

> To cause this race condition you must first use a file-system that is
> optimised for performance (EG XFS) so that it will allow long cache
> write-back times and also do write-related tasks after closing the file (EG
> assigning disk blocks to the file after close() so that it knows the length).
>
> Then put some load on the system while installing an RPM, and then trigger a
> hardware reset shortly after rpm exits.

Should I add O_SYNC when opening files on delayed write file systems? Doable, but annoying mapping the path back to a file system type to infer functionality.

73 de Jeff
Re: fsync()
Russell Coker wrote:

>> But adding an explicit fsync() is trivial as soon as I can get a
>> reproducer.
>
> You want to be able to repeatably trigger a race-condition before you fix it?
>
> To cause this race condition you must first use a file-system that is
> optimised for performance (EG XFS) so that it will allow long cache
> write-back times and also do write-related tasks after closing the file (EG
> assigning disk blocks to the file after close() so that it knows the length).
>
> Then put some load on the system while installing an RPM, and then trigger a
> hardware reset shortly after rpm exits.

I have some patches for busybox that allow rpm packages to be used there. Often with embedded it is flash behind the filesystem, so I make it call sync() on exit for just this scenario.

I don't think fsync() for individual files is really a fair answer, it's fine if it just uses the normal filesystem APIs per-file. But after the transaction is complete, and you walk away thinking you did complete an rpm transaction, there is a case for adding a sync() to make sure everything you think you have done is truly committed to physical storage (maybe it does it already, I dunno). On the one hand this is a relatively low probability issue for a desktop box but on the other hand it is pretty cheap.

-Andy
Re: fsync()
On Friday 29 June 2007 05:37, Jeff Johnson <[EMAIL PROTECTED]> wrote:
> On Jun 28, 2007, at 2:28 AM, Russell Coker wrote:
> > When upgrading a package, RPM version 4.4.2 in SUSE doesn't
> > call fsync()!
> > It creates a temporary file (without using O_SYNC), writes all the
> > data to
> > it, closes it, and then renames it to replace the original file.
>
> The temporary file has the /path/to/file;12345678 transaction id
> appended?

I don't recall the name.

> A close should sync the data, should it not?

Not necessarily. Some filesystems (such as XFS) try to deduce what a user-space program desires from the pattern of system calls and implement it (EG a certain combination of create and rename can cause the data to be sync'd faster).

> Or do you mean the rpmdb files?

No, I mean file data.

> SuSE has a very different usage case
> for a rpmdb, and insists on avoiding sync whenever possible for
> "performance"
> reasons.

Then SuSE are butt-heads.

> > Has this horrible mistake been fixed in the upstream tree?
>
> I believe the problem is a change in behavior in libio in glibc.

I believe that it has nothing to do with glibc or any other user-space code.

> But adding an explicit fsync() is trivial as soon as I can get a
> reproducer.

You want to be able to repeatably trigger a race-condition before you fix it?

To cause this race condition you must first use a file-system that is optimised for performance (EG XFS) so that it will allow long cache write-back times and also do write-related tasks after closing the file (EG assigning disk blocks to the file after close() so that it knows the length).

Then put some load on the system while installing an RPM, and then trigger a hardware reset shortly after rpm exits.

--
[EMAIL PROTECTED]
http://etbe.coker.com.au/ My Blog
http://www.coker.com.au/sponsorship.html Sponsoring Free Software development
Re: fsync()
On Jun 28, 2007, at 2:28 AM, Russell Coker wrote:

> When upgrading a package, RPM version 4.4.2 in SUSE doesn't call fsync()!
> It creates a temporary file (without using O_SYNC), writes all the data to
> it, closes it, and then renames it to replace the original file.

The temporary file has the /path/to/file;12345678 transaction id appended?

A close should sync the data, should it not? I've a dim memory of a problem with sync on close using libio that I had to fix 6 months ago with O_RDONLY opens (only O_RDWR need be fflush'd and presumably sync'd).

Or do you mean the rpmdb files? SuSE has a very different usage case for a rpmdb, and insists on avoiding sync whenever possible for "performance" reasons.

> Has this horrible mistake been fixed in the upstream tree?

I believe the problem is a change in behavior in libio in glibc.

But adding an explicit fsync() is trivial as soon as I can get a reproducer.

TODO++

73 de Jeff