Re: fsync()

2007-07-03 Thread Jeff Johnson


On Jul 3, 2007, at 2:31 PM, Russell Coker wrote:

> On Wednesday 04 July 2007 01:38, Jeff Johnson <[EMAIL PROTECTED]> wrote:
>> The primary goal of package management
>> is to install files reliably, not push the progress bars faster.
>
> Of course we could have a mode of operation for the initial system install
> that doesn't do the fsync().  The only time a progress bar matters for RPM
> installation is the initial install.

I'd rather have an enabler on the progress bar than a disabler on fsync ;-)


73 de Jeff

__
RPM Package Manager                    http://rpm5.org
Developer Communication List        rpm-devel@rpm5.org


Re: fsync()

2007-07-03 Thread Russell Coker
On Wednesday 04 July 2007 01:38, Jeff Johnson <[EMAIL PROTECTED]> wrote:
> The primary goal of package management
> is to install files reliably, not push the progress bars faster.

Of course we could have a mode of operation for the initial system install 
that doesn't do the fsync().  The only time a progress bar matters for RPM 
installation is the initial install.

-- 
[EMAIL PROTECTED]
http://etbe.coker.com.au/  My Blog

http://www.coker.com.au/sponsorship.html Sponsoring Free Software development


Re: fsync()

2007-07-03 Thread Jeff Johnson


On Jul 3, 2007, at 1:50 PM, Michael Schroeder wrote:

> On Tue, Jul 03, 2007 at 12:49:14PM -0400, Jeff Johnson wrote:
>> On Jul 3, 2007, at 12:16 PM, Michael Schroeder wrote:
>>> No, as one needs a patch to Berkeleydb to make it support
>>> individual syncing.
>>
>> What's the patch? Overloading the sync vector or something deeper in
>> db/*?
>
> Something deeper. (Hmm, didn't you have some good connections
> to the berkeley db folks? Maybe you can push the patch upstream...)

Hehe. Yah, Keith Bostic put me up at his house in 1984 ;-) I'll try ...

Yah, I remember this patch now.

Hmmm, not syncing the mpool is likely asking for trouble though.

I have seen rpmdb lossage if --rebuilddb is run with bad data in
__db* files.

73 de Jeff




Re: fsync()

2007-07-03 Thread Michael Schroeder
On Tue, Jul 03, 2007 at 12:49:14PM -0400, Jeff Johnson wrote:
> 
> On Jul 3, 2007, at 12:16 PM, Michael Schroeder wrote:
> 
> >
> >No, as one needs a patch to Berkeleydb to make it support
> >individual syncing.
> >
> 
> What's the patch? Overloading the sync vector or something deeper in  
> db/*?

Something deeper. (Hmm, didn't you have some good connections
to the berkeley db folks? Maybe you can push the patch upstream...)

--- db/db/db.c.orig 2004-11-11 15:58:46.0 +
+++ db/db/db.c  2005-12-15 16:17:45.0 +
@@ -591,6 +591,8 @@ __db_dbenv_mpool(dbp, fname, flags)
 	    (F_ISSET(dbp, DB_AM_NOT_DURABLE) ? DB_TXN_NOT_DURABLE : 0),
 	    0, dbp->pgsize)) != 0)
 		return (ret);
+	if (LF_ISSET(DB_NOFSYNC) && mpf->mfp)
+		F_SET(mpf->mfp, MP_NOFSYNC);
 
 	return (0);
 }
--- db/db/db_iface.c.orig   2004-10-16 01:31:54.0 +
+++ db/db/db_iface.c    2005-12-15 16:17:45.0 +
@@ -1068,7 +1068,7 @@ __db_open_arg(dbp, txn, fname, dname, ty
 #define	OKFLAGS								\
 	(DB_AUTO_COMMIT | DB_CREATE | DB_DIRTY_READ | DB_EXCL |		\
 	 DB_FCNTL_LOCKING | DB_NO_AUTO_COMMIT | DB_NOMMAP | DB_RDONLY |	\
-	 DB_RDWRMASTER | DB_THREAD | DB_TRUNCATE | DB_WRITEOPEN)
+	 DB_RDWRMASTER | DB_THREAD | DB_TRUNCATE | DB_WRITEOPEN | DB_NOFSYNC)
 	if ((ret = __db_fchk(dbenv, "DB->open", flags, OKFLAGS)) != 0)
 		return (ret);
 	if (LF_ISSET(DB_EXCL) && !LF_ISSET(DB_CREATE))
--- db/dbinc/db.in.orig 2004-10-16 01:31:54.0 +
+++ db/dbinc/db.in  2005-12-15 16:17:45.0 +
@@ -260,6 +260,7 @@ struct __db_dbt {
 #define	DB_FCNTL_LOCKING  0x0002000	/* UNDOC: fcntl(2) locking. */
 #define	DB_RDWRMASTER	  0x0004000	/* UNDOC: allow subdb master open R/W */
 #define	DB_WRITEOPEN	  0x0008000	/* UNDOC: open with write lock. */
+#define	DB_NOFSYNC	  0x001		/* UNDOC: don't fsync */
 
 /*
  * Flags private to DB_ENV->txn_begin.
--- db/dbinc/mp.h.orig  2004-10-16 01:31:54.0 +
+++ db/dbinc/mp.h   2005-12-15 16:25:56.0 +
@@ -309,6 +309,7 @@ struct __mpoolfile {
 #define	MP_FAKE_UOC	0x080		/* Unlink_on_close field: fake flag. */
 #define	MP_NOT_DURABLE	0x100		/* File is not durable. */
 #define	MP_TEMP		0x200		/* Backing file is a temporary. */
+#define MP_NOFSYNC	0x400		/* Don't fsync */
 	u_int32_t  flags;
 };
 
--- db/mp/mp_sync.c.orig    2004-11-11 15:58:48.0 +
+++ db/mp/mp_sync.c 2005-12-15 16:23:57.0 +
@@ -553,7 +553,7 @@ done:   /*
 	if (ret == 0 && (op == DB_SYNC_CACHE || op == DB_SYNC_FILE)) {
 		if (dbmfp == NULL)
 			ret = __memp_sync_files(dbenv, dbmp);
-		else
+		else if (!dbmfp->mfp || !F_ISSET(dbmfp->mfp, MP_NOFSYNC))
 			ret = __os_fsync(dbenv, dbmfp->fhp);
 	}
 
@@ -600,7 +600,7 @@ int __memp_sync_files(dbenv, dbmp)
 	MUTEX_THREAD_LOCK(dbenv, dbmp->mutexp);
 	for (dbmfp = TAILQ_FIRST(&dbmp->dbmfq);
 	    dbmfp != NULL; dbmfp = TAILQ_NEXT(dbmfp, q)) {
-		if (dbmfp->mfp != mfp || F_ISSET(dbmfp, MP_READONLY))
+		if (dbmfp->mfp != mfp || F_ISSET(dbmfp, MP_READONLY | MP_NOFSYNC))
 			continue;
 		ret = __os_fsync(dbenv, dbmfp->fhp);
 		break;
@@ -662,6 +662,9 @@ __memp_mf_sync(dbmp, mfp)
 
 	dbenv = dbmp->dbenv;
 
+	if (F_ISSET(mfp, MP_NOFSYNC))
+		return 0;
+
 	/*
 	 * Expects caller to be holding the region lock: we're using the path
 	 * name and __memp_nameop might try and rename the file.

Cheers,
  Michael.

-- 
Michael Schroeder   [EMAIL PROTECTED]
SUSE LINUX Products GmbH, GF Markus Rex, HRB 16746 AG Nuernberg
main(_){while(_=~getchar())putchar(~_-1/(~(_|32)/13*2-11)*13);}


Re: fsync()

2007-07-03 Thread Jeff Johnson


On Jul 3, 2007, at 12:16 PM, Michael Schroeder wrote:

> No, as one needs a patch to Berkeleydb to make it support
> individual syncing.

What's the patch? Overloading the sync vector or something deeper in
db/*?


73 de Jeff


Re: fsync()

2007-07-03 Thread Michael Schroeder
On Tue, Jul 03, 2007 at 11:38:15AM -0400, Jeff Johnson wrote:
> We also disagree on the importance of attempting to sync
> data to disk in spite of modest cost. I believe that rpm (and rpm5.org)
> should sync to disk where appropriate rather than disabling rpmdb
> sync's (and fsync for file contents here) as SuSE

Just to clarify things: SUSE is just disabling the syncing
of the index databases, the Packages database is still fsync()ed.
Index databases can be easily regenerated by 'rpm --rebuilddb', 
but if Packages is corrupt you have hosed your system.

> and perhaps rpm.org are choosing to do.

No, as one needs a patch to Berkeleydb to make it support
individual syncing.

Cheers,
  Michael.



Re: fsync()

2007-07-03 Thread Jeff Johnson


On Jul 3, 2007, at 10:55 AM, Michael Schroeder wrote:

> On Tue, Jul 03, 2007 at 10:39:10AM -0400, Jeff Johnson wrote:
>> But this isn't every program, rpm is an installer, and installers are
>> expected to try harder.
>>
>> Slowing down is likely unmeasurable,
>
> Not from my experience...

We also disagree on the importance of attempting to sync
data to disk in spite of modest cost. I believe that rpm (and rpm5.org)
should sync to disk where appropriate rather than disabling rpmdb
sync's (and fsync for file contents here) as SuSE and perhaps rpm.org
are choosing to do. The primary goal of package management
is to install files reliably, not push the progress bars faster.

>> and can certainly be conditioned
>> on file system type or configuration.
>
> I wouldn't mind a rpm macro check so that people can turn it on
> or off according to their needs.

Will do. Since no one is meaningfully configuring rpm, the default
(according to the principle of least surprise) will be to always fsync.

73 de Jeff


Re: fsync()

2007-07-03 Thread Michael Schroeder
On Tue, Jul 03, 2007 at 10:39:10AM -0400, Jeff Johnson wrote:
> But this isn't every program, rpm is an installer, and installers are
> expected to try harder.
> 
> Slowing down is likely unmeasurable,

Not from my experience...

> and can certainly be conditioned
> on file system type or configuration.

I wouldn't mind a rpm macro check so that people can turn it on
or off according to their needs.

Thanks,
  Michael.



Re: fsync()

2007-07-03 Thread Jeff Johnson


On Jul 3, 2007, at 6:19 AM, Michael Schroeder wrote:

> On Tue, Jul 03, 2007 at 08:10:04PM +1000, Russell Coker wrote:
>> Do you consider systems that lose data, have internal databases that don't
>> match their own state, and generally cause down-time for people to be
>> correct?
>
> You might as well argue that *every* program that writes to disk
> should do fsync() calls. If you crash the system while rpm is
> running the system is broken nevertheless. Your fsyncs just make
> the critical window a bit smaller while slowing down rpm.

But this isn't every program, rpm is an installer, and installers are
expected to try harder.

Slowing down is likely unmeasurable, and can certainly be conditioned
on file system type or configuration.

73 de Jeff


Re: fsync()

2007-07-03 Thread Michael Schroeder
On Tue, Jul 03, 2007 at 08:10:04PM +1000, Russell Coker wrote:
> Do you consider systems that lose data, have internal databases that don't 
> match their own state, and generally cause down-time for people to be 
> correct?

You might as well argue that *every* program that writes to disk
should do fsync() calls. If you crash the system while rpm is
running the system is broken nevertheless. Your fsyncs just make
the critical window a bit smaller while slowing down rpm.

Cheers,
  Michael.



Re: fsync()

2007-07-03 Thread Russell Coker
On Tuesday 03 July 2007 00:02, Michael Schroeder <[EMAIL PROTECTED]> wrote:
> On Sun, Jul 01, 2007 at 08:02:40PM +1000, Russell Coker wrote:
> > It's really a pity that people can't just write correct programs and that
> > it needs to cost someone time and money and then more time fighting with
> > the developers to get the fix applied.
>
> You have a very strange definition of "correct"...

1A) Write file A
1B) fsync() file A
2A) Write file B
2B) fsync() file B

If a program does the above and an unexpected system reset occurs at any stage 
then you will either get nothing, file A corrupted, file A written and no 
change to file B, file A written and file B corrupted, or both files written.  
Therefore if file A must be written before file B (IE file B is the RPM 
database that says that file A was correctly written) then the system will 
always be in a correct state (if the file was not installed then the RPM 
database will know about this).  My definition of "correct" has this 
guarantee.

1) Write file A
2) Write file B

If a program does the above and an unexpected system reset occurs then you 
could have no change, both files corrupted, both files written, one file 
corrupted (either file), or one file written and one file corrupted.  This 
means that you may have the system in an undefined state and have the RPM 
database not tell you (unless you check the file checksums) and is not what I 
consider correct behaviour.
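
A minimal sketch of the first sequence (write A, fsync A, write B, fsync B), in C and assuming a POSIX system. The names write_durably and install_then_record are made up for illustration; this is not rpm code, and the "database" here is just a second plain file:

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Write len bytes to path and fsync() before close().
 * Returns 0 on success, -1 on any error. (Hypothetical helper.) */
static int write_durably(const char *path, const void *data, size_t len)
{
    assert(path != NULL && (data != NULL || len == 0));

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    const char *p = data;
    while (len > 0) {
        ssize_t n = write(fd, p, len);  /* may be a short write */
        if (n < 0) {
            if (errno == EINTR)
                continue;               /* interrupted, try again */
            close(fd);
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }

    if (fsync(fd) != 0) {               /* steps 1B and 2B */
        close(fd);
        return -1;
    }
    return close(fd);
}

/* File A (the payload) is written and synced first; file B (the record
 * saying A was installed) is only touched once that has succeeded. */
int install_then_record(const char *payload_path, const void *payload,
                        size_t payload_len, const char *db_path,
                        const void *record, size_t record_len)
{
    if (write_durably(payload_path, payload, payload_len) != 0)
        return -1;  /* record never written, so it cannot claim A exists */
    return write_durably(db_path, record, record_len);
}
```

If the payload write fails, the record file is left alone, which is the property the ordering is meant to give.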

Do you consider systems that lose data, have internal databases that don't 
match their own state, and generally cause down-time for people to be 
correct?


PS  I first encountered this problem on a SLES10-SP1 system.



Re: fsync()

2007-07-02 Thread Michael Schroeder
On Sun, Jul 01, 2007 at 08:02:40PM +1000, Russell Coker wrote:
> It's really a pity that people can't just write correct programs and that it 
> needs to cost someone time and money and then more time fighting with the 
> developers to get the fix applied.

You have a very strange definition of "correct"...

Michael.



Re: fsync()

2007-07-01 Thread Andy Green
Russell Coker wrote:
> On Saturday 30 June 2007 19:52, Andy Green <[EMAIL PROTECTED]> wrote:
>> Russell Coker wrote:
>>> On Saturday 30 June 2007 00:35, Andy Green <[EMAIL PROTECTED]> wrote:
>>>> I don't think fsync() for individual files is really a fair answer,
>>> Why not?
>> $ man fsync
>> ...
>> DESCRIPTION
>>fsync()  transfers  ("flushes")  all  modified  in-core data of
>> (i.e., modified buffer cache pages for) the file referred to by the file
>>descriptor fd to the disk device (or other permanent storage
>> device) where that file resides.  The call blocks until the device  reports
>>that the transfer has completed.  It also flushes  metadata
>> information associated with the file (see stat(2)).
>> ...
>>
>> You're proposing that doing an fsync() after every unpacked file is
>> righteous for all cases?  RPM will slow down dramatically for no real
>> benefit.  If power is lost partway through an archive unpack the package
>> is still in an inconsistent partial state on the drive despite the fact
>> that the atomic unit of inconsistency is supposedly now one file.  One way or
>> another you end up with half a kernel package or whatever.
> 
> It is not required that you call fsync() on each file separately.  You can 
> write data to a number of files (900 file descriptors is safe on all 
> platforms) and then loop through calling fsync() (or fdatasync()) on each 

Russell you care about having it way more than I care about not having
it, Jeff seems to agree it is worth having, so I guess we will have it.
 You can always find a bad time to hit reset and get a partial package
on your drive, but I certainly agree it is better if rpm exiting is a
trustworthy signal that it is after the delicate moment.  Why sync()
somehow doesn't deliver that and fsync() does... 99% of C code is not
"correct"... *shrug*.

-Andy


Re: fsync()

2007-07-01 Thread Russell Coker
On Saturday 30 June 2007 23:09, Andy Green <[EMAIL PROTECTED]> wrote:
> >> sync() is not the way to get some files committed to disk.
> >
> > Sure it is.  A sync() at the end is aimed at closing the window between
>
> One more point on fsync() vs sync()... sync() will mop up whatever the
> %pre/%post scripts have been up to, eg, ldconfig or whatever and commit
> it.  fsync() just on archive unpacked files will miss that.

The correct solution is to get the important programs to sync their own files 
via fsync().  A recent change to ldconfig made it use fsync() for correct 
operation (again due to me being at ground-zero when a machine was damaged).

It's really a pity that people can't just write correct programs and that it 
needs to cost someone time and money and then more time fighting with the 
developers to get the fix applied.



Re: fsync()

2007-07-01 Thread Russell Coker
On Saturday 30 June 2007 19:52, Andy Green <[EMAIL PROTECTED]> wrote:
> Russell Coker wrote:
> > On Saturday 30 June 2007 00:35, Andy Green <[EMAIL PROTECTED]> wrote:
> >> I don't think fsync() for individual files is really a fair answer,
> >
> > Why not?
>
> $ man fsync
> ...
> DESCRIPTION
>fsync()  transfers  ("flushes")  all  modified  in-core data of
> (i.e., modified buffer cache pages for) the file referred to by the file
>descriptor fd to the disk device (or other permanent storage
> device) where that file resides.  The call blocks until the device  reports
>that the transfer has completed.  It also flushes  metadata
> information associated with the file (see stat(2)).
> ...
>
> You're proposing that doing an fsync() after every unpacked file is
> righteous for all cases?  RPM will slow down dramatically for no real
> benefit.  If power is lost partway through an archive unpack the package
> is still in an inconsistent partial state on the drive despite the fact
> that the atomic unit of inconsistency is supposedly now one file.  One way or
> another you end up with half a kernel package or whatever.

It is not required that you call fsync() on each file separately.  You can 
write data to a number of files (900 file descriptors is safe on all 
platforms) and then loop through calling fsync() (or fdatasync()) on each 
one.  By the time you have written data to file 900 there is a reasonable 
chance that some of the data from file 1 has made it to disk.  When you call 
fsync() on file 1 the filesystem driver may decide to sync the data for 
multiple files (there is nothing preventing a filesystem driver from writing 
more data to disk than is required).
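
A rough sketch of that batching, in C and assuming POSIX. The function name write_batch_then_sync and the BATCH_MAX limit of 64 are illustrative choices only, and file contents are taken as C strings to keep the example short:

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

#define BATCH_MAX 64    /* illustrative; pick a limit safe for the platform */

/* Write all of buf to fd, retrying short writes; 0 on success, -1 on error. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Write every file first, then fsync() and close() them in a second pass. */
int write_batch_then_sync(const char *const paths[],
                          const char *const contents[], size_t count)
{
    assert(count <= BATCH_MAX);

    int fds[BATCH_MAX];
    size_t opened = 0;
    int status = 0;

    /* Pass 1: create and write each file, keeping the descriptor open. */
    for (size_t i = 0; i < count; i++) {
        int fd = open(paths[i], O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) {
            status = -1;
            break;
        }
        fds[opened++] = fd;
        if (write_all(fd, contents[i], strlen(contents[i])) != 0) {
            status = -1;
            break;
        }
    }

    /* Pass 2: fsync() each descriptor (skipped once an error has been
     * seen), and always close it. */
    for (size_t i = 0; i < opened; i++) {
        if (status == 0 && fsync(fds[i]) != 0)
            status = -1;
        if (close(fds[i]) != 0)
            status = -1;
    }
    return status;
}
```

A caller with more files than BATCH_MAX would call this once per batch.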

The real benefit of fsync() is that you don't get messed up machines.  I 
started this discussion because I had a SUSE machine get corrupted files due 
to installing an RPM shortly before a reset.  The RPM system didn't indicate 
any problem and none of the other people working on the project had the 
skills needed to diagnose the problem.

> >> it's
> >> fine if it just uses the normal filesystem APIs per-file.  But after the
> >> transaction is complete, and you walk away thinking you did complete an
> >> rpm transaction, there is a case for adding a sync() to make sure
> >> everything you think you have done is truly committed to physical
> >> storage (maybe it does it already, I dunno).  On the one hand this is a
> >> relatively low probability issue for a desktop box but on the other hand
> >> it is pretty cheap.
> >
> > The time taken for a sync() system call can be very large when you have a
> > system under high write load.  Under some older versions of Linux the
> > time taken for sync() appeared to be unbounded (it apparently kept
> > looping through the list of data to write while more data was being added
> > to the list), a brief test suggests that recent versions of Linux may
> > have solved this.
>
> Well then why mention this as an issue.

Why mention what exactly?  Why mention the need for fsync()?  Because the lack 
of it results in damaged systems.  Why mention the relative merits of sync() 
and fsync()?  Because someone else advocated the use of sync().

> > sync() is not the way to get some files committed to disk.
>
> Sure it is.  A sync() at the end is aimed at closing the window between
> rpm completing a transaction (and feeding back that it is completed),

From sync(2):
BUGS
   According  to  the  standard specification (e.g., POSIX.1-2001), sync()
   schedules the writes, but may return before the actual writing is done.
   However,  since  version  1.3.20 Linux does actually wait.  (This still
   does not guarantee data integrity: modern disks have large caches.)

> and the completed actions not being on physical storage.  With a single
> sync() at the end you don't come back to the prompt from rpm until the
> transaction is completed not only in cache but at the physical storage.

Pity that sync() is not guaranteed to work that way.  Note that we want RPM to 
work on OSs other than Linux too.

>  (In the case of HDDs neither fsync() nor sync() guarantee that the data
> is committed from the HDD private cache to the nonvolatile storage, but
> that should normally happen very shortly afterwards).

In a correctly configured drive sub-system you will not have any write-back 
cache that is volatile and which will be considered as being stable for the 
purpose of fsync().  Lots of cheap hard drives do the wrong thing in this 
regard, if you have a SAS or SCSI drive or a correctly configured SATA or IDE 
drive then the right thing should be done.

> Anyway I just mentioned it has value for embedded flash devices.  I
> don't know if it or fsync() has much value for PCs.

I've just had a SERVER become unusable because of this problem.  I think it's 
more important for a server than for an embedded system.  Embedded systems 
are rarely upgraded and generally only have RPMs installed in a factory. 

Re: fsync()

2007-06-30 Thread Russell Coker
On Saturday 30 June 2007 15:01, "Wichmann, Mats D" <[EMAIL PROTECTED]> 
wrote:
> >> OK/ fsync before close, todo++. Thanks for the info, xfs is a mystery
> >> to me.
> >
> > If you want to know any specific things about XFS then ask me.
> > I'm currently working for SGI in a group that is associated
> > with the XFS group.  I regularly talk to the XFS experts and
> > can get any specific questions answered.
>
> This is ridiculous.
>
> Nobody should need to know anything about the internal
> details of a specific filesystem's implementation when
> they're writing a userspace application. If the filesystem
> has chosen to do silly optimizations which don't work
> right when using published APIs and methods, it's the
> filesystem that is broken and not even remotely the
> application's fault.

It's because of silly statements such as the above that I hesitate to mention 
that I'm using XFS in such discussions.

ALL filesystems implement write-back caching for performance unless mounted 
with the "sync" option (which on Linux is documented as only working for 
ext2, ext3, and UFS).  Also ext3 has the commit=X option where X will be the 
maximum number of seconds that the data will not be sync'd for (default 5 
seconds), of course if during a 5 second period you dirty enough pages to 
need a minute to write them all back...  Even 5 seconds is enough time to 
install a package on a fast machine and then have the machine rebooted thus 
resulting in an inconsistent state.

You have to deal with the fact that there are many filesystems out there other 
than Ext2/3 and UFS, filesystems that are designed to give maximum 
performance and which actually implement the OS APIs for the purpose that 
they were intended.

Here's the relevant section from write(2):
NOTES
   A  successful return from write() does not make any guarantee that data
   has been committed to disk.  In fact, on some buggy implementations, it
   does  not  even guarantee that space has successfully been reserved for
   the data.  The only way to be sure is to call fsync(2)  after  you  are
   done writing all your data.




Re: fsync()

2007-06-30 Thread Andy Green
Andy Green wrote:

>> sync() is not the way to get some files committed to disk.
> 
> Sure it is.  A sync() at the end is aimed at closing the window between

One more point on fsync() vs sync()... sync() will mop up whatever the
%pre/%post scripts have been up to, eg, ldconfig or whatever and commit
it.  fsync() just on archive unpacked files will miss that.

-Andy


Re: fsync()

2007-06-30 Thread Andy Green
Russell Coker wrote:

> On Saturday 30 June 2007 00:35, Andy Green <[EMAIL PROTECTED]> wrote:
>> I don't think fsync() for individual files is really a fair answer,
> 
> Why not?

$ man fsync
...
DESCRIPTION
   fsync()  transfers  ("flushes")  all  modified  in-core data of
(i.e., modified buffer cache pages for) the file referred to by the file
   descriptor fd to the disk device (or other permanent storage
device) where that file resides.  The call blocks until the device  reports
   that the transfer has completed.  It also flushes  metadata
information associated with the file (see stat(2)).
...

You're proposing that doing an fsync() after every unpacked file is
righteous for all cases?  RPM will slow down dramatically for no real
benefit.  If power is lost partway through an archive unpack the package
is still in an inconsistent partial state on the drive despite the fact
that the atomic unit of inconsistency is supposedly now one file.  One way or
another you end up with half a kernel package or whatever.

>> it's 
>> fine if it just uses the normal filesystem APIs per-file.  But after the
>> transaction is complete, and you walk away thinking you did complete an
>> rpm transaction, there is a case for adding a sync() to make sure
>> everything you think you have done is truly committed to physical
>> storage (maybe it does it already, I dunno).  On the one hand this is a
>> relatively low probability issue for a desktop box but on the other hand
>> it is pretty cheap.
> 
> The time taken for a sync() system call can be very large when you have a 
> system under high write load.  Under some older versions of Linux the time 
> taken for sync() appeared to be unbounded (it apparently kept looping through 
> the list of data to write while more data was being added to the list), a 
> brief test suggests that recent versions of Linux may have solved this.

Well then why mention this as an issue.

> sync() is not the way to get some files committed to disk.

Sure it is.  A sync() at the end is aimed at closing the window between
rpm completing a transaction (and feeding back that it is completed),
and the completed actions not being on physical storage.  With a single
sync() at the end you don't come back to the prompt from rpm until the
transaction is completed not only in cache but at the physical storage.
 (In the case of HDDs neither fsync() nor sync() guarantee that the data
is committed from the HDD private cache to the nonvolatile storage, but
that should normally happen very shortly afterwards).

Anyway I just mentioned it has value for embedded flash devices.  I
don't know if it or fsync() has much value for PCs.

-Andy


RE: fsync()

2007-06-29 Thread Wichmann, Mats D
[EMAIL PROTECTED] wrote:
> On Saturday 30 June 2007 12:02, Jeff Johnson <[EMAIL PROTECTED]> wrote:
>> OK/ fsync before close, todo++. Thanks for the info, xfs is a mystery
>> to me.
> 
> If you want to know any specific things about XFS then ask me.
> I'm currently working for SGI in a group that is associated 
> with the XFS group.  I regularly talk to the XFS experts and 
> can get any specific questions answered. 

This is ridiculous.

Nobody should need to know anything about the internal
details of a specific filesystem's implementation when
they're writing a userspace application. If the filesystem
has chosen to do silly optimizations which don't work
right when using published APIs and methods, it's the
filesystem that is broken and not even remotely the
application's fault.


Re: fsync()

2007-06-29 Thread Russell Coker
On Saturday 30 June 2007 12:02, Jeff Johnson <[EMAIL PROTECTED]> wrote:
> OK/ fsync before close, todo++. Thanks for the info, xfs is a mystery
> to me.

If you want to know any specific things about XFS then ask me.  I'm currently 
working for SGI in a group that is associated with the XFS group.  I 
regularly talk to the XFS experts and can get any specific questions 
answered.



Re: fsync()

2007-06-29 Thread Jeff Johnson


On Jun 29, 2007, at 6:04 PM, Russell Coker wrote:

> On Saturday 30 June 2007 02:04, Jeff Johnson <[EMAIL PROTECTED]> wrote:
>> Should I add O_SYNC when opening files on delayed write file systems?
>> Doable, but annoying mapping the path back to a file system type to
>> infer functionality.
>
> O_SYNC is not the correct solution.  XFS likes to delay block allocation to
> get contiguous files.  O_SYNC on XFS would either result in re-allocating
> file blocks (terrible for write performance) or discontiguous files.  Write
> performance will always be expected to be better from a fsync() before
> close() than from O_SYNC.

OK/ fsync before close, todo++. Thanks for the info, xfs is a mystery
to me.


73 de Jeff



Re: fsync()

2007-06-29 Thread Russell Coker
On Saturday 30 June 2007 02:04, Jeff Johnson <[EMAIL PROTECTED]> wrote:
> Should I add O_SYNC when opening files on delayed write file systems?
> Doable, but annoying mapping the path back to a file system type to  
> infer functionality.

O_SYNC is not the correct solution.  XFS likes to delay block allocation to 
get contiguous files.  O_SYNC on XFS would either result in re-allocating 
file blocks (terrible for write performance) or discontiguous files.  Write 
performance will always be expected to be better from a fsync() before 
close() than from O_SYNC.
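
A minimal sketch of the two approaches being compared, written in Python rather than rpm's C and assuming a POSIX system where os.O_SYNC is available. The function names are made up for illustration and are not rpm code. The first variant lets the filesystem see the whole file before anything is forced out; the second asks for every write() to reach stable storage on its own.

```python
import os

def _write_all(fd, data):
    """os.write() may write fewer bytes than requested, so loop until done."""
    view = memoryview(data)
    while view:
        view = view[os.write(fd, view):]

def write_fsync_before_close(path, chunks):
    """Ordinary writes, then a single fsync() once the final length is known."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        for chunk in chunks:
            _write_all(fd, chunk)
        os.fsync(fd)          # one flush, just before close()
    finally:
        os.close(fd)

def write_with_o_sync(path, chunks):
    """For contrast: O_SYNC makes each write() wait for stable storage."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_SYNC, 0o644)
    try:
        for chunk in chunks:
            _write_all(fd, chunk)
    finally:
        os.close(fd)
```

Both leave the same bytes on disk; the difference is only in when the kernel is forced to commit them, which is where the allocation behaviour described above comes in.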

-- 
[EMAIL PROTECTED]
http://etbe.coker.com.au/  My Blog

http://www.coker.com.au/sponsorship.html Sponsoring Free Software development


Re: fsync()

2007-06-29 Thread Russell Coker
On Saturday 30 June 2007 00:35, Andy Green <[EMAIL PROTECTED]> wrote:
> I don't think fsync() for individual files is really a fair answer,

Why not?

> it's 
> fine if it just uses the normal filesystem APIs per-file.  But after the
> transaction is complete, and you walk away thinking you did complete an
> rpm transaction, there is a case for adding a sync() to make sure
> everything you think you have done is truly committed to physical
> storage (maybe it does it already, I dunno).  On the one hand this is a
> relatively low probability issue for a desktop box but on the other hand
> it is pretty cheap.

The time taken for a sync() system call can be very large when you have a 
system under high write load.  Under some older versions of Linux the time 
taken for sync() appeared to be unbounded (it apparently kept looping through 
the list of data to write while more data was being added to the list).  A 
brief test suggests that recent versions of Linux may have solved this.

sync() is not the way to get some files committed to disk.
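
As a rough illustration of committing only "some files", here is a Python sketch that calls fsync() on each named path instead of sync() on the whole system. The helper name fsync_paths is hypothetical, and the sketch assumes a platform such as Linux where fsync() is accepted on a descriptor opened read-only.

```python
import os

def fsync_paths(paths):
    """fsync() each named file; other dirty data on the system is left alone.

    Returns the number of files flushed.
    """
    flushed = 0
    for path in paths:
        fd = os.open(path, os.O_RDONLY)   # assumption: fsync() works on a read-only fd
        try:
            os.fsync(fd)
            flushed += 1
        finally:
            os.close(fd)
    return flushed
```

The cost of this scales with the files named rather than with whatever else happens to be queued for write-back at the time.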

-- 
[EMAIL PROTECTED]
http://etbe.coker.com.au/  My Blog

http://www.coker.com.au/sponsorship.html Sponsoring Free Software development


Re: fsync()

2007-06-29 Thread Jeff Johnson


On Jun 29, 2007, at 9:51 AM, Russell Coker wrote:

>> But adding an explicit fsync() is trivial as soon as I can get a
>> reproducer.
>
> You want to be able to repeatably trigger a race-condition before you fix it?

Nah, I just want to verify that I planted the sync/flush in the right place.

> To cause this race condition you must first use a file-system that is
> optimised for performance (EG XFS) so that it will allow long cache
> write-back times and also do write-related tasks after closing the file (EG
> assigning disk blocks to the file after close() so that it knows the length).
> Then put some load on the system while installing an RPM, and then trigger a
> hardware reset shortly after rpm exits.

Should I add O_SYNC when opening files on delayed write file systems?
Doable, but annoying mapping the path back to a file system type to infer
functionality.
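
For a sense of what "mapping the path back to a file system type" involves, here is a small Python sketch. It assumes a mount table in the whitespace-separated layout of Linux's /proc/mounts (device, mount point, type, ...) and picks the longest mount point that is a prefix of the path. The function name is made up, and octal escapes in mount points (such as \040 for a space) are deliberately ignored.

```python
import os

def fs_type_for(path, mounts_text):
    """Return the filesystem type of the mount that contains path, or None."""
    path = os.path.abspath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue                       # skip blank or malformed lines
        mount_point, fs_type = fields[1], fields[2]
        # Compare on a component boundary so /home does not match /homework.
        prefix = mount_point.rstrip("/") + "/"
        # ">=" lets a later entry for the same mount point (an overmount) win.
        if (path + "/").startswith(prefix) and len(mount_point) >= len(best_mount):
            best_mount, best_type = mount_point, fs_type
    return best_type
```

On Linux this could be called as fs_type_for(path, open("/proc/mounts").read()); deciding what to do with the answer, per filesystem, is the annoying part.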

73 de Jeff



Re: fsync()

2007-06-29 Thread Andy Green
Russell Coker wrote:

>> But adding an explicit fsync() is trivial as soon as I can get a
>> reproducer.
> 
> You want to be able to repeatably trigger a race-condition before you fix it?
> 
> To cause this race condition you must first use a file-system that is 
> optimised for performance (EG XFS) so that it will allow long cache 
> write-back times and also do write-related tasks after closing the file (EG 
> assigning disk blocks to the file after close() so that it knows the length). 
>  
> Then put some load on the system while installing an RPM, and then trigger a 
> hardware reset shortly after rpm exits.

I have some patches for busybox that allow rpm packages to be used
there.  Often with embedded it is flash behind the filesystem, so I make
it call sync() on exit for just this scenario.

I don't think fsync() for individual files is really a fair answer, it's
fine if it just uses the normal filesystem APIs per-file.  But after the
transaction is complete, and you walk away thinking you did complete an
rpm transaction, there is a case for adding a sync() to make sure
everything you think you have done is truly committed to physical
storage (maybe it does it already, I dunno).  On the one hand this is a
relatively low probability issue for a desktop box but on the other hand
it is pretty cheap.

-Andy


Re: fsync()

2007-06-29 Thread Russell Coker
On Friday 29 June 2007 05:37, Jeff Johnson <[EMAIL PROTECTED]> wrote:
> On Jun 28, 2007, at 2:28 AM, Russell Coker wrote:
> > When upgrading a package with RPM version 4.4.2 in SUSE doesn't
> > call fsync()!
> > It creates a temporary file (without using O_SYNC), writes all the
> > data to
> > it, closes it, and then renames it to replace the original file.
>
> The temporary file has the /path/to/file;12345678 transaction id
> appended?

I don't recall the name.

> A close should sync the data, should it not?

Not necessarily.  Some filesystems (such as XFS) try to deduce what a 
user-space program desires from the pattern of system calls and implement it 
(EG a certain combination of create and rename can cause the data to be 
sync'd faster).

> Or do you mean the rpmdb files?

No, I mean file data.

> SuSE has a very different usage case 
> for a rpmdb, and insists on avoiding sync whenever possible for
> "performance"
> reasons.

Then SuSE are butt-heads.

> > Has this horrible mistake been fixed in the upstream tree?
>
> I believe the problem is a change in behavior in libio in glibc.

I believe that it has nothing to do with glibc or any other user-space code.

> But adding an explicit fsync() is trivial as soon as I can get a
> reproducer.

You want to be able to repeatably trigger a race-condition before you fix it?

To cause this race condition you must first use a file-system that is 
optimised for performance (EG XFS) so that it will allow long cache 
write-back times and also do write-related tasks after closing the file (EG 
assigning disk blocks to the file after close() so that it knows the length).  
Then put some load on the system while installing an RPM, and then trigger a 
hardware reset shortly after rpm exits.
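
The write, close and rename sequence discussed above, with the missing fsync() put in before the rename, might look like the following Python sketch. This is not rpm's code: the function name and the tid parameter are invented for illustration (the temporary name only imitates the "/path/to/file;12345678" form mentioned earlier), and the final fsync() of the containing directory is an extra step not raised in this thread, assuming a system such as Linux that allows it.

```python
import os

def replace_file_durably(target, data, tid=0x12345678):
    """Write data to a temporary file, fsync() it, then rename it over target."""
    tmp = "%s;%08x" % (target, tid)        # hypothetical temporary name
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:                        # os.write() may be partial
            view = view[os.write(fd, view):]
        os.fsync(fd)                       # data is committed before the rename
    finally:
        os.close(fd)
    os.rename(tmp, target)                 # atomically replaces the old file
    # Assumption: also flush the directory so the rename itself is committed.
    dir_fd = os.open(os.path.dirname(os.path.abspath(target)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Without the fsync() on the file, a reset shortly after the rename can leave the new name pointing at a file whose blocks were never written, which is the failure described in this message.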

-- 
[EMAIL PROTECTED]
http://etbe.coker.com.au/  My Blog

http://www.coker.com.au/sponsorship.html Sponsoring Free Software development


Re: fsync()

2007-06-28 Thread Jeff Johnson


On Jun 28, 2007, at 2:28 AM, Russell Coker wrote:

> When upgrading a package with RPM version 4.4.2 in SUSE doesn't call fsync()!
> It creates a temporary file (without using O_SYNC), writes all the data to
> it, closes it, and then renames it to replace the original file.

The temporary file has the /path/to/file;12345678 transaction id appended?

A close should sync the data, should it not? I've a dim memory of
a problem with sync on close using libio that I had to fix 6 months
ago with O_RDONLY opens (only O_RDWR need be fflush'd and
presumably sync'd).

Or do you mean the rpmdb files? SuSE has a very different usage case
for a rpmdb, and insists on avoiding sync whenever possible for "performance"
reasons.

> Has this horrible mistake been fixed in the upstream tree?

I believe the problem is a change in behavior in libio in glibc.
But adding an explicit fsync() is trivial as soon as I can get a
reproducer.

TODO++

73 de Jeff
