Bill Moran wrote:
Jens Rehsack wrote:

Don Lewis wrote:

On 2 Oct, Terry Lambert wrote:


[...]

Actually, write caching is not so much the problem as the disk
reporting that the write has completed before the contents of
the transaction saved in the write cache have actually been
committed to stable storage.
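
To make the failure mode concrete, here's a minimal C sketch (the
file name and record are purely illustrative).  The application
believes the data is durable once fsync() returns, but that belief
rests entirely on the drive telling the truth when it acknowledges
the write:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int
main(void)
{
	const char rec[] = "commit transaction record\n";
	int fd;

	fd = open("journal.dat", O_WRONLY | O_CREAT | O_APPEND, 0644);
	if (fd == -1) {
		perror("open");
		return (1);
	}
	if (write(fd, rec, sizeof(rec) - 1) == -1) {
		perror("write");
		return (1);
	}
	/*
	 * fsync() pushes the data to the drive and blocks until the
	 * drive reports the write complete.  With write caching
	 * enabled, the drive reports completion as soon as the data
	 * reaches its volatile cache, so the durability fsync() is
	 * supposed to provide evaporates on power loss.
	 */
	if (fsync(fd) == -1) {
		perror("fsync");
		return (1);
	}
	close(fd);
	return (0);
}

Pull the plug after fsync() returns but before the cache drains, and
you lose exactly the data fsync() supposedly guaranteed.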

Unfortunately, IDE disks do not permit disconnected writes, due
to a bug in the original IDE implementation, which has been
carried forward for [insert no good reason here].

Therefore IDE disks almost universally lie to the driver any
time write caching is enabled.

In most cases, if you use SCSI, the problem will go away.


Nope, they "lie" as well unless you turn of the WCE bit.  Fortunately
with tagged command queuing there is very little performance penalty for
doing this in most cases.  The main exception to this is when you run
newfs which talks to the raw partition and only has one command
outstanding at a time.

Back in the days when our SCSI implementation would spam the console
whenever it reduced the number of tagged openings because the drive
indicated that its queue was full, I'd see the number of tagged openings
stay at 63 if write caching was disabled, but the number would drop
significantly under load (50%?) if write caching was enabled.  I always
suspected that the drive's cache was full of data for write commands
that it had indicated to the host as being complete even though the data
hadn't been written to stable storage.

Unfortunately SCSI drives all seem to ship with the WCE bit set,
probably for "benchmarking" reasons, so I always have to remember to
turn this bit off whenever I install a new drive.
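
For what it's worth, on FreeBSD the bit lives in the SCSI caching mode
page (page 8), so you can check and clear it with camcontrol; something
like this, assuming a CAM disk at da0:

	camcontrol modepage da0 -m 8		(display the page; look for "WCE:  1")
	camcontrol modepage da0 -m 8 -e -P 3	(edit the saved values; set WCE to 0)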


A message from this morning ('file system (UFS2) consistancy
after -current crash?') to this list describes exactly the
situation on my fileserver a few months ago, except that my machine
runs FreeBSD 4-STABLE and has an ICP-Vortex 6528RD controller.

I think the disk's or controller's (in short, the hardware's) write
cache is the problem.  Maybe it shouldn't be in theory, but it is in
the real world :-)


This is somewhat relevant to a discussion occurring this week on the
PostgreSQL performance mailing list.

A fellow was testing a number of caching options for disk drives and
the performance impact they had on PostgreSQL.  Near the end of the
discussion and his testing, he decided to do a plug test (i.e., pull
the power plug out of the wall while PostgreSQL was running a
benchmark and see if the database was recoverable on reboot).

The tests don't 100% apply, since he was testing with Linux and XFS,
but I think the results speak VOLUMES!

You realize that the sync() and fsync() calls are severely BROKEN under
Linux at the VFS level already?  OK, kernel 2.6 "will get it right
somehow", but until then one should not even think about using Linux as
a DB server in a truly sensitive environment.

I seriously doubt that disk caching is to blame here, since otherwise a
crash on a journaling filesystem would almost surely be disastrous every
time.  The caches on disks are so tiny compared to the rate at which the
drive can write them out that they are flushed almost immediately
anyway, within well under a second.
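To put rough (purely illustrative) numbers on that: a 2 MB on-disk
cache draining at a sustained 20 MB/s is empty in 2/20 = 0.1 seconds,
and even an 8 MB cache at that rate takes only 0.4 seconds.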
