Re: dangerous situation with shutdown process

2005-07-19 Thread Don Lewis
On 18 Jul, Matthias Buelow wrote:
 Paul Mather [EMAIL PROTECTED] writes:
 
Why would that necessarily be more successful?  If the outstanding
buffers count is not reducing between time intervals, it is most likely
because there is some underlying hardware problem (e.g., a bad block).
If the count still persists in staying put, it likely means whatever the
hardware is doing to try and fix things (e.g., write reallocation) isn't
working, and so the kernel may as well give up.
 
 So the kernel is relying on guesswork whether the buffers are flushed
 or not...
 
You can enumerate the buffers and *try* to write them, but that doesn't
guarantee they will be written successfully any more than observing the
relative number left outstanding.
 
 That's rather nonsensical. If I write each buffer synchronously (and
 wait for the disk's response) this is for sure a lot more reliable than
 observing changes in the number of remaining buffers. I mean, where's
 the sense in the latter? It would be analogous to, in userspace, having
 to monitor write(2) continuously over a given time interval and check
 whether the number it returns eventually reaches zero. That's complete
 madness, imho.

During syncer shutdown, the numbers being printed are actually the
number of vnodes that have dirty buffers.  The syncer walks the list of
vnodes with dirty buffers and synchronously flushes each one to disk
(modulo whatever write-caching is done by the drive).

The reason that the syncer monitors the number of dirty vnodes instead of
just iterating once over the list is that with softupdates, flushing one
vnode to disk can cause another vnode to be dirtied and put on the list,
so it can take multiple passes to flush all the dirty vnodes.  It's
normal to see this if the machine was at least moderately busy before
being shut down.  The number of dirty vnodes will start off at a high
number, decrease rapidly at first, and then decrease to zero.  It is not
unusual to see the number bounce from zero back into the low single
digits a few times before stabilizing at zero and triggering the syncer
termination code.
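
A toy model of the multi-pass behaviour described above may help.  This is
only an illustrative sketch, not the kernel code: the function name
flush_until_clean, the dirties_on_flush mapping (a stand-in for softupdates
dependencies), and the rule of stopping after three consecutive zero counts
are all assumptions made for the example.

```python
def flush_until_clean(dirty, dirties_on_flush, max_passes=100, quiet_passes=3):
    """Toy multi-pass flush; returns the dirty count seen at the start of each pass.

    dirty: set of vnode names that are dirty when shutdown starts.
    dirties_on_flush: dict mapping a vnode to the vnodes that become dirty
        once it has been flushed (hypothetical stand-in for softupdates).
    """
    dirty = set(dirty)
    pending = {k: list(v) for k, v in dirties_on_flush.items()}
    counts = []
    zero_run = 0
    for _ in range(max_passes):
        counts.append(len(dirty))
        if not dirty:
            # Only stop once the count has stayed at zero for a few passes.
            zero_run += 1
            if zero_run >= quiet_passes:
                break
            continue
        zero_run = 0
        newly_dirty = set()
        for vnode in sorted(dirty):
            # Flushing a vnode may dirty others (at most once in this model).
            newly_dirty.update(pending.pop(vnode, []))
        dirty = newly_dirty
    return counts


# Three dirty vnodes; flushing "a" dirties "d", and flushing "d" dirties "e".
print(flush_until_clean({"a", "b", "c"}, {"a": ["d"], "d": ["e"]}))
```

A single pass would flush a, b and c but leave d and e behind, which is why
the count is watched over several passes rather than the list being walked
once.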

The syncer shutdown algorithm could definitely be improved to speed it
up.  I didn't want it to push out too many vnodes at the start of the
shutdown sequence, but later in the sequence the delay intervals could
be shortened and more worklist buckets could be visited per interval to
speed the shutdown.  One possible complication that I worry about is
that the new vnodes being added to the list might not be added
synchronously, so if the syncer processes the worklist and shuts down
too quickly it might miss vnodes that got added too late.

I've never seen a syncer shutdown timeout, though it could happen if
either the underlying media became unwriteable or if a process got
wedged while holding a vnode lock.  In either case, it might never be
possible to flush the dirty vnodes in question.

The final sync code in boot() just iterated over the dirty buffers, but
it was not unusual for it to get stuck on mutually dependent buffers. I
would see this quite frequently if I did a shutdown immediately after
running mergemaster.  The final sync code would flush all but the last
few buffers and finally time out.  This problem was my motivation for
adding the shutdown code to the syncer so that the final sync code would
hopefully not have anything to do.

The final sync code also gets confused if you have any ext2 file systems
mounted (even read-only) and times out while waiting for the ext2 file
system to release its private buffers (which only happens when the file
system is unmounted).



___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: dangerous situation with shutdown process

2005-07-19 Thread Don Lewis
On 16 Jul, David Taylor wrote:
 On Sat, 16 Jul 2005, Matthias Buelow wrote:
 David Taylor [EMAIL PROTECTED] writes:
 
  A corrupted journal can be detected. If it's corrupted, discard
  the whole thing, or only the relevant entry. The filesystem will
  remain consistent.
  If track corruption occurs after the journal is written, it doesn't
  matter, since at boot the journal will be replayed and all operations
  will be performed once more.
 
 The track which is corrupted could contain data that wasn't written
 to in months.  How would the journal help?
 
 I don't understand this question.
 
 When the drive is powered off, the track being written to at that point
 may be corrupted, right?  That track may contain sectors that the OS
 didn't change.  These sectors would not be mentioned in the journal.
 How would a journaling fs fix the corruption?
 
 I suppose this could be avoided by requiring that all writes (and
 journal entries) somehow correspond to a full track.  (Which I suppose
 they may do already, but I don't think so).

The track size is not constant.  There are more sectors in the outer
cylinder tracks than there are in inner cylinder tracks.  I'm not even
sure if it is possible to extract the detailed geometry info from the
drive.



Re: dangerous situation with shutdown process

2005-07-19 Thread Don Lewis
On 14 Jul, Matthias Buelow wrote:
 Kevin Oberman wrote:
 
 How can I fix it on my system?

SCSI or ATA? If it's ATA, turn off write cache with atacontrol(8) or
the sysctl.
 
 You do NOT want to do that. Not only will performance drop brutally
 (example: drop to 1/5th of normal write speed for sequential writes,
 probably worse for random writes) but it will also significantly
 reduce the lifetime of your disk. Modern disks are designed to be
 used with the write-back cache enabled, so don't turn it off.

There's not much performance difference with SCSI if write caching is
disabled.  Typical SCSI drives can handle ~63 outstanding read and write
transactions and can sort them into a somewhat optimal order if tagged
command queuing is in use.

The problem is that disks lie about whether they have actually written
data. If the power goes off while the data is still only in the cache, it's lost.
 
 No, the problem is that FreeBSD doesn't implement request barriers
 and that softupdates is flawed by design and seemingly could not
 make use of them, even if they were available (because, as I
 understand it, it relies on a total ordering of all writes, unlike
 the partial ordering necessary for a journalled fs).

Softupdates only needs partial ordering.  It just needs to be
notified when the data hits the platter so that it can send any
dependent writes to the disk.
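
The partial-ordering idea can be sketched as follows.  This is a simplified
illustration and not how softupdates is implemented: the function name, the
write names, and the batch model (a batch is issued only after every write it
depends on has been reported complete) are assumptions made for the example.

```python
def issue_in_dependency_order(deps):
    """Group writes into batches that respect a partial order.

    deps: dict mapping a write name to the set of writes that must be on
        stable storage before it may be issued.
    A batch is issued only after every write it depends on has completed.
    """
    remaining = {w: set(d) for w, d in deps.items()}
    done = set()
    batches = []
    while remaining:
        ready = sorted(w for w, d in remaining.items() if d <= done)
        if not ready:
            raise ValueError("dependency cycle or unknown dependency")
        batches.append(ready)
        for w in ready:
            del remaining[w]
        # Model: the drive has now reported completion of this batch.
        done.update(ready)
    return batches


# Hypothetical example: two independent chains of writes.
deps = {
    "inode": set(),
    "dirent": {"inode"},
    "data": set(),
    "bitmap": {"data"},
}
print(issue_in_dependency_order(deps))
```

Writes with no relationship to each other ("inode" and "data" here) can go to
the drive in any order, which is the difference from a total ordering.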

Wouldn't the use of barriers have the potential to force a lot of
unrelated cached write data to be written much earlier than necessary?
If so, there would seem to be a performance penalty under certain
workloads, though performance would still be better than with
write-caching disabled.



Re: dangerous situation with shutdown process

2005-07-19 Thread Don Lewis
On 14 Jul, Kevin Oberman wrote:
 Date: Thu, 14 Jul 2005 20:38:15 +0200
 From: Anatoliy Dmytriyev [EMAIL PROTECTED]
 Sender: [EMAIL PROTECTED]
 
 Hello, everybody!
 
 I have found an unusual and dangerous situation with the shutdown process:
 I copied 200 GB of data onto the 870 GB partition (softupdates is
 enabled) with the cp command.
 It took a long time when I ran umount on this partition immediately after
 cp, but the procedure finished correctly.

When you unmounted the file system, that should have flushed all the
dirty files to the disk.

 When I instead ran "shutdown -h" (or -r), also immediately after cp, the
 shutdown procedure waited for "sync" (unmounting of the file system), but
 the sync process was terminated by a timeout, and fsck checked and
 corrected the file system after boot.

Did the timeout occur during the syncer shutdown, or at the "syncing
disks ..." step?

Did you have any ext2 file systems mounted?  These should be manually
unmounted before shutdown because they confuse the final sync code.

 System 5.4-stable, RAM 4GB, processor P-IV 3GHz.
 
 How can I fix it on my system?
 
 SCSI or ATA? If it's ATA, turn off write cache with atacontrol(8) or
 the sysctl.
 
 The problem is that disks lie about whether they have actually written
 data. If the power goes off while the data is still only in the cache, it's lost.

That should only make a difference in a power-fail situation, and it
only makes a difference if the only unwritten data is in the drive's
write cache.

 I am not sure if write-cache can be turned off on SCSI, but SCSI drives
 seem less likely to lie about when the data is actually flushed to the
 drive. 

Yes it can, and I recommend it.  Use the camcontrol modepage command to
set the WCE bit to 0.



Re: dangerous situation with shutdown process

2005-07-18 Thread Oliver Fromme
Kevin Oberman [EMAIL PROTECTED] wrote:
  [...]
  I believe that the Windows solution to this problem is to put a really,
  really long delay between when the system is finished syncing and when
  the power is turned off.

Definitely not.  When I compare Windows XP and FreeBSD on
the same hardware (notebook with ATA disk), then Windows'
shutdown process is a lot faster than FreeBSD's.  In fact,
when I shut it down under XP for the first time, the power
was off so quickly that I thought something must have gone
wrong.  But everything was OK and normal.

  This might be the best solution for FreeBSD, as
  well, but it will irritate people.

It is already irritating that FreeBSD sits there doing
nothing for ~ 5 seconds before turning power off.  Windows
doesn't do that.  (Yes I know, there's a sysctl for that,
but I suspect that it's not safe to modify it in FreeBSD.)

Best regards
   Oliver

-- 
Oliver Fromme,  secnetix GmbH & Co KG, Marktplatz 29, 85567 Grafing
Any opinions expressed in this message may be personal to the author
and may not necessarily reflect the opinions of secnetix in any way.

Emacs is not an editor to me. To me it is exactly the same as
asking for a bicycle (to fetch the Sunday rolls) and getting a
pangalactic space cruiser 10 km long. I don't know
what to do with it. -- Frank Klemm, de.comp.os.unix.discussion


Re: dangerous situation with shutdown process

2005-07-18 Thread Oliver Fromme
Matthias Buelow [EMAIL PROTECTED] wrote:
  Sorry folks, have I somehow dropped into a parallel universe,
  or is there some serious misunderstanding going on?

Seems so.

  To the OP: There is no sync process that is being killed by
  shutdown

Yes, there is a kernel process called syncer.  During
shutdown, each of the kernel processes (including the
syncer) has 60 seconds to terminate.  If it doesn't,
"timed out" is printed on the console.

This timeout can be changed using a sysctl tunable
(kern.shutdown.kproc_shutdown_wait).

  The kernel writes out all dirty buffers as part of its
  shutdown procedure.

When you shut down a machine, the kernel flushes all dirty
buffers to disk.  While it is doing that, it displays the
number of remaining buffers, with increasing time intervals
between them.  If there are still buffers left after a
certain number of intervals without change, the kernel
gives up.
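
The loop described above can be sketched roughly like this.  It is a
simplified model and not the code in boot(): the callables remaining and
sleep, the parameter names, and the choice to reset the stall counter
whenever the count drops are assumptions made for illustration.

```python
def wait_for_flush(remaining, sleep, max_stalled=20, step_ms=50):
    """Poll remaining() until it reaches zero or stops making progress.

    remaining: callable returning the current number of dirty buffers.
    sleep: callable taking a delay in milliseconds.
    Returns True if the count reached zero, False if it stayed unchanged
    for max_stalled consecutive intervals (the "giving up" case).
    """
    last = remaining()
    stalled = 0
    while last > 0:
        if stalled >= max_stalled:
            return False
        # The longer the count has been stuck, the longer the wait.
        sleep(step_ms * (stalled + 1))
        now = remaining()
        if now < last:
            stalled = 0
        else:
            stalled += 1
        last = now
    return True


# Usage with a fake sequence of buffer counts instead of a real disk.
counts = iter([5, 3, 3, 1, 0])
delays = []
print(wait_for_flush(lambda: next(counts), delays.append), delays)
```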

If that is really the problem, then the best solution would
be to make the number of flushing intervals and/or the
increasing interval a sysctl tunable.  They're currently
hardcoded at 20 and 50ms, respectively; see the boot()
function in src/sys/kern/kern_shutdown.c.  That means
that the timeout will happen after 10 seconds.  Doubling
the number of intervals (i.e. 40 instead of 20) will make
the timeout happen after 40 seconds, which should be
sufficient.
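
The arithmetic behind those figures, as a small sketch.  It assumes that the
delay before the i-th check is i times 50 ms, counting from 1; the function
name and the "first" parameter are made up for the example.

```python
def total_wait_ms(intervals, step_ms=50, first=1):
    """Total delay when the i-th interval lasts i * step_ms milliseconds."""
    return sum(step_ms * i for i in range(first, first + intervals))


for n in (20, 40):
    print(n, "intervals:", total_wait_ms(n) / 1000.0, "seconds")
```

Under that assumption 20 intervals add up to 10.5 seconds and 40 intervals to
41 seconds.  If the counting starts at 0 instead (first=0), the totals are 9.5
and 39 seconds, so "10 seconds" and "40 seconds" are round figures either way.
Because the intervals grow linearly, doubling their number roughly quadruples
the total wait.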

Best regards
   Oliver

-- 
Oliver Fromme,  secnetix GmbH & Co KG, Marktplatz 29, 85567 Grafing
Any opinions expressed in this message may be personal to the author
and may not necessarily reflect the opinions of secnetix in any way.

C++ is the only current language making COBOL look good.
-- Bertrand Meyer


Re: dangerous situation with shutdown process

2005-07-18 Thread Matthias Buelow
Oliver Fromme [EMAIL PROTECTED] writes:

buffers to disk.  While it is doing that, it displays the
number of remaining buffers, with increasing time intervals
between them.  If there are still buffers left after a
certain number of intervals without change, the kernel
gives up.

Why is it doing this? Can't it just enumerate the buffers and write
them, one by one?

mkb.


Re: dangerous situation with shutdown process

2005-07-18 Thread Paul Mather
On Mon, 2005-07-18 at 16:35 +0200, Matthias Buelow wrote:
 Oliver Fromme [EMAIL PROTECTED] writes:
 
 buffers to disk.  While it is doing that, it displays the
 number of remaining buffers, with increasing time intervals
 between them.  If there are still buffers left after a
 certain number of intervals without change, the kernel
 gives up.
 
 Why is it doing this? Can't it just enumerate the buffers and write
 them, one by one?

Why would that necessarily be more successful?  If the outstanding
buffers count is not reducing between time intervals, it is most likely
because there is some underlying hardware problem (e.g., a bad block).
If the count still persists in staying put, it likely means whatever the
hardware is doing to try and fix things (e.g., write reallocation) isn't
working, and so the kernel may as well give up.

You can enumerate the buffers and *try* to write them, but that doesn't
guarantee they will be written successfully any more than observing the
relative number left outstanding.

Cheers,

Paul.
-- 
e-mail: [EMAIL PROTECTED]

Without music to decorate it, time is just a bunch of boring production
 deadlines or dates by which bills must be paid.
--- Frank Vincent Zappa


Re: dangerous situation with shutdown process

2005-07-18 Thread Matthias Buelow
Paul Mather [EMAIL PROTECTED] writes:

Why would that necessarily be more successful?  If the outstanding
buffers count is not reducing between time intervals, it is most likely
because there is some underlying hardware problem (e.g., a bad block).
If the count still persists in staying put, it likely means whatever the
hardware is doing to try and fix things (e.g., write reallocation) isn't
working, and so the kernel may as well give up.

So the kernel is relying on guesswork whether the buffers are flushed
or not...

You can enumerate the buffers and *try* to write them, but that doesn't
guarantee they will be written successfully any more than observing the
relative number left outstanding.

That's rather nonsensical. If I write each buffer synchronously (and
wait for the disk's response) this is for sure a lot more reliable than
observing changes in the number of remaining buffers. I mean, where's
the sense in the latter? It would be analogous to, in userspace, having
to monitor write(2) continuously over a given time interval and check
whether the number it returns eventually reaches zero. That's complete
madness, imho.

mkb.


Re: dangerous situation with shutdown process

2005-07-18 Thread Oliver Fromme
Matthias Buelow [EMAIL PROTECTED] wrote:
  Oliver Fromme [EMAIL PROTECTED] writes:
   buffers to disk.  While it is doing that, it displays the
   number of remaining buffers, with increasing time intervals
   between them.  If there are still buffers left after a
   certain number of intervals without change, the kernel
   gives up.
  
  Why is it doing this? Can't it just enumerate the buffers and write
  them, one by one?

I don't think that the boot() function in kern_shutdown.c
can do that.  It has got nothing to do with the syncing
business itself.  It can only trigger the syncing (similar
to the sync(8) tool), which basically means performing a
vfs_sync with flag MNT_NOWAIT for every mounted filesystem.
Then it has to wait for the appropriate kernel process to
do its job.  See the source.

I don't think there's an easy way to change that.  If you
see such a way, I'd suggest you code it up and use send-pr.

Best regards
   Oliver

-- 
Oliver Fromme,  secnetix GmbH & Co KG, Marktplatz 29, 85567 Grafing
Any opinions expressed in this message may be personal to the author
and may not necessarily reflect the opinions of secnetix in any way.

Python tricks is a tough one, cuz the language is so clean. E.g.,
C makes an art of confusing pointers with arrays and strings, which
leads to lotsa neat pointer tricks; APL mistakes everything for an
array, leading to neat one-liners; and Perl confuses everything
period, making each line a joyous adventure <wink>.
-- Tim Peters


Re: dangerous situation with shutdown process

2005-07-18 Thread Lowell Gilbert
Matthias Buelow [EMAIL PROTECTED] writes:

 Lowell Gilbert wrote:
 
 Well, break it down a little bit.  If an ATA drive properly implements
 the cache flush command, then none of the ongoing discussion is
 
 Are you sure this is the case? Are there sequence points in softupdates
 where it issues a flush request and by this guarantees fs integrity?

No, you're right.  I meant write completions, not cache flushes.
I don't know of any drives that do one properly and not the other, 
but they're certainly not the same thing.

 I've read thru McKusick's paper in search for an answer but haven't
 found any. All I've read so far on mailing lists and from googling
 was that softupdates doesn't work if the wb-cache is enabled.

On a lot of ATA drives that don't implement the spec properly.


Re: dangerous situation with shutdown process

2005-07-18 Thread Paul Mather
On Mon, 2005-07-18 at 17:14 +0200, Matthias Buelow wrote:
 Paul Mather [EMAIL PROTECTED] writes:
 
 Why would that necessarily be more successful?  If the outstanding
 buffers count is not reducing between time intervals, it is most likely
 because there is some underlying hardware problem (e.g., a bad block).
 If the count still persists in staying put, it likely means whatever the
 hardware is doing to try and fix things (e.g., write reallocation) isn't
 working, and so the kernel may as well give up.
 
 So the kernel is relying on guesswork whether the buffers are flushed
 or not...

I don't know if you are just deliberately trying to be contentious, but
that is a serious misrepresentation of what is happening.  Quite
obviously the kernel knows whether a buffer has successfully been
flushed, otherwise a count of outstanding buffers would be meaningless.
(Surely you're not saying the kernel simply guesses if a buffer has been
flushed in maintaining its count of outstanding buffers?  What would be
the point of that?)

If you calm down and think about it for a little, you'll realise what
you suggest to do and what is actually done amount to the same thing in
practical terms.

It's all very easy to say to write each buffer synchronously (and wait
for the disk's response), but what do you do when the buffer *does* get
stuck and won't complete (e.g., because someone removed the floppy or
USB disk, or your remote ggate server disappeared, or your hard disk is
going bad, etc.)?  Do you just bail immediately at that point?  Or do
you keep retrying in the hope it will complete?  In the end, it comes
down to waiting a certain amount of time for drivers to do their best
and then giving up.  The only real question is how long you wait, and
maybe whether syncer is not waiting long enough (and hence how to extend
the amount of time it is willing to wait until it gives up on buffers
being unflushable).  I'm not sure why that is fundamentally madness.
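
To make the point concrete, here is a minimal sketch of the "write each
buffer synchronously" approach.  The names flush_each and try_write and the
fixed retry limit are hypothetical; the sketch only shows that this approach
also needs a rule for when to stop trying.

```python
def flush_each(buffers, try_write, max_attempts=3):
    """Write each buffer synchronously; return the ones that never completed.

    try_write: callable returning True once the device reports success.
    max_attempts: how often to retry before declaring a buffer unflushable
        (an arbitrary limit, which is exactly the policy question).
    """
    stuck = []
    for buf in buffers:
        for _ in range(max_attempts):
            if try_write(buf):
                break
        else:
            # Every attempt failed: give up on this buffer.
            stuck.append(buf)
    return stuck


# Usage: pretend the device holding "b2" has gone away.
print(flush_each(["b1", "b2", "b3"], lambda buf: buf != "b2"))
```

Whether the limit is a retry count per buffer or a number of intervals
without progress, something has to decide when a buffer is unflushable.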

Cheers,

Paul.
-- 
e-mail: [EMAIL PROTECTED]

Without music to decorate it, time is just a bunch of boring production
 deadlines or dates by which bills must be paid.
--- Frank Vincent Zappa


Re: dangerous situation with shutdown process

2005-07-18 Thread Jayton Garnett


Oliver Fromme wrote:


Definitely not.  When I compare Windows XP and FreeBSD on
the same hardware (notebook with ATA disk), then Windows'
shutdown process is a lot faster than FreeBSD's.  In fact,
when I shut it down under XP for the first time, the power
was off so quickly that I thought something must have gone
wrong.  But everything was OK and normal.

 

Yes, XP's shutdown time is quicker on a fresh install, but give it a few
weeks or months (depending on how you use XP)
and you will notice that the shutdown time increases. Right now my XP
Pro takes about 20 or more seconds to shut down.
I am sure this is due to the MFT under an NTFS installation; FreeBSD does
not appear to have this problem over extended
use, and there is no way of stopping the MFT growth problem (as far as I
know).



It is already irritating that FreeBSD sits there doing
nothing for ~ 5 seconds before turning power off.  Windows
doesn't do that.  (Yes I know, there's a sysctl for that,
but I suspect that it's not safe to modify it in FreeBSD.)

Best regards
  Oliver

 




--
Kind regards,
Jayton Garnett

email: [EMAIL PROTECTED]
Main : www.uberhacker.co.uk
Test server: jayton.plus.com




Re: dangerous situation with shutdown process

2005-07-18 Thread Joel Rees

Comment from out in left field --

On 2005/07/16, at 6:03, Kevin Oberman wrote:

[...]
I believe that the Windows solution to this problem is to put a really,
really long delay between when the system is finished syncing and when
the power is turned off.


That's what I vote for. If the system has ATA on it, send a line to
the console that says

    waiting for ATA technology drives to quit lying

after the final sync, and then wait 30 seconds to cut power.


This might be the best solution for FreeBSD, as
well, but it will irritate people.


My impression is that in this case irritation is recommended.

(I'm half wondering if Microsoft and the drive manufacturers haven't
defined some hidden API for forcing the drive electronics to be
truthful.)




Re: dangerous situation with shutdown process

2005-07-16 Thread Michael Nottebrock
On Friday, 15. July 2005 21:14, Matthias Buelow wrote:

 Why am I arguing in an uphill battle here?
 important to the FreeBSD community? Such issues should not even
 have to be discussed at all!

I completely agree, and there's really no point in arguing with people who are 
happy to throw dollars at the problem rather than fixing it either. At this 
point, code is needed.

-- 
   ,_,   | Michael Nottebrock   | [EMAIL PROTECTED]
 (/^ ^\) | FreeBSD - The Power to Serve | http://www.freebsd.org
   \u/   | K Desktop Environment on FreeBSD | http://freebsd.kde.org




Re: dangerous situation with shutdown process

2005-07-16 Thread Nicolas Rachinsky
* Matthias Buelow [EMAIL PROTECTED] [2005-07-16 01:42 +0200]:
 David Taylor [EMAIL PROTECTED] writes:
 
  A corrupted journal can be detected. If it's corrupted, discard
  the whole thing, or only the relevant entry. The filesystem will
  remain consistent.
  If track corruption occurs after the journal is written, it doesn't
  matter, since at boot the journal will be replayed and all operations
  will be performed once more.
 
 The track which is corrupted could contain data that wasn't written
 to in months.  How would the journal help?
 
 I don't understand this question.

The track destroyed could contain sectors which are in no way related
to the sectors the OS is writing to.

Nicolas


Re: dangerous situation with shutdown process

2005-07-16 Thread David Taylor
On Sat, 16 Jul 2005, Matthias Buelow wrote:
 David Taylor [EMAIL PROTECTED] writes:
 
  A corrupted journal can be detected. If it's corrupted, discard
  the whole thing, or only the relevant entry. The filesystem will
  remain consistent.
  If track corruption occurs after the journal is written, it doesn't
  matter, since at boot the journal will be replayed and all operations
  will be performed once more.
 
 The track which is corrupted could contain data that wasn't written
 to in months.  How would the journal help?
 
 I don't understand this question.

When the drive is powered off, the track being written to at that point
may be corrupted, right?  That track may contain sectors that the OS
didn't change.  These sectors would not be mentioned in the journal.
How would a journaling fs fix the corruption?

I suppose this could be avoided by requiring that all writes (and
journal entries) somehow correspond to a full track.  (Which I suppose
they may do already, but I don't think so).
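
A toy model of the concern, for illustration only.  The dict-based "disk",
the entry layout, and the use of CRC32 as the journal checksum are all
assumptions made for this sketch; no real journaling file system is being
described.

```python
import zlib


def make_entry(sector, data):
    """Build a journal entry with a checksum over its payload."""
    return {"sector": sector, "data": data, "crc": zlib.crc32(data)}


def replay(journal, disk):
    """Apply valid journal entries to disk (dict: sector -> bytes).

    Entries whose checksum does not match are detected and skipped.
    Returns the list of sectors that were rewritten.
    """
    applied = []
    for entry in journal:
        if zlib.crc32(entry["data"]) != entry["crc"]:
            continue  # corrupted journal entry: discard it
        disk[entry["sector"]] = entry["data"]
        applied.append(entry["sector"])
    return applied


# Sector 10 was being written at power loss; sector 11 shares its track
# and was damaged too, but the OS had not touched it for months.
disk = {10: b"torn write", 11: b"collateral damage"}
journal = [make_entry(10, b"new contents")]
print(replay(journal, disk), disk)
```

Replay restores sector 10 because the journal holds a copy of it.  Sector 11
stays damaged, since nothing in the journal mentions it.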

 I still don't trust ATA drives.  Can you guarantee (or show any
 reason to believe) that disabling the write cache will actually
 wait for the cache to be flushed before returning?
 Otherwise a "disable cache"/"enable cache" sequence is exactly
 the same as a flush cache command.  If the drive executes
 both immediately, without waiting for the cache to be
 flushed _before_ returning, what's the difference?
 
 You imply that, because there exists one drive for which it doesn't
 work, that it follows that it won't work for all drives? Or what is your
 point?

No.  I'm just asking if you know of ANY ATA drives that will wait for the
cache to be flushed before claiming the disable cache command has
succeeded.  I don't, but I haven't looked.

-- 
David Taylor


Re: dangerous situation with shutdown process

2005-07-16 Thread John-Mark Gurney
Matthias Buelow wrote this message on Fri, Jul 15, 2005 at 22:11 +0200:
 for write barriers in the block drivers had been implemented, we

phk removed support for write barriers because no one was making use
of them...  FreeBSD had them, and when there is *CODE* that makes use
of them, they'll be added back...

As the saying goes:
 Shut up and code!!!

-- 
  John-Mark Gurney  Voice: +1 415 225 5579

 All that I will do, has been done, All that I have, has not.


Re: dangerous situation with shutdown process

2005-07-16 Thread Matthias Buelow
Nicolas Rachinsky wrote:

 The track which is corrupted could contain data that wasn't written
 to in months.  How would the journal help?
 
 I don't understand this question.

The track destroyed could contain sectors which are in no way related
to the sectors the OS is writing to.

And in what way is that related to the existence or nonexistence
of write barriers and a journal?
If you pound the disk with a hammer, it will most likely break,
no matter what strategy you're using.
That you cannot eliminate _all_ sources of error with a strategy
doesn't mean that you shouldn't implement it to minimize the number
of errors that could happen.

Besides, I always thought that (most) disks had enough power reserve
to be able to write at least one track when power goes away? Or is
that an urban myth, I don't know for sure.

mkb.


Re: dangerous situation with shutdown process

2005-07-16 Thread Bill Vermillion
Somewhere around Fri, Jul 15, 2005 at 22:13 , the world stopped
and listened as [EMAIL PROTECTED] graced us with
this profound tidbit of wisdom that would fulfill the enjoyment of
future generations:

 Date: Fri, 15 Jul 2005 18:22:14 +0200
 From: Matthias Buelow [EMAIL PROTECTED]
 Subject: Re: dangerous situation with shutdown process
 To: Bill Vermillion [EMAIL PROTECTED]
 Cc: freebsd-stable@freebsd.org
 Message-ID: [EMAIL PROTECTED]
 Content-Type: text/plain; charset=us-ascii

 Bill Vermillion wrote:

 Copying very large files and then shutting down I hope is not a
 normal procedure for you.   softupdates sometimes do take a long
 time when you are removing/copying very large files.

 Others have suggested different time-outs but you'd have to figure
 out the largest size you may ever encounter and set things for
 that, which is not going to help for everyday operation.

 I've watched the amount of disk space increase slowly by performing
 'df' and it can take a long time - up to a minute on some extremely
 large partitions I was cleaning.

 One way to force everything to be written I've found [by
 observation only] is to perform an fsck on that file system.

 If you only do huge copies and immediate shutdowns rarely, then
 maybe it's just a good idea to remember how softupdates work, and
 then fsck, then shutdown.

 I'm always against changing default operations from typical 
 operations to extremes.

 Sorry folks, have I somehow dropped into a parallel universe,
 or is there some serious misunderstanding going on?

 To the OP: There is no sync process that is being killed by
 shutdown. The kernel writes out all dirty buffers as part of its
 shutdown procedure.

I was under the impression that there was a problem, that's why I
wrote my reply.

 Bill, as I get it from what you wrote, correct me if I'm wrong,
 you assume that:

  1. unmount doesn't wait for all dirty data being committed
 to disk before somehow removing the filesystem,

That's what the OP seemed to indicate.

  2. fsck on a live filesystem will somehow speed things up.

Actually an fsck on a live filesystem will force the softupdates to
complete more quickly - that is from observation - and when I've
deleted extremely large directories - usually /usr/src and /usr/obj.
It only speeds up flushing the blocks to disk.

 For 1., this is surely not the case, the same as with shutdown,
 the kernel of course writes (drive errors notwithstanding)
 all modified buffers and updates all on-disk structures before
 marking the fs clean, and

 for 2., you should never fsck a mounted filesystem. Besides,
 it is completely unnecessary.

You can fsck a mounted file system and fsck will run in read-only
mode.  That way you can check for problems, and if there is
something wrong you can shutdown and restart.  FreeBSD will NOT
run fsck in anything other than READ ONLY when the file system is
mounted

And in the old days when drives were smaller and slower and
performance needed to be maximized, from about Version III through
System V you could run   fsck -S device from cron!!

The -S flag was interesting in that it would actually re-write
the freelist IF AND ONLY IF there was no corruption on the drive.

Since blocks on those systems were used in the reverse order they were
released, running fsck -S sorted the freelist in ascending order
and thus helped to eliminate fragmentation.  This was particularly
important on the S51 file systems - as it was before the SysVs
adopted variants of the FFS system that came from BSD.
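
[Editorial sketch] The freelist behaviour described above can be modelled
in a few lines of Python.  This is only a toy model under stated
assumptions, not the S51 on-disk format or the real fsck: the class name
FreeList and its methods are hypothetical, and the list head is assumed to
be where blocks are both released and allocated.

```python
class FreeList:
    """Toy model of a LIFO free list; not the real S51 on-disk format."""

    def __init__(self):
        self.blocks = []               # index 0 is the head of the list

    def release(self, block):
        self.blocks.insert(0, block)   # freed blocks go to the head

    def allocate(self):
        return self.blocks.pop(0)      # allocation takes from the head

    def resort(self, check_found_errors):
        """Mimic the described -S rule: sort only after a clean check."""
        if check_found_errors:
            return False               # leave the list untouched
        self.blocks.sort()             # ascending block numbers
        return True


fl = FreeList()
for block in (40, 7, 23, 8):
    fl.release(block)
scattered = list(fl.blocks)            # reverse of the release order
fl.resort(check_found_errors=False)
ordered = list(fl.blocks)              # ascending after the re-sort
```

After the re-sort, successive allocations hand out ascending block numbers,
which is the fragmentation benefit the message describes.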

 If the OP has encountered any data corruption, this is due to
 an unclean shutdown because of disk errors or a kernel bug,
 and not because of timeouts that are too short or something
 like that.

It would have been nice to see his actual errors.

Bill
-- 
Bill Vermillion - bv @ wjv . com
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: dangerous situation with shutdown process

2005-07-16 Thread Nicolas Rachinsky
* Matthias Buelow [EMAIL PROTECTED] [2005-07-16 16:07 +0200]:
 Nicolas Rachinsky wrote:
 
  The track which is corrupted could contain data that wasn't written
  to in months.  How would the journal help?
  
  I don't understand this question.
 
 The track destroyed could contain sectors which are in no way related
 to the sectors the OS is writing to.
 
 And in what way is that related to the existence or nonexistence
 of write barriers and a journal?

You wrote before:
| If track corruption occurs after the journal is written, it doesn't
| matter, since at boot the journal will be replayed and all operations
| will be performed once more.

 If you pound the disk with a hammer, it will most likely break,
 no matter what strategy you're using.
 That you cannot eliminate _all_ sources of error with a strategy
 doesn't mean that you shouldn't implement it to minimize the number
 of errors that could happen.

I'm not arguing for or against write barriers or a journal.

Nicolas


Re: dangerous situation with shutdown process

2005-07-16 Thread Matthias Buelow
David Taylor wrote:

No.  I'm just asking if you know of ANY ata drives that will wait for the
cache to be flushed before claiming the disable cache command has
succeeded.  I don't, but I haven't looked.

I don't know either. I assume that they do. Does it matter?
I mean, I'm not suggesting a frivolous new theory that is highly
speculative and warrants a lengthy debate on its purported merits.
What I described is common practice on Windows, Linux and probably
a few other systems and I would think that they're not doing this
for nothing. And, frankly, I'm a bit astonished that the FreeBSD
(community) seems to be so ignorant of well-known measures for
improving data safety on consumer-grade desktop hardware. Does that
mean that FreeBSD is deemed generally unsuited for desktop and
laptop use and should be reserved for servers with the appropriate
(expensive) hardware? I hope not.

mkb.


Re: dangerous situation with shutdown process

2005-07-16 Thread Matthias Buelow
Bill Vermillion wrote:

You can fsck a mounted file system and fsck will run in read-only
mode.  That way you can check for problems, and if there is
something wrong you can shutdown and restart.  FreeBSD will NOT
run fsck in anything other than READ ONLY when the file system is
mounted

I thought fsck on a live (read-write) filesystem almost always
brings up errors (although only of a certain kind, like dangling
inodes) unless the fs has been completely quiescent for a while.

A quick check seems to confirm this:

** /dev/ad4s3a (NO WRITE)
** Last Mounted on /
** Root file system
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
UNREF FILE I=94257  OWNER=mkb MODE=100600
SIZE=2397 MTIME=Jul 16 16:25 2005 
CLEAR? no

And in the old days when drives were smaller and slower and
performance needed to be maximized, from about Version III through
System V you could run   fsck -S device from cron!!

The -S flag was interesting in that it would actually re-write
the freelist IF AND ONLY IF there was no corruption on the drive.

I'm amazed that this worked.. considering that the fsck would have
to be atomic then (i.e., basically halt all filesystem i/o while
it's running).

mkb.


Re: dangerous situation with shutdown process

2005-07-16 Thread Bill Vermillion
At Sat, Jul 16, 2005 at 16:29 , our malformed and occasionally 
flatulent friend Matthias Buelow spewed forth this fount of brain juice:

 Bill Vermillion wrote:

 You can fsck a mounted file system and fsck will run in read-only
 mode.  That way you can check for problems, and if there is
 something wrong you can shutdown and restart.  FreeBSD will NOT
 run fsck in anything other than READ ONLY when the file system is
 mounted

 I thought fsck on a live (read-write) filesystem almost always
 brings up errors (although only of a certain kind, like dangling
 inodes) unless the fs has been completely quiescent for a while.

 A quick check seems to confirm this:

 ** /dev/ad4s3a (NO WRITE)
 ** Last Mounted on /
 ** Root file system
 ** Phase 1 - Check Blocks and Sizes
 ** Phase 2 - Check Pathnames
 ** Phase 3 - Check Connectivity
 ** Phase 4 - Check Reference Counts
 UNREF FILE I=94257  OWNER=mkb MODE=100600
 SIZE=2397 MTIME=Jul 16 16:25 2005 
 CLEAR? no

The 'no' was supplied by the system, was it not?  The first line
says NO WRITE.

 And in the old days when drives were smaller and slower and
 performance needed to be maximized, from about Version III through
 System V you could run   fsck -S device from cron!!

 The -S flag was interesting in that it would actually re-write
 the freelist IF AND ONLY IF there was no corruption on the drive.

 I'm amazed that this worked.. considering that the fsck would have
 to be atomic then (i.e., basically halt all filesystem i/o while
 it's running).

We'd run it from cron as noted.  And this was done overnight - in
systems where users were there only in the daytime.  It did make a
difference in keeping performance up longer than without it.

Without that you'd basically have to backup the fs, remake the fs,
and then reload to get back the originally installed performance.

But as I noted this was for the S51 file system.  It was really
slow.  On my first Sys V.3 system, I made one file system
with the old S51/Xenix layout, and everything else was an FFS that
was slightly modified from the original BSD systems.  That was
probably about 1990.

The performance on the S51 ON THE SAME DRIVE - was no better than
30% as fast as the FFS and most of the time it was only 10% as
fast.

Once all the old customers moved to newer OS versions the old
fsck -S [note that it is capital S and not 's' - and you'll have to
find a Sys V manual to document the difference - and I don't
have one handy at the moment].

And small businesses were very very reluctant to upgrade unless
they were forced to.  I did some Y2K patching on OSes that had
been installed in the late 1980s.  And about the latest anyone
would be using a system there would be about 9PM - when the owner
stayed late.

With current systems, in particular net connected systems with 
email, you could not hope to find a quiescent system.  

However, the -S flag would rewrite the freelist ONLY if the rest of
the fsck gave no errors.  If there were problems, such as the
unreferenced file you showed in your example, the freelist would
not be re-written.  That's why it was OK to run it in cron.

Anyone who had not worked with Unix systems of 10-25 years ago can't
begin to appreciate how good things are today.

Bill
-- 
Bill Vermillion - bv @ wjv . com


Re: dangerous situation with shutdown process

2005-07-16 Thread David Magda


On Jul 15, 2005, at 11:08, Bill Vermillion wrote:


If you only do huge copies and immediate shutdowns rarely, then
maybe it's just a good idea to remember how softupdates work, and
then fsck, then shutdown.


This may sound simplistic, but what about a triple sync(8)? (sync; 
sync; sync)




Re: dangerous situation with shutdown process

2005-07-16 Thread Bill Vermillion
I know you'll find this hard to believe, but on Sat, Jul 16, 2005 at 10:52 ,
David Magda actually admitted to saying:

 
 On Jul 15, 2005, at 11:08, Bill Vermillion wrote:

 If you only do huge copies and immediate shutdowns rarely, then
 maybe it's just a good idea to remember how softupdates work, and
 then fsck, then shutdown.

 This may sound simplistic, but what about a triple sync(8)? (sync; 
 sync; sync)

Actually I saw that documented a very very long time ago in
an Intel Unix manual.  And Intel got out of Unix in the mid to late
1980s.  I don't recall if that was the one that was sold to Kodak -
the picture people - which then was sold to Interactive ?? - and
eventually wound up at Sun.  There were so many Unix variants
in those days you had to have a chart to keep up with them.  Each
HW manufacturer had their own version and name, and at that time
the only time you could call your OS Unix was if you compiled
it directly from the AT&T tapes with no changes on a Vax [if I
recall the scenario correctly].

But that was a long time ago.

Bill

-- 
Bill Vermillion - bv @ wjv . com


Re: dangerous situation with shutdown process

2005-07-16 Thread Paul Mather
On Sat, 2005-07-16 at 16:16 +0200, Matthias Buelow wrote:
 David Taylor wrote:
 
 No.  I'm just asking if you know of ANY ata drives that will wait for the
 cache to be flushed before claiming the disable cache command has
 succeeded.  I don't, but I haven't looked.
 
 I don't know either. I assume that they do. Does it matter?
 I mean, I'm not suggesting a frivolous new theory that is highly
 speculative and warrants a lengthy debate on its purported merits.
 What I described is common practice on Windows, Linux and probably
 a few other systems and I would think that they're not doing this
 for nothing. And, frankly, I'm a bit astonished that the FreeBSD
 (community) seems to be so ignorant of well-known measures for
 improving data safety on consumer-grade desktop hardware. Does that
 mean that FreeBSD is deemed generally unsuited for desktop and
 laptop use and should be reserved for servers with the appropriate
 (expensive) hardware? I hope not.

I recall reading on freebsd-current that Scott Long is working on adding
journalling support to FFS.  Perhaps you'd like to direct your input to
him instead of rehashing it repeatedly on here, which is the wrong
outlet for such discussion anyway: by definition, CURRENT, not STABLE
will get a new feature like journalling, and so discussing the ins and
outs of it on freebsd-current would seem more apropos.  (Plus, I'd hate
to see him implement something only for you to declare it to be the
wrong way to do things.  Better to get your 2p in now.:)

Regarding your question of whether FreeBSD is deemed
generally unsuited for desktop and laptop use, I can speak only from
experience.  I run 6-CURRENT on an ATA system using softupdates on my
desktop, and have been doing so for quite some time.  I've seen through
quite a few periods of extreme growing pains for CURRENT, with seemingly
random panics and mystery crashes at times.  (Who can forget the dark
days of the ULE + PREEMPTION instability?)  Add to the mix that my
neighbourhood has pretty flaky power, with a tendency for short
interruptions at the first whiff of bad weather.  This all adds up to a
good smattering of reboots at inopportune times (like when doing
buildworlds or large portupgrade sessions:).

Despite that, I have never EVER had a problem with data consistency on
my file systems.  (The only problem I have had is when I added an ATA
controller card one time and forgot to disable its RAID BIOS, which
promptly spammed over my geom_mirror metadata.:)  If softupdates were as
unsafe as you often hint, I'm surprised that I haven't lost a file
system by now.  (I would also expect to hear from the field a lot more
clamour about how unsafe it is, and that, in fact, the sky was indeed
falling.)  I guess I must be amazingly lucky and should start playing
the lottery right now. :-)

The main inconvenience I have with panics or outages is the fsck times
on reboot.  (Actually, what I find to be more inconvenient is the
resynchronisation time needed for my geom_mirror, which takes a lot
longer than a fsck.)  I understand that fsck delays for large file
systems are the major impetus behind the journalling work, not as a fix
for a perceived data consistency problem.

Cheers,

Paul.

PS: I also use softupdates on my NetBSD systems, again without problems.
I've also used LFS on NetBSD at times, but have always ended up
abandoning it due to performance and severe data reliability problems.
(To be clear, though, I'm not sure LFS was deemed to be for production
use, at least not the times I tried it.)
-- 
e-mail: [EMAIL PROTECTED]

Without music to decorate it, time is just a bunch of boring production
 deadlines or dates by which bills must be paid.
--- Frank Vincent Zappa


Re: dangerous situation with shutdown process

2005-07-16 Thread David Magda


On Jul 16, 2005, at 11:03, Bill Vermillion wrote:


Actually I saw that documented a very very long time ago in
an Intel Unix manual.


It's present in more recent documentation. :)

The sync utility can be called to ensure that all disk writes have
been completed before the processor is halted in a way not suitably
done by reboot(8) or halt(8).

http://www.freebsd.org/cgi/man.cgi?query=sync&sektion=8
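
[Editorial sketch] For readers who want the userspace equivalent of the
sync discussion, here is a minimal Python sketch.  It assumes a Unix-like
system; write_and_flush and sync_all are hypothetical helper names.
os.fsync() covers a single file and os.sync() asks the kernel to write out
everything, much as sync(8) does.  Neither call says anything about what
the drive's own write cache does afterwards.

```python
import os
import tempfile


def write_and_flush(path, payload):
    """Write payload and ask the kernel to push this one file to the device."""
    with open(path, "wb") as f:
        f.write(payload)
        f.flush()                 # drain Python's userspace buffer
        os.fsync(f.fileno())      # fsync(2) on this file only
    return os.path.getsize(path)


def sync_all():
    """Ask the kernel to write out all dirty buffers, like sync(8)."""
    if hasattr(os, "sync"):       # present on Unix with Python 3.3+
        os.sync()
        return True
    return False


with tempfile.TemporaryDirectory() as d:
    written = write_and_flush(os.path.join(d, "demo.bin"), b"x" * 4096)
    synced = sync_all()
```

Calling sync_all() three times in a row would be the scripted form of the
"sync; sync; sync" habit; whether the repetition helps is exactly what the
thread is debating.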



Re: dangerous situation with shutdown process

2005-07-16 Thread Rick Kelly
Bill Vermillion said:

Actually I saw that documented a very very long time ago in
an Intel Unix manual.  And Intel got out of Unix in the mid to late
1980s.  I don't recall if that was the one that was sold to Kodak -
the picture people - which then was sold to Interactive ?? - and
eventually wound up at Sun.  There were so many Unix variants
in those days you had to have a chart to keep up with them.  Each
HW manufacturer had their own version and name, and at that time
the only time you could call your OS Unix was if you compiled
it directly from the AT&T tapes with no changes on a Vax [if I
recall the scenario correctly].

The main reason for sync;sync;sync on V7 UNIX was because you couldn't 
do a shutdown, only a halt to the hardware monitor, on the PDP11. You
can verify that behavior with SIMH. :-)
-- 
Rick Kelly  [EMAIL PROTECTED]
http://www.rmkhome.com/
http://rkba.rmkhome.com/ - the right to keep and bear arms
http://wolf.rmkhome.com/ - firearm forums


Re: dangerous situation with shutdown process

2005-07-16 Thread Lowell Gilbert
Paul Mather [EMAIL PROTECTED] writes:

 Despite that, I have never EVER had a problem with data consistency on
 my file systems.  (The only problem I have had is when I added an ATA
 controller card one time and forgot to disable its RAID BIOS, which
 promptly spammed over my geom_mirror metadata.:)  If softupdates were as
 unsafe as you often hint, I'm surprised that I haven't lost a file
 system by now.  (I would also expect to hear from the field a lot more
 clamour about how unsafe it is, and that, in fact, the sky was indeed
 falling.)  I guess I must be amazingly lucky and should start playing
 the lottery right now. :-)

Well, break it down a little bit.  If an ATA drive properly implements
the cache flush command, then none of the ongoing discussion is
relevant.  Herr Buelow is worried about drives that do *not* do so,
but which, when told to disable their cache, will reliably empty the
cache before continuing with further operations.  Such drives are the
only ones to which any of the discussion applies.

If such drives are reasonably common, then using such a hack would
make sense.  However, I would want some fairly solid evidence on the
matter before I was willing to start coding it, and so far the most
convincing evidence I have seen is that Microsoft engineers claim to
have done it.

Be well.


Re: dangerous situation with shutdown process

2005-07-16 Thread Matthias Buelow
Rick Kelly wrote:

The main reason for sync;sync;sync on V7 UNIX was because you couldn't 
do a shutdown, only a halt to the hardware monitor, on the PDP11. You
can verify that behavior with SIMH. :-)

Uhm.. that's the same on the VAX.. in what way would that preclude
a shutdown? NetBSD certainly shuts down on VAX (and drops into the
monitor when it's done.)

mkb.


Re: dangerous situation with shutdown process

2005-07-16 Thread Matthias Buelow
Paul Mather wrote:

on reboot.  (Actually, what I find to be more inconvenient is the
resynchronisation time needed for my geom_mirror, which takes a lot
longer than a fsck.)  I understand that fsck delays for large file
systems are the major impetus behind the journalling work, not as a fix
for a perceived data consistency problem.

Well... I have lost a few (ca. 3) UFS filesystems due to power loss
or a kernel crash in the past but interestingly those were all on
SCSI (and in the pre-softupdates era, so mounted with sync metadata
updates, where this Shouldn't Happen[tm] either..) I've also seen
ext2fs (which doesn't have safeguards against fs corruption) on
Linux zapped often by power loss and haven't seen a statistically
higher number of corrupted ext2fs than ufs.  So the whole thing is
a bit hard to quantify. However, I'm all for reducing the possibility
of corruption when it could be done, programmatically.

mkb.



Re: dangerous situation with shutdown process

2005-07-16 Thread Matthias Buelow
Lowell Gilbert wrote:

Well, break it down a little bit.  If an ATA drive properly implements
the cache flush command, then none of the ongoing discussion is

Are you sure this is the case? Are there sequence points in softupdates
where it issues a flush request and by this guarantees fs integrity?
I've read thru McKusick's paper in search for an answer but haven't
found any. All I've read so far on mailing lists and from googling
was that softupdates doesn't work if the wb-cache is enabled.

mkb.


Re: dangerous situation with shutdown process

2005-07-16 Thread Jon Dama
No, it's at a level below softupdates that this must be done.  Softupdates
only understands when things have been marked completed with
biodone()--the underlying scsi/ata/sata driver must make the determination
as to when biodone should be called.

The flush has to be done there.  _IF_ the flush is being done there, then
request barriers represent a performance enhancement, not an integrity
enhancement.

-Jon
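
[Editorial sketch] The point about where the flush has to happen can be
illustrated with a toy model in Python.  Everything here is hypothetical
(ToyDrive and its methods are not kernel or driver APIs); appending to
"acknowledged" merely stands in for the driver calling biodone().  The
model assumes a power loss discards whatever is still in the write cache.

```python
class ToyDrive:
    """Toy drive with a volatile write cache; not a real driver interface."""

    def __init__(self, flush_before_done):
        self.flush_before_done = flush_before_done
        self.cache = {}            # volatile: lost on power failure
        self.media = {}            # persistent platters
        self.acknowledged = []     # blocks reported complete upward

    def flush(self):
        self.media.update(self.cache)
        self.cache.clear()

    def write(self, block, data):
        self.cache[block] = data
        if self.flush_before_done:
            self.flush()                   # data reaches media first
        self.acknowledged.append(block)    # stands in for biodone()

    def power_loss(self):
        self.cache.clear()

    def lost_blocks(self):
        """Blocks the upper layers believe are written but are not on media."""
        return [b for b in self.acknowledged if b not in self.media]


safe = ToyDrive(flush_before_done=True)
safe.write(1, b"metadata")
safe.power_loss()

lying = ToyDrive(flush_before_done=False)
lying.write(1, b"metadata")
lying.power_loss()
```

In this model the layer above (softupdates, in the thread's terms) behaves
identically in both cases; only the point at which completion is reported
decides whether acknowledged data can vanish.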

On Sat, 16 Jul 2005, Matthias Buelow wrote:

 Lowell Gilbert wrote:

 Well, break it down a little bit.  If an ATA drive properly implements
 the cache flush command, then none of the ongoing discussion is

 Are you sure this is the case? Are there sequence points in softupdates
 where it issues a flush request and by this guarantees fs integrity?
 I've read thru McKusick's paper in search for an answer but haven't
 found any. All I've read so far on mailing lists and from googling
 was that softupdates doesn't work if the wb-cache is enabled.

 mkb.



Re: dangerous situation with shutdown process

2005-07-16 Thread Dave Horsfall
On Sat, 16 Jul 2005, Rick Kelly wrote:

 The main reason for sync;sync;sync on V7 UNIX was because you couldn't 
 do a shutdown, only a halt to the hardware monitor, on the PDP11. You
 can verify that behavior with SIMH. :-)

And you weren't supposed to use sync;sync;sync but this:

# sync
# sync
# sync

They were *not* the same.

-- Dave


reducing shutdown time (was: Re: dangerous situation with shutdown process)

2005-07-15 Thread Marc Santhoff
Am Freitag, den 15.07.2005, 11:15 +0600 schrieb Sergey N. Voronkov:
 On Thu, Jul 14, 2005 at 04:17:06PM -0400, asym wrote:
  At 15:19 7/14/2005, Wilko Bulte wrote:
  On Thu, Jul 14, 2005 at 12:14:49PM -0700, Kevin Oberman wrote..
Date: Thu, 14 Jul 2005 20:38:15 +0200
From: Anatoliy Dmytriyev [EMAIL PROTECTED]
Sender: [EMAIL PROTECTED]
[...]
 If you can't increase shutdown timeout, decrease softupdates timers.
 
 # tail -3 /etc/sysctl.conf
 kern.metadelay=14
 kern.dirdelay=15
 kern.filedelay=17
 
 That was my solution for shutdown wait timeout.

Interesting, I didn't know these knobs. Would it be okay to set them to
zero on an embedded system with r/o file systems?

And are there other variables tunable for reducing shutdown time (on
4-STABLE)?

TIA,
Marc
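
[Editorial sketch] To make the sysctl.conf format quoted above concrete,
here is a small Python sketch that parses lines of that form.  It does not
talk to the kernel; parse_sysctl_conf is a hypothetical helper, and the
sample values are simply the three quoted in this message.  The quoted
values happen to be strictly increasing; the thread does not say whether
the kernel requires that ordering, so the check below only reports it.

```python
def parse_sysctl_conf(text):
    """Parse 'name=value' lines; skip blank lines and '#' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        name, value = line.split("=", 1)
        settings[name.strip()] = value.strip()
    return settings


sample = """\
# softupdates timers as quoted in the thread
kern.metadelay=14
kern.dirdelay=15
kern.filedelay=17
"""

delays = {name: int(value) for name, value in parse_sysctl_conf(sample).items()}
ordered = (delays["kern.metadelay"]
           < delays["kern.dirdelay"]
           < delays["kern.filedelay"])
```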




Re: dangerous situation with shutdown process

2005-07-15 Thread Bill Vermillion
On Thu, Jul 14, 2005 at 22:31 , [EMAIL PROTECTED]
moved his mouse, rebooted for the change to take effect, and then
said:


 Message: 13
 Date: Thu, 14 Jul 2005 20:38:15 +0200
 From: Anatoliy Dmytriyev [EMAIL PROTECTED]
 Subject: dangerous situation with shutdown process
 To: freebsd-stable@freebsd.org, freebsd-questions@freebsd.org
 Message-ID: [EMAIL PROTECTED]
 Content-Type: text/plain; charset=UTF-8; format=flowed

 Hello, everybody!

 I have found unusual and dangerous situation with shutdown process:
 I did a copy of 200 GB data on the 870 GB partition (softupdates is 
 enabled) by cp command.

 It took a lot of time when I did umount for this partition
 exactly after cp, but procedure finished correctly.

 In case, if I did "shutdown -h(r)", also exactly after cp,
 the shutdown procedure waited for "sync" (umounting of the
 file system) but sync process was terminated by timeout, and
 fsck checked and did correction of the file system after boot.

 System 5.4-stable, RAM 4GB, processor P-IV 3GHz.

 How can I fix it on my system?

Copying very large files and then shutting down I hope is not a
normal procedure for you.   softupdates sometimes do take a long
time when you are removing/copying very large files.

Others have suggested different time-outs but you'd have to figure
out the largest size you may ever encounter and set things for
that, which is not going to help for everyday operation.

I've watched the amount of disk space increase slowly by performing
'df' and it can take a long time - up to a minute on some extremely
large partitions I was cleaning.

One way to force everything to be written I've found [by
observation only] is to perform an fsck on that file system.

If you only do huge copies and immediate shutdowns rarely, then
maybe it's just a good idea to remember how softupdates work, and
then fsck, then shutdown.

I'm always against changing default operations from typical 
operations to extremes.

Bill
-- 
Bill Vermillion - bv @ wjv . com


Re: dangerous situation with shutdown process

2005-07-15 Thread John-Mark Gurney
Matthias Buelow wrote this message on Thu, Jul 14, 2005 at 21:52 +0200:
 The problem is that disks lie about whether they have actually written
 data. If the power goes off before the data is in cache, it's lost.
 
 No, the problem is that FreeBSD doesn't implement request barriers
 and that softupdates is flawed by design and seemingly could not
 make use of them, even if they were available (because, as I
 understand it, it relies on a total ordering of all writes, unlike
 the partial ordering necessary for a journalled fs).

Even request barriers will not save the fs if the track is
being flushed during a power loss...  Some other FreeBSD
folk have a reproducible case where blocks that were not written to
on ATA hardware got trashed after a power loss...

With non-written to sectors getting trashed with the cache enabled,
barriers don't mean squat...

-- 
  John-Mark Gurney  Voice: +1 415 225 5579

 All that I will do, has been done, All that I have, has not.


Re: dangerous situation with shutdown process

2005-07-15 Thread Matthias Buelow
John-Mark Gurney wrote:

With non-written to sectors getting trashed with the cache enabled,
barriers don't mean squat...

Of course if you pound the disk with a hammer, then barriers also
won't help. Just because with a few disks perhaps it won't work at
all doesn't mean that one shouldn't at least try and get it working
for perhaps the 90% where it would work in order to reduce the
possibility of corruption by as much as possible. I mean, anything
is better than the current situation where apparently nothing is
done at all.

Why am I arguing in an uphill battle here? Is data safety no longer
important to the FreeBSD community? Such issues should not even
have to be discussed at all!

mkb.


Re: dangerous situation with shutdown process

2005-07-15 Thread Matthias Buelow
John-Mark Gurney wrote:

Even request barriers will not save the fs if the track is
being flushed during a power loss...  Some other FreeBSD
folk have a reproducible case where blocks that were not written to
on ATA hardware got trashed after a power loss...
With non-written to sectors getting trashed with the cache enabled,
barriers don't mean squat...

One more thought.. they _do_ protect against power loss during writing
a track -- when used in combination with a journalled fs.

A corrupted journal can be detected. If it's corrupted, discard
the whole thing, or only the relevant entry. The filesystem will
remain consistent.
If track corruption occurs after the journal is written, it doesn't
matter, since at boot the journal will be replayed and all operations
will be performed once more.

The combination barriers+journal really seems to be very resilient
to filesystem corruption. When it's implemented without errors, and
the hardware doesn't do things like change bits randomly, I can't
think of a way this scheme can be corrupted at all.

mkb.
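
[Editorial sketch] The journal argument in this message (a corrupted entry
can be detected and discarded, a valid one can be replayed) can be sketched
in Python as follows.  This is a toy under stated assumptions, not any real
filesystem's journal format: make_entry and replay are hypothetical names,
a CRC32 stands in for whatever checksum a real journal would use, and the
"disk" is just a dictionary of block number to contents.

```python
import zlib


def make_entry(block_no, data):
    """Build a journal record carrying a checksum of its payload."""
    return {"block": block_no, "data": data, "crc": zlib.crc32(data)}


def replay(journal, disk):
    """Apply every intact journal entry to disk; skip corrupted ones."""
    applied = 0
    for entry in journal:
        if zlib.crc32(entry["data"]) != entry["crc"]:
            continue                        # torn or corrupted: discard
        disk[entry["block"]] = entry["data"]
        applied += 1
    return applied


disk = {1: b"old inode"}
journal = [make_entry(1, b"new inode"), make_entry(2, b"new dirent")]
journal[1]["data"] = b"garbage!!!"           # simulate a half-written entry

first = replay(journal, disk)
second = replay(journal, disk)               # replaying again changes nothing
```

The sketch says nothing about ordering: it assumes the journal record is on
stable storage before the matching data write, which is the very property
later messages argue many ATA drives cannot guarantee.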


Re: dangerous situation with shutdown process

2005-07-15 Thread Chuck Swiger

Matthias Buelow wrote:

John-Mark Gurney wrote:

[ ... ]

Why am I arguing in an uphill battle here? Is data safety no longer
important to the FreeBSD community? Such issues should not even
have to be discussed at all!


You ask a good question: so, just why are *you* arguing? [1]

If sysctl hw.ata.wc=0 doesn't do what you want, please submit a PR containing 
something better.  Or buy SCSI hardware and a real, battery-backed up RAID 
system, or fibre-channel, or Firewire, or whatever floats your boat.


--
-Chuck

[1]: After all, generally it takes at least two people to argue, although some 
people manage to argue even with themselves. :-)




Re: dangerous situation with shutdown process

2005-07-15 Thread Matthias Buelow
Chuck Swiger wrote:

If sysctl hw.ata.wc=0 doesn't do what you want, please submit a PR 
containing something better.  Or buy SCSI hardware and a real, 

Well, if I had the time, I would. Also if instead of softupdates,
a proper journalled filesystem implementation with kernel support
for write barriers in the block drivers had been implemented, we
wouldn't have this problem now. Ok, no point in arguing how things
would be if one had made different decisions in the past.

battery-backed up RAID system, or fibre-channel, or Firewire, or whatever 
floats your boat.

I would think that a significant part of (FreeBSD-) users are running
FreeBSD on desktop PCs, notebooks, etc., where a fibre-channel or
SCSI solution isn't really feasible (either technically, or
economically).

mkb.


Re: dangerous situation with shutdown process

2005-07-15 Thread Wilko Bulte
On Fri, Jul 15, 2005 at 10:11:02PM +0200, Matthias Buelow wrote..
 Chuck Swiger wrote:
 
 If sysctl hw.ata.wc=0 doesn't do what you want, please submit a PR 
 containing something better.  Or buy SCSI hardware and a real, 
 
 Well, if I had the time, I would. Also if instead of softupdates,
 a proper journalled filesystem implementation with kernel support
 for write barriers in the block drivers had been implemented, we
 wouldn't have this problem now. Ok, no point in arguing how things

sigh Not If The Bloody PeeCee Style Crap ATA Drives Keep Lying To You..

Followups to /dev/null

-- 
Wilko Bulte [EMAIL PROTECTED]


Re: dangerous situation with shutdown process

2005-07-15 Thread Lowell Gilbert
Matthias Buelow [EMAIL PROTECTED] writes:

 The combination barriers+journal really seems to be very resilient
 to filesystem corruption. When it's implemented without errors, and
 the hardware doesn't do things like change bits randomly, I can't
 think of a way this scheme can be corrupted at all.

We keep trying to point out that barriers *can't* be enforced on the
hardware with many (most, and apparently an increasing percentage of)
ATA drives.  There is no semantic on these drives that allows you to
guarantee the journal block will be written before the corresponding
data block.  If you are sure that your drives do this properly, then
you are safe, but in that case there's no reliability problem with
softupdates, either.


Re: dangerous situation with shutdown process

2005-07-15 Thread Matthias Buelow
Wilko Bulte wrote:

sigh Not If The Bloody PeeCee Style Crap ATA Drives Keep Lying To You..
Followups to /dev/null

Yes, makes no sense talking to a wall.

mkb.


Re: dangerous situation with shutdown process

2005-07-15 Thread Kevin Oberman
 Date: Fri, 15 Jul 2005 22:24:07 +0200
 From: Matthias Buelow [EMAIL PROTECTED]
 Sender: [EMAIL PROTECTED]
 
 Wilko Bulte wrote:
 
 sigh Not If The Bloody PeeCee Style Crap ATA Drives Keep Lying To You..
 Followups to /dev/null
 
 Yes, makes no sense talking to a wall.

You are right, but I don't think you get who the wall is...

When you try to get an ATA drive to flush its buffers and tell you when
they are flushed, there is a high probability that the drive (if it
supports the function at all) will tell you that it has flushed the cache
immediately. 

There is simply no way to tell if your data or metadata is actually on
the magnetic medium and no technique (journaling, barriers, soft
updates) can assure that you will not have a corrupt disk, especially if
the write cache is near full. Think about how long it takes to flush a
16 MB buffer to the hard drive and remember that the dump of the cache
to the drive is in an order over which you have no control.
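
To put rough numbers on the 16 MB example, here is a small
back-of-the-envelope sketch. The transfer rate, random-write rate and
block size are assumed round figures for a consumer ATA disk, not
measurements from this thread.

```python
# Rough estimate of how long a full 16 MB write cache takes to reach the platters.
# Every drive figure below is an assumption chosen for illustration only.

CACHE_BYTES = 16 * 1024 * 1024          # cache size mentioned in the thread
SEQ_BYTES_PER_SEC = 50 * 1000 * 1000    # assumed sustained sequential write rate
RANDOM_WRITES_PER_SEC = 100             # assumed seek-bound random writes per second
BLOCK_BYTES = 16 * 1024                 # assumed size of each scattered write


def drain_seconds_sequential(cache_bytes: int = CACHE_BYTES) -> float:
    """Best case: the cached data is one contiguous run on the disk."""
    return cache_bytes / SEQ_BYTES_PER_SEC


def drain_seconds_scattered(cache_bytes: int = CACHE_BYTES) -> float:
    """Worst case: every cached block needs its own seek."""
    blocks = cache_bytes // BLOCK_BYTES
    return blocks / RANDOM_WRITES_PER_SEC


if __name__ == "__main__":
    print(f"sequential: {drain_seconds_sequential():.2f} s")
    print(f"scattered:  {drain_seconds_scattered():.2f} s")
```

With these assumed figures the drain time ranges from about a third of
a second to roughly ten seconds, depending only on how scattered the
cached blocks are.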

The ONLY way to be really safe is to turn off the write cache and that
extracts a huge performance penalty. What you prefer is a matter of
personal choice but the file system simply can't make things
better. 

I believe that the Windows solution to this problem is to put a really,
really long delay between when the system is finished syncing and when
the power is turned off. This might be the best solution for FreeBSD, as
well, but it will irritate people.
-- 
R. Kevin Oberman, Network Engineer
Energy Sciences Network (ESnet)
Ernest O. Lawrence Berkeley National Laboratory (Berkeley Lab)
E-mail: [EMAIL PROTECTED]   Phone: +1 510 486-8634


Re: dangerous situation with shutdown process

2005-07-15 Thread Jon Dama


On Fri, 15 Jul 2005, Matthias Buelow wrote:
 Why am I arguing in an uphill battle here? Is data safety no longer
 important to the FreeBSD community? Such issues should not even
 have to be discussed at all!

I'm trying to tell you what you have to say to move forward on this issue:

1) tell people that they are mistaken about drives ignoring the FUA bit or
   flush cache
2) convince people that the performance benefit of request barriers is
   worth it

I think we all care, but when we actually care--when money depends on it--
we adopt other measures: scsi, battery-backed raid, etc.

-Jon


Re: dangerous situation with shutdown process

2005-07-15 Thread Matthias Buelow
Kevin Oberman [EMAIL PROTECTED] writes:

I believe that the Windows solution to this problem is to put a really,
really long delay between when the system is finished syncing and when
the power is turned off. This might be the best solution for FreeBSD, as
well, but it will irritate people.

The Windows solution is, apparently, to disable and immediately
re-enable the writeback-cache around a barrier. This will ensure the
cache being written out to the platters, even if the drive ignores a
flush command.

Of course I don't know this for certain but have to rely on observations
that others have made. See, for example:

http://mail-index.netbsd.org/tech-kern/2002/12/09/0052.html

The long delay at shutdown would simply be a final safeguard in case
the drive also ignores disabling of the WC.

mkb.


Re: dangerous situation with shutdown process

2005-07-15 Thread Matthias Buelow
Lowell Gilbert [EMAIL PROTECTED] writes:

We keep trying to point out that barriers *can't* be enforced on the
hardware with many (most, and apparently an increasing percentage of)
ATA drives.  There is no semantic on these drives that allows you to
guarantee the journal block will be written before the corresponding
data block.  If you are sure that your drives do this properly, then
you are safe, but in that case there's no reliability problem with
softupdates, either.

See my other mail(s) about other systems using cache disabling/enabling
to make up for a drive that ignores (or does not implement) a flush
command.

Then the advice to "disable the wb-cache on disks to ensure data safety"
doesn't make sense:

Either

 * the drive supports disabling the write-back-cache,
   then this method can be used to flush data to the platters,

or else

 * the drive does not support disabling the write-back-cache, or lies
   about it, then the advice to disable the write-back-cache for
   softupdates is meaningless.

I know my drive allows disabling of the write cache, as, apparently, the
majority of IDE/SATA drives do.

mkb.


Re: dangerous situation with shutdown process

2005-07-15 Thread Jon Dama

 I know my drive allows disabling of the write cache, as, apparently, the
 majority of IDE/SATA drives do.
Yes fair enough.  This command is in the specification as far back as
ata-1.  I guess it yields reasonable? performance?

You should, however, be telling sos@ this--if he doesn't already believe
it.

-Jon


Re: dangerous situation with shutdown process

2005-07-15 Thread David Taylor
On Fri, 15 Jul 2005, Matthias Buelow wrote:

 John-Mark Gurney wrote:
 
 even request barriers will not save the fs in a power loss if the track
 that is getting flushed during a power loss...  Some other FreeBSD
 folk has a reproducible case of where blocks that were not written to
 on ATA hardware got trashed after a power loss...
 With non-written to sectors getting trashed with the cache enabled,
 barriers don't mean squat...
 
 One more thought.. they _do_ protect against power loss during writing
 a track -- when used in combination with a journalled fs.
 
 A corrupted journal can be detected. If it's corrupted, discard
 the whole thing, or only the relevant entry. The filesystem will
 remain consistent.
 If track corruption occurs after the journal is written, it doesn't
 matter, since at boot the journal will be replayed and all operations
 will be performed once more.

The track which is corrupted could contain data that wasn't written
to in months.  How would the journal help?
 
 The combination barriers+journal really seems to be very resilient
 to filesystem corruption. When it's implemented without errors, and
 the hardware doesn't do things like change bits randomly, I can't
 think of a way this scheme can be corrupted at all.

I still don't trust ATA drives.  Can you guarantee (or show any
reason to believe) that disabling the write cache will actually
wait for the cache to be flushed before returning?

Otherwise a disable cache -> enable cache sequence is exactly
the same as a flush cache command.  If the drive executes
both immediately, without waiting for the cache to be
flushed _before_ returning, what's the difference?

-- 
David Taylor 


Re: dangerous situation with shutdown process

2005-07-15 Thread Matthias Buelow
David Taylor [EMAIL PROTECTED] writes:

 A corrupted journal can be detected. If it's corrupted, discard
 the whole thing, or only the relevant entry. The filesystem will
 remain consistent.
 If track corruption occurs after the journal is written, it doesn't
 matter, since at boot the journal will be replayed and all operations
 will be performed once more.

The track which is corrupted could contain data that wasn't written
to in months.  How would the journal help?

I don't understand this question.

I still don't trust ATA drives.  Can you guarantee (or show any
reason to believe) that disabling the write cache will actually
wait for the cache to be flushed before returning?
Otherwise a disable cache -> enable cache sequence is exactly
the same as a flush cache command.  If the drive executes
both immediately, without waiting for the cache to be
flushed _before_ returning, what's the difference?

You imply that, because there exists one drive for which it doesn't
work, that it follows that it won't work for all drives? Or what is your
point?

mkb.


Re: dangerous situation with shutdown process

2005-07-14 Thread Kevin Oberman
 Date: Thu, 14 Jul 2005 20:38:15 +0200
 From: Anatoliy Dmytriyev [EMAIL PROTECTED]
 Sender: [EMAIL PROTECTED]
 
 Hello, everybody!
 
 I have found unusual and dangerous situation with shutdown process:
 I did a copy of 200 GB data on the 870 GB partition (softupdates is 
 enabled) by cp command.
 It took a lot of time when I did umount for this partition exactly after 
 cp, but procedure finished correctly.
 In case, if I did “shutdown –h(r)”, also exactly after cp, the shutdown 
 procedure waited for “sync” (umounting of the file system) but sync 
 process was terminated by  timeout, and fsck checked and did correction 
 of the file system after boot.
 
 System 5.4-stable, RAM 4GB, processor P-IV 3GHz.
 
 How can I fix it on my system?

SCSI or ATA? If it's ATA, turn off write cache with atacontrol(8) or
the sysctl.

The problem is that disks lie about whether they have actually written
data. If the power goes off while the data is still in cache, it's lost.

I am not sure if write-cache can be turned off on SCSI, but SCSI drives
seem less likely to lie about when the data is actually flushed to the
drive. 
-- 
R. Kevin Oberman, Network Engineer
Energy Sciences Network (ESnet)
Ernest O. Lawrence Berkeley National Laboratory (Berkeley Lab)
E-mail: [EMAIL PROTECTED]   Phone: +1 510 486-8634


Re: dangerous situation with shutdown process

2005-07-14 Thread Wilko Bulte
On Thu, Jul 14, 2005 at 12:14:49PM -0700, Kevin Oberman wrote..
  Date: Thu, 14 Jul 2005 20:38:15 +0200
  From: Anatoliy Dmytriyev [EMAIL PROTECTED]
  Sender: [EMAIL PROTECTED]
  
  Hello, everybody!
  
  I have found unusual and dangerous situation with shutdown process:
  I did a copy of 200 GB data on the 870 GB partition (softupdates is 
  enabled) by cp command.
  It took a lot of time when I did umount for this partition exactly after 
  cp, but procedure finished correctly.
  In case, if I did “shutdown –h(r)”, also exactly after cp, the shutdown
  procedure waited for “sync” (umounting of the file system) but sync 
  process was terminated by  timeout, and fsck checked and did correction 
  of the file system after boot.
  
  System 5.4-stable, RAM 4GB, processor P-IV 3GHz.
  
  How can I fix it on my system?
 
 SCSI or ATA? If it's ATA, turn off write cache with atacontrol(8) or
 the sysctl.
 
 The problem is that disks lie about whether they have actually written
 data. If the power goes off while the data is still in cache, it's lost.
 
 I am not sure if write-cache can be turned off on SCSI, but SCSI drives
 seem less likely to lie about when the data is actually flushed to the
 drive. 

At least you can set FUA if you want to force the data onto the platter.

-- 
Wilko Bulte [EMAIL PROTECTED]


Re: dangerous situation with shutdown process

2005-07-14 Thread Anatoliy Dmytriyev

Kevin Oberman wrote:


SCSI or ATA? If it's ATA, turn off write cache with atacontrol(8) or
the sysctl.

The problem is that disks lie about whether they have actually written
data. If the power goes off while the data is still in cache, it's lost.

I am not sure if write-cache can be turned off on SCSI, but SCSI drives
seem less likely to lie about when the data is actually flushed to the
drive. 
 



SCSI, Adaptec 2110S

--
Anatoliy Dmytriyev
[EMAIL PROTECTED]




Re: dangerous situation with shutdown process

2005-07-14 Thread Matthias Buelow
Kevin Oberman wrote:

 How can I fix it on my system?

SCSI or ATA? If it's ATA, turn off write cache with atacontrol(8) or
the sysctl.

You do NOT want to do that. Not only will performance drop brutally
(example: drop to 1/5th of normal write speed for sequential writes,
probably worse for random writes) but it will also significantly
reduce the lifetime of your disk. Modern disks are designed to be
used with the write-back cache enabled, so don't turn it off.

The problem is that disks lie about whether they have actually written
data. If the power goes off while the data is still in cache, it's lost.

No, the problem is that FreeBSD doesn't implement request barriers
and that softupdates is flawed by design and seemingly could not
make use of them, even if they were available (because, as I
understand it, it relies on a total ordering of all writes, unlike
the partial ordering necessary for a journalled fs).

Until a journalled fs that uses write request barriers is available
for FreeBSD, you had better have a reliable UPS.

mkb.



Re: dangerous situation with shutdown process

2005-07-14 Thread David Sze
On Thu, Jul 14, 2005 at 09:52:53PM +0200, Matthias Buelow wrote:
 Kevin Oberman wrote:
 
 The problem is that disks lie about whether they have actually written
 data. If the power goes off while the data is still in cache, it's lost.
 
 No, the problem is that FreeBSD doesn't implement request barriers
 and that softupdates is flawed by design and seemingly could not
 make use of them, even if they were available (because, as I
 understand it, it relies on a total ordering of all writes, unlike
 the partial ordering necessary for a journalled fs).
 
 Until a journalled fs that uses write request barriers is available
 for FreeBSD, you had better have a reliable UPS.

How do OS-level request barriers help if the disk reorders pending
writes in its cache?



Re: dangerous situation with shutdown process

2005-07-14 Thread Jon Dama

softupdates is perfectly safe with SCSI.

it's well known that ide and sata, with or without ncq, fail to provide
suitable semantics for softupdates

however, journaling fares no better, and request barriers do nothing to
solve the problem.

Request Barriers under linux exist to prevent the low level kernel block
device layer from reordering write operations from the upper file system
layers.  Request Barriers consist of nothing more than tagging internal
queues within the Linux kernel itself.  They do nothing to resolve the
underlying failures of the hardware to provide proper semantics to the
block device layer.

but, Request Barriers are ultimately useless.  They can't resolve the
underlying problems with ide/sata and there are already exposed semantics
for scsi.

if you absolutely must use sata and have reliable writes, use sata
with a battery-backed raid controller.


On Thu, 14 Jul 2005, Matthias Buelow wrote:

 Kevin Oberman wrote:

  How can I fix it on my system?
 
 SCSI or ATA? If it's ATA, turn off write cache with atacontrol(8) or
 the sysctl.

 You do NOT want to do that. Not only will performance drop brutally
 (example: drop to 1/5th of normal write speed for sequential writes,
 probably worse for random writes) but it will also significantly
 reduce the lifetime of your disk. Modern disks are designed to be
 used with the write-back cache enabled, so don't turn it off.

 The problem is that disks lie about whether they have actually written
 data. If the power goes off while the data is still in cache, it's lost.

 No, the problem is that FreeBSD doesn't implement request barriers
 and that softupdates is flawed by design and seemingly could not
 make use of them, even if they were available (because, as I
 understand it, it relies on a total ordering of all writes, unlike
 the partial ordering necessary for a journalled fs).

 Until a journalled fs that uses write request barriers is available
 for FreeBSD, you had better have a reliable UPS.

 mkb.



Re: dangerous situation with shutdown process

2005-07-14 Thread asym

At 15:19 7/14/2005, Wilko Bulte wrote:

On Thu, Jul 14, 2005 at 12:14:49PM -0700, Kevin Oberman wrote..
  Date: Thu, 14 Jul 2005 20:38:15 +0200
  From: Anatoliy Dmytriyev [EMAIL PROTECTED]
  Sender: [EMAIL PROTECTED]
 
  Hello, everybody!
 
  I have found unusual and dangerous situation with shutdown process:
  I did a copy of 200 GB data on the 870 GB partition (softupdates is
  enabled) by cp command.
  It took a lot of time when I did umount for this partition exactly after
  cp, but procedure finished correctly.
  In case, if I did “shutdown –h(r)”, also exactly after cp, the shutdown
  procedure waited for “sync” (umounting of the file system) but sync
  process was terminated by  timeout, and fsck checked and did correction
  of the file system after boot.
 
  System 5.4-stable, RAM 4GB, processor P-IV 3GHz.
 
  How can I fix it on my system?


The funny thing about all the replies here.. is that this guy is not saying 
that sync doesn't work.


He's saying that the timeout built into shutdown causes it to *terminate* 
the sync forcibly before it's done, and then reboot.


All finger pointing about IDE, SCSI, softupdates, and journals aside.. I 
think all he wants/needs is a way to increase that timer.




Re: dangerous situation with shutdown process

2005-07-14 Thread Matthias Buelow
David Sze wrote:

 Until a journalled fs that uses write request barriers is available
 for FreeBSD, you had better have a reliable UPS.

How do OS-level request barriers help if the disk reorders pending
writes in its cache?

By separating journal updates from the corresponding metadata (and/or
data) actions, and by guaranteeing (by flushing the cache, or a
singular disabling/enabling of the wb cache at the barrier) that
the journal is updated on disk before the actions take place. This
imposes an ordering on the journal vs. action requests, which is
what a journalled fs needs for filesystem integrity. It doesn't
really matter if the disk reorders writes within those two blocks,
the only thing that really matters is that the journal update is
completed before metadata (or data) updates take place. With
softupdates, as far as I understand, that doesn't work, because
there is no journal.  All requests must be in the order that
softupdates decrees. You'd have to issue a barrier request after
every write request, which would be equivalent to disabling the wb
cache.
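
The ordering argument above can be sketched with a toy model. This is
only an illustration under stated assumptions: `CachingDrive` is a
hypothetical class, the drive is assumed to honour the flush (which is
exactly what is disputed elsewhere in this thread), and a power loss is
modelled as "any subset of the cached writes reached the platters".

```python
from itertools import combinations


class CachingDrive:
    """Toy drive: writes land in a volatile cache and may reach the
    platters in any order until flush() is called."""

    def __init__(self):
        self.media = set()   # blocks that are durable
        self.cache = []      # blocks acknowledged but still volatile

    def write(self, block):
        self.cache.append(block)

    def flush(self):
        # Assumption: the drive really empties its cache before returning.
        self.media.update(self.cache)
        self.cache.clear()

    def possible_states_after_power_loss(self):
        """Any subset of the cached blocks may have made it to the media."""
        states = []
        for n in range(len(self.cache) + 1):
            for survivors in combinations(self.cache, n):
                states.append(self.media | set(survivors))
        return states


def ordering_violations(use_barrier):
    """Return the crash states in which metadata is on disk without its journal."""
    drive = CachingDrive()
    drive.write("journal")
    if use_barrier:
        drive.flush()            # the barrier: journal is durable past this point
    drive.write("metadata")
    return [state for state in drive.possible_states_after_power_loss()
            if "metadata" in state and "journal" not in state]


if __name__ == "__main__":
    print("with barrier:   ", ordering_violations(True))
    print("without barrier:", ordering_violations(False))
```

In this model the barrier removes the one bad crash state (metadata
present, journal missing) while still letting the drive reorder writes
on either side of it.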

mkb.



Re: dangerous situation with shutdown process

2005-07-14 Thread Matthias Buelow
Jon Dama wrote:

Request Barriers under linux exist to prevent the low level kernel block
device layer from reordering write operations from the upper file system
layers.  Request Barriers consist of nothing more than tagging internal
queues within the Linux kernel itself.  They do nothing to resolve the
underlying failures of the hardware to provide proper semantics to the
block device layer.
but, Request Barriers are ultimately useless.  They can't resolve the
underlying problems with ide/sata and there are already exposed semantics
for scsi.

If you flush the cache at barriers, on-disk integrity of the journal
vs. metadata updates is guaranteed.

mkb.


Re: dangerous situation with shutdown process

2005-07-14 Thread Jon Dama

if the FUA bit in the sata command header is properly respected.

if the flush cache command on an ata device is properly respected.

if the flush cache command on an ata device is implemented (it's optional)

if the flush cache command exists when the ata device was made (it isn't
in the earlier versions of the ata spec).

anyways, your comments about softupdates needing total ordering versus
journals needing partial ordering are wrong.  softupdates only requires
that you do not call 'biodone(x)' until 'x' has been committed to disk.
this is 100% compatible with the specification feature set, IF those
semantics are actually present in the hardware.
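
A minimal sketch of that completion rule, with invented names: the
`Drive` class, the `completes_on_media` flag and the block names
"inode" and "dir_entry" are all hypothetical. The second write is only
issued after the first is reported complete (the point at which the
kernel would call biodone), so everything hinges on what "complete"
means to the drive.

```python
class Drive:
    """Toy drive whose completion report is either truthful or early."""

    def __init__(self, completes_on_media):
        self.completes_on_media = completes_on_media
        self.media = []   # durable blocks, in the order they became durable
        self.cache = []   # blocks reported complete but still volatile

    def write(self, block):
        """Returns once the drive reports the write as complete."""
        if self.completes_on_media:
            self.media.append(block)      # completion means durable
        else:
            self.cache.append(block)      # completion reported from cache

    def worst_case_after_power_loss(self):
        """Assume the drive destaged its newest cached block first and
        then lost power, so only that one block survived."""
        return self.media + self.cache[-1:]


def dependent_writes(drive):
    drive.write("inode")       # prerequisite: must be durable first
    drive.write("dir_entry")   # issued only after the first write completed
    return drive.worst_case_after_power_loss()


if __name__ == "__main__":
    print("honest completion:", dependent_writes(Drive(True)))
    print("early completion: ", dependent_writes(Drive(False)))
```

With truthful completion the dependent block can never be on disk
without its prerequisite; with early completion the same caller-side
discipline no longer guarantees that.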


please see the thread beginning with the following commit message for an
extensive discussion of these topics:

http://lists.freebsd.org/pipermail/cvs-src/2003-April/001002.html

-Jon

On Thu, 14 Jul 2005, Matthias Buelow wrote:

 Jon Dama wrote:

 Request Barriers under linux exist to prevent the low level kernel block
 device layer from reordering write operations from the upper file system
 layers.  Request Barriers consist of nothing more than tagging internal
 queues within the Linux kernel itself.  They do nothing to resolve the
 underlying failures of the hardware to provide proper semantics to the
 block device layer.
 but, Request Barriers are ultimately useless.  They can't resolve the
 underlying problems with ide/sata and there are already exposed semantics
 for scsi.

 If you flush the cache at barriers, on-disk integrity of the journal
 vs. metadata updates is guaranteed.

 mkb.



Re: dangerous situation with shutdown process

2005-07-14 Thread Matthias Buelow
Jon Dama [EMAIL PROTECTED] writes:

if the FUA bit in the sata command header is properly respected.
if the flush cache command on an ata device is properly respected.
if the flush cache command on an ata device is implemented (it's optional)
if the flush cache command exists when the ata device was made (it isn't
in the earlier versions of the ata spec).

or if the write-back cache can be disabled and re-enabled.

anyways, your comments about softupdates needing total ordering versus
journals needing partial ordering are wrong.  softupdates only requires
that you do not call 'biodone(x)' until 'x' has been committed to disk.

Well. Can it group writes in such a way that flushing would be required
only at larger intervals, or can't it?

this is 100% compatible with the specification feature set, IF those
semantics are actually present in the hardware.

Apparently it is not compatible with the real-world feature set and it
should've been clear to the designer(s) of softupdates that write-back
caches signal completion while the data is still in the cache. That's
the whole purpose of these mechanisms (so they can delay and reorder the
writes and write out whole tracks). You should only assume that, in that
case, a separate flush command (or a workaround that amounts to a flush)
exists. Any different design assumes an oversimplified black box notion
of a drive that does not correspond with reality.

please see the thread beginning with the following commit message for an
extensive discussion of these topics:
http://lists.freebsd.org/pipermail/cvs-src/2003-April/001002.html

I've seen nothing that contradicts what I've said.

The point is, that the request barrier design with flushing at barriers
as used in M$ Windows (and also completed in recent Linux kernels)
allows safe use of disks with write-back cache enabled, while FreeBSD
with softupdates apparently doesn't. I don't really care how it's
implemented, or if journalling is used, or softupdates, or a
quantum-tachyon-reverser mounted on the front antenna. I just want to
have the same level of data safety on my hardware with FreeBSD that I
would get with other systems.

mkb.


Re: dangerous situation with shutdown process

2005-07-14 Thread Lowell Gilbert
Jon Dama [EMAIL PROTECTED] writes:

 softupdates is perfectly safe with SCSI.
 
 it's well known that ide and sata, with or without ncq, fail to provide
 suitable semantics for softupdates
 
 however, journaling fares no better, and request barriers do nothing to
 solve the problem.

I had assumed that the sequence of operations in a journal would be
idempotent.  Is that a reasonable design criterion?  [If it is, then
it would make up for the fact that you can't build a reliable
transaction gate.  That is, you would just have to go back far enough
that you *know* all of the needed journal is within the range you will
replay.  But even then, the journal would need to be on a separate
medium, one that doesn't have the "lying to you about transaction
completion" problem.]

 On Thu, 14 Jul 2005, Matthias Buelow wrote:
 
  Kevin Oberman wrote:
 
  SCSI or ATA? If it's ATA, turn off write cache with atacontrol(8) or
  the sysctl.
 
  You do NOT want to do that. Not only will performance drop brutally
  (example: drop to 1/5th of normal write speed for sequential writes,
  probably worse for random writes) but it will also significantly
  reduce the lifetime of your disk. Modern disks are designed to be
  used with the write-back cache enabled, so don't turn it off.

I have no idea how being "designed to be used with the write-back cache
enabled" could affect the operating life of the disk.  


Re: dangerous situation with shutdown process

2005-07-14 Thread Matthias Buelow
Lowell Gilbert [EMAIL PROTECTED] writes:

Jon Dama [EMAIL PROTECTED] writes:
 however, journaling fares no better, and request barriers do nothing to
 solve the problem.

I had assumed that the sequence of operations in a journal would be
idempotent.  Is that a reasonable design criterion?  [If it is, then
it would make up for the fact that you can't build a reliable
transaction gate.  That is, you would just have to go back far enough
that you *know* all of the needed journal is within the range you will
replay.  But even then, the journal would need to be on a separate
medium, one that doesn't have the "lying to you about transaction
completion" problem.]

No, it needn't. It is sufficient that the journal entries for a block of
updates that are to follow are on disk before the updates are made.
That's all. This can be achieved by inserting a write barrier request in
between the journal writes and the actual data/metadata writes. The
block driver will, when it sees the barrier, a) write out all requests
in its queue that it got before the barrier, and b) flush the cache so
that they will not get intermixed by the drive with the following data
writes.

What could happen now when the power goes away at an inopportune moment?
[Note that I'm only talking about filesystem integrity, not general
data loss.]

* If power goes away before the journal is written, nothing happens.
* If the journal is partially written, and power goes away, it will
  be partially replayed at boot but the filesystem will be consistent.
* If power goes away, when the journal is fully written, but no
  metadata updates have been performed, they will be performed at
  boot and everything is as if the full request has completed before
  power went out.
* If power goes away when the journal is fully written, and parts of
  the metadata updates have been written, those updates will be performed
  twice (once more at reboot) but that won't matter since these operations
  are idempotent. The remaining metadata updates are then performed
  once, at reboot.

So where is the need for the journal to be on a separate medium?
The only thing that matters is that no metadata updates will be written
before the journal has been written, and flushing the disk cache at a
barrier will ensure this. Note that the disk doesn't even have to flush
the cache when it receives that command, it only has to ensure that
it'll perform all requests before the flush in front of those that come
afterwards.
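
The replay cases listed above can be checked with a small sketch. It
assumes journal entries are absolute "set this field to this value"
records (which is what makes them idempotent) and that the journal is
fully on disk before any metadata is touched; the function and field
names are made up for the example.

```python
def apply_entry(entry, metadata):
    """Idempotent update: set a field to an absolute value."""
    key, value = entry
    metadata[key] = value


def recover(journal, metadata_on_disk):
    """Boot-time recovery: replay the whole journal over whatever survived."""
    state = dict(metadata_on_disk)
    for entry in journal:
        apply_entry(entry, state)
    return state


def outcomes_after_crash(initial, transaction):
    """Recovered state for every crash point after the journal is complete:
    0, 1, ..., all of the metadata updates had already reached the disk."""
    results = []
    for done in range(len(transaction) + 1):
        on_disk = dict(initial)
        for entry in transaction[:done]:
            apply_entry(entry, on_disk)          # updates written before the crash
        results.append(recover(transaction, on_disk))
    return results


if __name__ == "__main__":
    initial = {"link_count": 1, "size": 0}
    transaction = [("link_count", 2), ("size", 4096)]
    for state in outcomes_after_crash(initial, transaction):
        print(state)
```

Every crash point recovers to the same final state. Note that this
depends on the entries being absolute values; a relative record such as
"increment link_count" would not survive being applied twice.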

I have no idea how being "designed to be used with the write-back cache
enabled" could affect the operating life of the disk.  

If you disable the write cache, you get much higher wear and tear due
to much more seeking.
If I observe a 5x performance degradation when the cache is disabled,
for sequential writes (i.e., no cache overwriting effects), I would
think that I also have a factor >1 of increased seeking operations in
the drive, otherwise the performance degradation cannot be explained.
[Besides, the disk gets really loud when the cache is disabled.]

mkb.


Re: dangerous situation with shutdown process

2005-07-14 Thread Sergey N. Voronkov
On Thu, Jul 14, 2005 at 04:17:06PM -0400, asym wrote:
 At 15:19 7/14/2005, Wilko Bulte wrote:
 On Thu, Jul 14, 2005 at 12:14:49PM -0700, Kevin Oberman wrote..
   Date: Thu, 14 Jul 2005 20:38:15 +0200
   From: Anatoliy Dmytriyev [EMAIL PROTECTED]
   Sender: [EMAIL PROTECTED]
  
   Hello, everybody!
  
   I have found unusual and dangerous situation with shutdown process:
   I did a copy of 200 GB data on the 870 GB partition (softupdates is
   enabled) by cp command.
   It took a lot of time when I did umount for this partition exactly after
   cp, but procedure finished correctly.
   In case, if I did “shutdown –h(r)”, also exactly after cp, the shutdown
   procedure waited for “sync” (umounting of the file system) but sync
   process was terminated by  timeout, and fsck checked and did correction
   of the file system after boot.
  
   System 5.4-stable, RAM 4GB, processor P-IV 3GHz.
  
   How can I fix it on my system?
 
 The funny thing about all the replies here.. is that this guy is not saying 
 that sync doesn't work.
 
 He's saying that the timeout built into shutdown causes it to *terminate* 
 the sync forcibly before it's done, and then reboot.
 
 All finger pointing about IDE, SCSI, softupdates, and journals aside.. I 
 think all he wants/needs is a way to increase that timer.
 

If you can't increase shutdown timeout, decrease softupdates timers.

# tail -3 /etc/sysctl.conf
kern.metadelay=14
kern.dirdelay=15
kern.filedelay=17

That was my solution for shutdown wait timeout.
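
A rough way to see why shorter delays help: data that may sit dirty for
`delay` seconds while being written at a steady rate builds up a
backlog of about rate times delay. The sketch below is a deliberately
simplified model, not a description of the kernel: it ignores limits
on dirty buffers, the 20 MB/s write rate is invented, and the "stock"
values of 28/29/30 seconds are assumed defaults that should be checked
with sysctl on the system in question.

```python
# Assumed stock values (verify locally) and the tuned values from the message above.
STOCK = {"kern.metadelay": 28, "kern.dirdelay": 29, "kern.filedelay": 30}
TUNED = {"kern.metadelay": 14, "kern.dirdelay": 15, "kern.filedelay": 17}


def backlog_mb(rate_mb_per_s, delay_s):
    """Simplified steady-state estimate of dirty data waiting for the syncer."""
    return rate_mb_per_s * delay_s


def backlog_table(settings, rate_mb_per_s):
    """Estimated backlog per sysctl knob for a given (hypothetical) write rate."""
    return {name: backlog_mb(rate_mb_per_s, delay)
            for name, delay in settings.items()}


if __name__ == "__main__":
    rate = 20  # MB/s, invented figure for a large cp
    print("stock:", backlog_table(STOCK, rate))
    print("tuned:", backlog_table(TUNED, rate))
```

Under these assumptions the file-data backlog at shutdown drops from
about 600 MB to about 340 MB, so the final sync has less to do before
the timeout.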

Serg N. Voronkov,
Sibitex JSC,
Tyumen, Russia.