Re: dangerous situation with shutdown process
On 18 Jul, Matthias Buelow wrote:
> Paul Mather [EMAIL PROTECTED] writes:
>
>> Why would that necessarily be more successful? If the outstanding buffers count is not reducing between time intervals, it is most likely because there is some underlying hardware problem (e.g., a bad block). If the count still persists in staying put, it likely means whatever the hardware is doing to try and fix things (e.g., write reallocation) isn't working, and so the kernel may as well give up.
>
> So the kernel is relying on guesswork whether the buffers are flushed or not...
>
>> You can enumerate the buffers and *try* to write them, but that doesn't guarantee they will be written successfully any more than observing the relative number left outstanding.
>
> That's rather nonsensical. If I write each buffer synchronously (and wait for the disk's response) this is for sure a lot more reliable than observing changes in the number of remaining buffers. I mean, where's the sense in the latter? It would be analogous to, in userspace, having to monitor write(2) continuously over a given time interval and check whether the number it returns eventually reaches zero. That's complete madness, imho.

During syncer shutdown, the numbers being printed are actually the number of vnodes that have dirty buffers. The syncer walks the list of vnodes with dirty buffers and synchronously flushes each one to disk (modulo whatever write-caching is done by the drive). The reason that the syncer monitors the number of dirty vnodes instead of just iterating once over the list is that with softupdates, flushing one vnode to disk can cause another vnode to be dirtied and put on the list, so it can take multiple passes to flush all the dirty vnodes. It's normal to see this if the machine was at least moderately busy before being shut down. The number of dirty vnodes will start off at a high number, decrease rapidly at first, and then decrease to zero. It is not unusual to see the number bounce from zero back into the low single digits a few times before stabilizing at zero and triggering the syncer termination code.

The syncer shutdown algorithm could definitely be improved to speed it up. I didn't want it to push out too many vnodes at the start of the shutdown sequence, but later in the sequence the delay intervals could be shortened and more worklist buckets could be visited per interval to speed the shutdown. One possible complication that I worry about is that the new vnodes being added to the list might not be added synchronously, so if the syncer processes the worklist and shuts down too quickly it might miss vnodes that got added too late.

I've never seen a syncer shutdown timeout, though it could happen if either the underlying media became unwritable or if a process got wedged while holding a vnode lock. In either case, it might never be possible to flush the dirty vnodes in question.

The final sync code in boot() just iterated over the dirty buffers, but it was not unusual for it to get stuck on mutually dependent buffers. I would see this quite frequently if I did a shutdown immediately after running mergemaster. The final sync code would flush all but the last few buffers and finally time out. This problem was my motivation for adding the shutdown code to the syncer so that the final sync code would hopefully not have anything to do.

The final sync code also gets confused if you have any ext2 file systems mounted (even read-only) and times out while waiting for the ext2 file system to release its private buffers (which only happens when the file system is unmounted).

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]
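[Editorial sketch] The multi-pass behaviour described in this message can be illustrated with a small simulation. This is only a toy model, not the kernel's syncer code: the function name, the `dirties_on_flush` map, and the vnode labels are all hypothetical stand-ins for softupdates dependencies.

```python
from collections import deque


def flush_until_clean(dirty, dirties_on_flush, max_passes=100):
    """Flush every dirty vnode once per pass and repeat until none remain.

    dirty: vnode ids that are dirty when shutdown starts.
    dirties_on_flush: hypothetical map {vnode: [vnodes dirtied when it is
        flushed]}; each entry fires at most once.
    Returns the dirty-vnode count observed at the start of each pass.
    """
    worklist = deque(dict.fromkeys(dirty))
    pending = {v: list(deps) for v, deps in dirties_on_flush.items()}
    counts = []
    for _ in range(max_passes):
        counts.append(len(worklist))
        if not worklist:
            break  # count reached zero: the syncer could terminate here
        next_pass = []
        while worklist:
            vnode = worklist.popleft()
            # Flushing this vnode may dirty others, which go back on the list.
            for dep in pending.pop(vnode, []):
                if dep not in next_pass:
                    next_pass.append(dep)
        worklist.extend(next_pass)
    return counts


if __name__ == "__main__":
    print(flush_until_clean(["a", "b", "c"], {"a": ["d"], "d": ["e"]}))
```

A single walk over the initial list would have flushed three vnodes and stopped, leaving `d` and `e` dirty; watching the count until it reaches zero catches the vnodes that were dirtied along the way.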
Re: dangerous situation with shutdown process
On 16 Jul, David Taylor wrote:
> On Sat, 16 Jul 2005, Matthias Buelow wrote:
>> David Taylor [EMAIL PROTECTED] writes:
>>
>>>> A corrupted journal can be detected. If it's corrupted, discard the whole thing, or only the relevant entry. The filesystem will remain consistent. If track corruption occurs after the journal is written, it doesn't matter, since at boot the journal will be replayed and all operations will be performed once more.
>>>
>>> The track which is corrupted could contain data that wasn't written to in months. How would the journal help?
>>
>> I don't understand this question.
>
> When the drive is powered off, the track being written to at that point may be corrupted, right? That track may contain sectors that the OS didn't change. These sectors would not be mentioned in the journal. How would a journaling fs fix the corruption?
>
> I suppose this could be avoided by requiring that all writes (and journal entries) somehow correspond to a full track. (Which I suppose they may do already, but I don't think so).

The track size is not constant. There are more sectors in the outer cylinder tracks than there are in inner cylinder tracks. I'm not even sure if it is possible to extract the detailed geometry info from the drive.
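[Editorial sketch] The two positions in this exchange can be put side by side in a toy model: a journal whose entries carry a checksum, so damaged entries are detected and discarded at replay, and a disk block that is damaged but never mentioned in the journal. The entry layout and the function names here are hypothetical, chosen only for illustration; no real journaling filesystem is being described.

```python
import zlib


def make_entry(block, data):
    """Build a journal entry with a CRC over the block number and payload."""
    payload = block.to_bytes(8, "big") + data
    return {"block": block, "data": data, "crc": zlib.crc32(payload)}


def replay(journal, disk):
    """Re-apply every valid entry to disk (a dict of block -> bytes).

    Entries whose checksum does not match are discarded.
    Returns the number of discarded entries.
    """
    discarded = 0
    for entry in journal:
        payload = entry["block"].to_bytes(8, "big") + entry["data"]
        if zlib.crc32(payload) != entry["crc"]:
            discarded += 1  # corrupted entry: detected and skipped
            continue
        disk[entry["block"]] = entry["data"]
    return discarded


if __name__ == "__main__":
    disk = {1: b"old", 2: b"GARBAGE"}   # block 2 was damaged on power loss
    bad = make_entry(3, b"x")
    bad["data"] = b"y"                  # simulate a torn journal write
    print(replay([make_entry(1, b"new"), bad], disk), disk)
```

Replay repairs block 1 and rejects the damaged entry, but block 2 stays damaged because nothing in the journal refers to it, which is the case David Taylor is asking about.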
Re: dangerous situation with shutdown process
On 14 Jul, Matthias Buelow wrote:
> Kevin Oberman wrote:
>
>>> How can I fix it on my system?
>>
>> SCSI or ATA? If it's ATA, turn off write cache with atacontrol(8) or the sysctl.
>
> You do NOT want to do that. Not only will performance drop brutally (example: drop to 1/5th of normal write speed for sequential writes, probably worse for random writes) but it will also significantly reduce the lifetime of your disk. Modern disks are designed to be used with the write-back cache enabled, so don't turn it off.

There's not much performance difference with SCSI if write caching is disabled. Typical SCSI drives can handle ~63 outstanding read and write transactions and can sort them into a somewhat optimal order if tagged command queuing is in use.

>> The problem is that disks lie about whether they have actually written data. If the power goes off before the data is in cache, it's lost.
>
> No, the problem is that FreeBSD doesn't implement request barriers and that softupdates is flawed by design and seemingly could not make use of them, even if they were available (because, as I understand it, it relies on a total ordering of all writes, unlike the partial ordering necessary for a journalled fs).

Softupdates only needs partial ordering. It just needs to be notified when the data hits the platter so that it can send any dependent writes to the disk.

Wouldn't the use of barriers have the potential to force a lot of unrelated cached write data to be written much earlier than necessary? If so, there would seem to be a performance penalty under certain workloads, though performance would still be better than with write-caching disabled.
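[Editorial sketch] The difference between a partial ordering and a total ordering of writes can be made concrete with a small model in which a write is issued only after everything it depends on has been reported complete. This is an assumption-laden illustration, not softupdates itself: the function name and the write labels ("bitmap", "inode", and so on) are hypothetical, and completions are modelled in whole batches rather than one by one.

```python
def issue_in_dependency_order(deps):
    """Group writes into batches that respect a partial order.

    deps: {write: set of writes that must be on the platter first}.
    Writes within one batch do not depend on each other, so the drive is
    free to reorder them; a batch is only issued once the previous batch
    has been reported complete.
    """
    remaining = {w: set(d) for w, d in deps.items()}
    done = set()
    batches = []
    while remaining:
        ready = sorted(w for w, d in remaining.items() if d <= done)
        if not ready:
            raise ValueError("cyclic or unsatisfiable dependency")
        batches.append(ready)
        for w in ready:
            del remaining[w]
        done.update(ready)  # stands in for the drive's completion notice
    return batches


if __name__ == "__main__":
    print(issue_in_dependency_order({
        "bitmap": set(),
        "data": set(),
        "inode": {"bitmap"},
        "dirent": {"inode"},
    }))
```

Only the dependent writes wait; the unrelated "data" write goes out in the first batch. A barrier, by contrast, would order everything queued before it against everything queued after it, which is the cost the last paragraph of the message is worried about.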
Re: dangerous situation with shutdown process
On 14 Jul, Kevin Oberman wrote:
>> Date: Thu, 14 Jul 2005 20:38:15 +0200
>> From: Anatoliy Dmytriyev [EMAIL PROTECTED]
>> Sender: [EMAIL PROTECTED]
>>
>> Hello, everybody!
>>
>> I have found unusual and dangerous situation with shutdown process: I did a copy of 200 GB data on the 870 GB partition (softupdates is enabled) by cp command. It took a lot of time when I did umount for this partition exactly after cp, but procedure finished correctly.

When you unmounted the file system, that should have flushed all the dirty files to the disk.

>> In case, if I did "shutdown -h(r)", also exactly after cp, the shutdown procedure waited for "sync" (umounting of the file system) but sync process was terminated by timeout, and fsck checked and did correction of the file system after boot.

Did the timeout occur during the syncer shutdown, or at the "syncing disks ..." step?

Did you have any ext2 file systems mounted? These should be manually unmounted before shutdown because they confuse the final sync code.

>> System 5.4-stable, RAM 4GB, processor P-IV 3GHz.
>>
>> How can I fix it on my system?
>
> SCSI or ATA? If it's ATA, turn off write cache with atacontrol(8) or the sysctl. The problem is that disks lie about whether they have actually written data. If the power goes off before the data is in cache, it's lost.

That should only make a difference in a power-fail situation, and it only makes a difference if the only unwritten data is in the drive's write cache.

> I am not sure if write-cache can be turned off on SCSI, but SCSI drives seem less likely to lie about when the data is actually flushed to the drive.

Yes it can, and I recommend it. Use the camcontrol modepage command to set the WCE bit to 0.
Re: dangerous situation with shutdown process
Kevin Oberman [EMAIL PROTECTED] wrote:
> [...]
> I believe that the Windows solution to this problem is to put a really, really long delay between when the system is finished syncing and when the power is turned off.

Definitely not. When I compare Windows XP and FreeBSD on the same hardware (notebook with ATA disk), then Windows' shutdown process is a lot faster than FreeBSD's. In fact, when I shut it down under XP for the first time, the power was off so quickly that I thought something must have gone wrong. But everything was OK and normal.

> This might be the best solution for FreeBSD, as well, but it will irritate people.

It is already irritating that FreeBSD sits there doing nothing for ~ 5 seconds before turning power off. Windows doesn't do that. (Yes I know, there's a sysctl for that, but I suspect that it's not safe to modify it in FreeBSD.)

Best regards
Oliver

--
Oliver Fromme, secnetix GmbH & Co KG, Marktplatz 29, 85567 Grafing
Any opinions expressed in this message may be personal to the author and may not necessarily reflect the opinions of secnetix in any way.

"Emacs is not an editor to me. To me it is exactly the same as if I asked for a bicycle (for fetching the Sunday rolls) and got a pan-galactic space cruiser 10 km long. I don't know what I am supposed to do with it."
-- Frank Klemm, de.comp.os.unix.discussion
Re: dangerous situation with shutdown process
Matthias Buelow [EMAIL PROTECTED] wrote:
> Sorry folks, have I somehow dropped into a parallel universe, or is there some serious misunderstanding going on?

Seems so.

> To the OP: There is no sync process that is being killed by shutdown

Yes, there is a kernel process called syncer. During shutdown, each of the kernel processes (including the syncer) has 60 seconds to terminate. If it doesn't, "timed out" is printed on the console. This timeout can be changed using a sysctl tunable (kern.shutdown.kproc_shutdown_wait).

> The kernel writes out all dirty buffers as part of its shutdown procedure.

When you shut down a machine, the kernel flushes all dirty buffers to disk. While it is doing that, it displays the number of remaining buffers, with increasing time intervals between them. If there are still buffers left after a certain number of intervals without change, the kernel gives up.

If that is really the problem, then the best solution would be to make the number of flushing intervals and/or the increasing interval a sysctl tunable. They're currently hardcoded at 20 and 50ms, respectively; see the boot() function in src/sys/kern/kern_shutdown.c. That means that the timeout will happen after 10 seconds. Doubling the number of intervals (i.e. 40 instead of 20) will make the timeout happen after 40 seconds, which should be sufficient.

Best regards
Oliver

--
Oliver Fromme, secnetix GmbH & Co KG, Marktplatz 29, 85567 Grafing
Any opinions expressed in this message may be personal to the author and may not necessarily reflect the opinions of secnetix in any way.

"C++ is the only current language making COBOL look good."
-- Bertrand Meyer
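[Editorial sketch] The jump from about 10 to about 40 seconds when the interval count is merely doubled follows if the pause grows linearly with each pass, so that the total grows roughly with the square of the number of passes. The arithmetic below assumes that pass i (counting from 0) is followed by a pause of i times 50 ms; that exact rule is an assumption made for this illustration, and kern_shutdown.c should be consulted for the real loop.

```python
def total_wait_ms(passes, step_ms=50):
    """Sum of the pauses if pass i (from 0) is followed by i * step_ms."""
    return sum(i * step_ms for i in range(passes))


if __name__ == "__main__":
    for passes in (20, 40):
        print(passes, "passes:", total_wait_ms(passes) / 1000, "seconds")
```

Under this assumption 20 passes add up to 9.5 seconds and 40 passes to 39 seconds, in line with the rounded figures quoted in the message.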
Re: dangerous situation with shutdown process
Oliver Fromme [EMAIL PROTECTED] writes:
> buffers to disk. While it is doing that, it displays the number of remaining buffers, with increasing time intervals between them. If there are still buffers left after a certain number of intervals without change, the kernel gives up.

Why is it doing this? Can't it just enumerate the buffers and write them, one by one?

mkb.
Re: dangerous situation with shutdown process
On Mon, 2005-07-18 at 16:35 +0200, Matthias Buelow wrote:
> Oliver Fromme [EMAIL PROTECTED] writes:
>
>> buffers to disk. While it is doing that, it displays the number of remaining buffers, with increasing time intervals between them. If there are still buffers left after a certain number of intervals without change, the kernel gives up.
>
> Why is it doing this? Can't it just enumerate the buffers and write them, one by one?

Why would that necessarily be more successful? If the outstanding buffers count is not reducing between time intervals, it is most likely because there is some underlying hardware problem (e.g., a bad block). If the count still persists in staying put, it likely means whatever the hardware is doing to try and fix things (e.g., write reallocation) isn't working, and so the kernel may as well give up.

You can enumerate the buffers and *try* to write them, but that doesn't guarantee they will be written successfully any more than observing the relative number left outstanding.

Cheers,
Paul.

--
e-mail: [EMAIL PROTECTED]

"Without music to decorate it, time is just a bunch of boring production deadlines or dates by which bills must be paid."
--- Frank Vincent Zappa
Re: dangerous situation with shutdown process
Paul Mather [EMAIL PROTECTED] writes:
> Why would that necessarily be more successful? If the outstanding buffers count is not reducing between time intervals, it is most likely because there is some underlying hardware problem (e.g., a bad block). If the count still persists in staying put, it likely means whatever the hardware is doing to try and fix things (e.g., write reallocation) isn't working, and so the kernel may as well give up.

So the kernel is relying on guesswork whether the buffers are flushed or not...

> You can enumerate the buffers and *try* to write them, but that doesn't guarantee they will be written successfully any more than observing the relative number left outstanding.

That's rather nonsensical. If I write each buffer synchronously (and wait for the disk's response) this is for sure a lot more reliable than observing changes in the number of remaining buffers. I mean, where's the sense in the latter? It would be analogous to, in userspace, having to monitor write(2) continuously over a given time interval and check whether the number it returns eventually reaches zero. That's complete madness, imho.

mkb.
Re: dangerous situation with shutdown process
Matthias Buelow [EMAIL PROTECTED] wrote:
> Oliver Fromme [EMAIL PROTECTED] writes:
>
>> buffers to disk. While it is doing that, it displays the number of remaining buffers, with increasing time intervals between them. If there are still buffers left after a certain number of intervals without change, the kernel gives up.
>
> Why is it doing this? Can't it just enumerate the buffers and write them, one by one?

I don't think that the boot() function in kern_shutdown.c can do that. It has got nothing to do with the syncing business itself. It can only trigger the syncing (similar to the sync(8) tool), which basically means performing a vfs_sync with flag MNT_NOWAIT for every mounted filesystem. Then it has to wait for the appropriate kernel process to do its job. See the source.

I don't think there's an easy way to change that. If you see such a way, I'd suggest you code it up and use send-pr.

Best regards
Oliver

--
Oliver Fromme, secnetix GmbH & Co KG, Marktplatz 29, 85567 Grafing
Any opinions expressed in this message may be personal to the author and may not necessarily reflect the opinions of secnetix in any way.

"Python tricks" is a tough one, cuz the language is so clean. E.g., C makes an art of confusing pointers with arrays and strings, which leads to lotsa neat pointer tricks; APL mistakes everything for an array, leading to neat one-liners; and Perl confuses everything period, making each line a joyous adventure wink.
-- Tim Peters
Re: dangerous situation with shutdown process
Matthias Buelow [EMAIL PROTECTED] writes:
> Lowell Gilbert wrote:
>
>> Well, break it down a little bit. If an ATA drive properly implements the cache flush command, then none of the ongoing discussion is
>
> Are you sure this is the case? Are there sequence points in softupdates where it issues a flush request and by this guarantees fs integrity?

No, you're right. I meant write completions, not cache flushes. I don't know of any drives that do one properly and not the other, but they're certainly not the same thing.

> I've read thru McKusick's paper in search for an answer but haven't found any. All I've read so far on mailing lists and from googling was that softupdates doesn't work if the wb-cache is enabled.

On a lot of ATA drives that don't implement the spec properly.
Re: dangerous situation with shutdown process
On Mon, 2005-07-18 at 17:14 +0200, Matthias Buelow wrote:
> Paul Mather [EMAIL PROTECTED] writes:
>
>> Why would that necessarily be more successful? If the outstanding buffers count is not reducing between time intervals, it is most likely because there is some underlying hardware problem (e.g., a bad block). If the count still persists in staying put, it likely means whatever the hardware is doing to try and fix things (e.g., write reallocation) isn't working, and so the kernel may as well give up.
>
> So the kernel is relying on guesswork whether the buffers are flushed or not...

I don't know if you are just deliberately trying to be contentious, but that is a serious misrepresentation of what is happening. Quite obviously the kernel knows whether a buffer has successfully been flushed, otherwise a count of outstanding buffers would be meaningless. (Surely you're not saying the kernel simply guesses if a buffer has been flushed in maintaining its count of outstanding buffers? What would be the point of that?)

If you calm down and think about it for a little, you'll realise what you suggest to do and what is actually done amount to the same thing in practical terms. It's all very easy to say to "write each buffer synchronously (and wait for the disk's response)", but what do you do when the buffer *does* get stuck and won't complete (e.g., because someone removed the floppy or USB disk, or your remote ggate server disappeared, or your hard disk is going bad, etc.)? Do you just bail immediately at that point? Or do you keep retrying in the hope it will complete?

In the end, it comes down to waiting a certain amount of time for drivers to do their best and then giving up. The only real question is how long you wait, and maybe whether syncer is not waiting long enough (and hence how to extend the amount of time it is willing to wait until it gives up on buffers being unflushable). I'm not sure why that is fundamentally madness.

Cheers,
Paul.

--
e-mail: [EMAIL PROTECTED]

"Without music to decorate it, time is just a bunch of boring production deadlines or dates by which bills must be paid."
--- Frank Vincent Zappa
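[Editorial sketch] The policy argued over in this exchange, which is to keep waiting while the outstanding count is falling and to give up once it has stopped falling for some number of intervals, fits in a few lines. The function name, the `max_stalled` threshold, and the sample sequences are hypothetical and are not taken from the kernel.

```python
def wait_for_flush(counts, max_stalled=3):
    """Decide when to stop waiting, given one sample per interval.

    counts: remaining-buffer counts, sampled once per interval.
    Returns ("done", n) when the count hits zero, ("gave up", n) after
    max_stalled consecutive intervals without a decrease, or
    ("undecided", n) if the samples run out first.
    """
    stalled = 0
    previous = None
    interval = 0
    for interval, remaining in enumerate(counts, start=1):
        if remaining == 0:
            return ("done", interval)
        if previous is not None and remaining >= previous:
            stalled += 1  # no progress during this interval
            if stalled >= max_stalled:
                return ("gave up", interval)
        else:
            stalled = 0   # progress was made: reset the patience counter
        previous = remaining
    return ("undecided", interval)


if __name__ == "__main__":
    print(wait_for_flush([10, 4, 1, 0]))
    print(wait_for_flush([5, 2, 2, 2, 2]))
```

A strictly synchronous write loop would still need a rule of this kind for a buffer that never completes, so in both designs the open question is how long to wait.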
Re: dangerous situation with shutdown process
Oliver Fromme wrote:
> Definitely not. When I compare Windows XP and FreeBSD on the same hardware (notebook with ATA disk), then Windows' shutdown process is a lot faster than FreeBSD's. In fact, when I shut it down under XP for the first time, the power was off so quickly that I thought something must have gone wrong. But everything was OK and normal.

Yes, XP's shutdown time is quicker on a fresh install, but give it a few weeks or months (depending on how you use XP) and you will notice that the shutdown time increases. Right now my XP Pro takes about 20 seconds or more to shut down. I am sure this is due to the MFT under an NTFS installation. FreeBSD does not appear to have this problem over extended use, and there is no way of stopping the MFT growth problem (as far as I know).

> It is already irritating that FreeBSD sits there doing nothing for ~ 5 seconds before turning power off. Windows doesn't do that. (Yes I know, there's a sysctl for that, but I suspect that it's not safe to modify it in FreeBSD.)
>
> Best regards
> Oliver

--
Kind regards,
Jayton Garnett
email: [EMAIL PROTECTED]
Main : www.uberhacker.co.uk
Test server: jayton.plus.com
Re: dangerous situation with shutdown process
Comment from out in left field --

On 2005/07/16, at 6:03, Kevin Oberman wrote:
> [...]
> I believe that the Windows solution to this problem is to put a really, really long delay between when the system is finished syncing and when the power is turned off.

That's what I vote for. If the system has ATA on it, send a line to the console that says "waiting for ATA technology drives to quit lying" after the final sync, and then wait 30 seconds to cut power.

> This might be the best solution for FreeBSD, as well, but it will irritate people.

My impression is that in this case irritation is recommended.

(I'm half wondering if Microsoft and the drive manufacturers haven't defined some hidden API for forcing the drive electronics to be truthful.)
Re: dangerous situation with shutdown process
On Friday, 15. July 2005 21:14, Matthias Buelow wrote:
> Why am I arguing in an uphill battle here?
> important to the FreeBSD community? Such issues should not even have to be discussed at all!

I completely agree, and there's really no point in arguing with people who are happy to throw dollars at the problem rather than fixing it either. At this point, code is needed.

--
   ,_,   | Michael Nottebrock               | [EMAIL PROTECTED]
 (/^ ^\) | FreeBSD - The Power to Serve     | http://www.freebsd.org
   \u/   | K Desktop Environment on FreeBSD | http://freebsd.kde.org
Re: dangerous situation with shutdown process
* Matthias Buelow [EMAIL PROTECTED] [2005-07-16 01:42 +0200]:
> David Taylor [EMAIL PROTECTED] writes:
>
>>> A corrupted journal can be detected. If it's corrupted, discard the whole thing, or only the relevant entry. The filesystem will remain consistent. If track corruption occurs after the journal is written, it doesn't matter, since at boot the journal will be replayed and all operations will be performed once more.
>>
>> The track which is corrupted could contain data that wasn't written to in months. How would the journal help?
>
> I don't understand this question.

The track destroyed could contain sectors which are in no way related to the sectors the OS is writing to.

Nicolas
Re: dangerous situation with shutdown process
On Sat, 16 Jul 2005, Matthias Buelow wrote:
> David Taylor [EMAIL PROTECTED] writes:
>
>>> A corrupted journal can be detected. If it's corrupted, discard the whole thing, or only the relevant entry. The filesystem will remain consistent. If track corruption occurs after the journal is written, it doesn't matter, since at boot the journal will be replayed and all operations will be performed once more.
>>
>> The track which is corrupted could contain data that wasn't written to in months. How would the journal help?
>
> I don't understand this question.

When the drive is powered off, the track being written to at that point may be corrupted, right? That track may contain sectors that the OS didn't change. These sectors would not be mentioned in the journal. How would a journaling fs fix the corruption?

I suppose this could be avoided by requiring that all writes (and journal entries) somehow correspond to a full track. (Which I suppose they may do already, but I don't think so).

>> I still don't trust ATA drives. Can you guarantee (or show any reason to believe) that disabling the write cache will actually wait for the cache to be flushed before returning? Otherwise a "disable cache", "enable cache" sequence is exactly the same as a flush cache command. If the drive executes both immediately, without waiting for the cache to be flushed _before_ returning, what's the difference?
>
> You imply that, because there exists one drive for which it doesn't work, that it follows that it won't work for all drives? Or what is your point?

No. I'm just asking if you know of ANY ATA drives that will wait for the cache to be flushed before claiming the disable cache command has succeeded. I don't, but I haven't looked.

--
David Taylor
Re: dangerous situation with shutdown process
Matthias Buelow wrote this message on Fri, Jul 15, 2005 at 22:11 +0200:
> for write barriers in the block drivers had been implemented, we

phk removed support for write barriers because no one was making use of them... FreeBSD had them, and when there is *CODE* that makes use of them, they'll be added back...

As the saying goes: Shut up and code!!!

--
John-Mark Gurney    Voice: +1 415 225 5579

"All that I will do, has been done, All that I have, has not."
Re: dangerous situation with shutdown process
Nicolas Rachinsky wrote:
>>> The track which is corrupted could contain data that wasn't written to in months. How would the journal help?
>>
>> I don't understand this question.
>
> The track destroyed could contain sectors which are in no way related to the sectors the OS is writing to.

And in what way is that related to the existence or nonexistence of write barriers and a journal?

If you pound the disk with a hammer, it will most likely break, no matter what strategy you're using. That you cannot eliminate _all_ sources of error with a strategy doesn't mean that you shouldn't implement it to minimize the number of errors that could happen.

Besides, I always thought that (most) disks had enough power reserve to be able to write at least one track when power goes away? Or is that an urban myth, I don't know for sure.

mkb.
Re: dangerous situation with shutdown process
Somewhere around Fri, Jul 15, 2005 at 22:13, the world stopped and listened as [EMAIL PROTECTED] graced us with this profound tidbit of wisdom that would fulfill the enjoyment of future generations:

> Date: Fri, 15 Jul 2005 18:22:14 +0200
> From: Matthias Buelow [EMAIL PROTECTED]
> Subject: Re: dangerous situation with shutdown process
> To: Bill Vermillion [EMAIL PROTECTED]
> Cc: freebsd-stable@freebsd.org
> Message-ID: [EMAIL PROTECTED]
> Content-Type: text/plain; charset=us-ascii
>
> Bill Vermillion wrote:
>
>> Copying very large files and then shutting down I hope is not a normal procedure for you. softupdates sometimes do take a long time when you are removing/copying very large files.
>>
>> Others have suggested different time-outs but you'd have to figure out the largest size you may ever encounter and set things for that, which is not going to help for everyday operation.
>>
>> I've watched the amount of disk space increase slowly by performing 'df' and it can take a long time - up to a minute on some extremely large partitions I was cleaning.
>>
>> One way to force everything to be written I've found [by observation only] is to perform an fsck on that file system.
>>
>> If you only do huge copies and immediate shutdowns rarely, then maybe it's just a good idea to remember how softupdates work, and then fsck, then shutdown.
>>
>> I'm always against changing default operations from typical operations to extremes.
>
> Sorry folks, have I somehow dropped into a parallel universe, or is there some serious misunderstanding going on?
>
> To the OP: There is no sync process that is being killed by shutdown. The kernel writes out all dirty buffers as part of its shutdown procedure.

I was under the impression that there was a problem, that's why I wrote my reply.

> Bill, as I get it from what you wrote, correct me if I'm wrong, you assume that:
>
> 1. unmount doesn't wait for all dirty data being committed to disk before somehow removing the filesystem,

That's what the OP seemed to indicate.

> 2. fsck on a live filesystem will somehow speed things up.

Actually an fsck on a live filesystem will force the softupdates to complete more quickly - that is from observation - and when I've deleted extremely large directories - usually /usr/src and /usr/obj. It only speeds up flushing the blocks to disk.

> For 1., this is surely not the case, the same as with shutdown, the kernel of course writes (drive errors notwithstanding) all modified buffers and updates all on-disk structures before marking the fs clean, and for 2., you should never fsck a mounted filesystem. Besides, it is completely unnecessary.

You can fsck a mounted file system and fsck will run in read-only mode. That way you can check for problems, and if there is something wrong you can shutdown and restart. FreeBSD will NOT run fsck in anything other than READ ONLY when the file system is mounted.

And in the old days when drives were smaller and slower and performance needed to be maximized, from about Version III through System V you could run fsck -S device from cron!! The -S flag was interesting in that it would actually re-write the freelist IF AND ONLY IF there was no corruption on the drive.

Since blocks on those systems were used in the reverse order they were released, running fsck -S sorted the freelist in ascending order and thus helped to eliminate fragmentation. This was particularly important on the S51 file systems - as it was before the SysV's adopted variants of the FFS system that came from BSD.

> If the OP has encountered any data corruption, this is due to an unclean shutdown because of disk errors or a kernel bug, and not because of timeouts that are too short or something like that.

It would have been nice to see his actual errors.

Bill

--
Bill Vermillion - bv @ wjv . com
Re: dangerous situation with shutdown process
* Matthias Buelow [EMAIL PROTECTED] [2005-07-16 16:07 +0200]:
> Nicolas Rachinsky wrote:
>
>>>> The track which is corrupted could contain data that wasn't written to in months. How would the journal help?
>>>
>>> I don't understand this question.
>>
>> The track destroyed could contain sectors which are in no way related to the sectors the OS is writing to.
>
> And in what way is that related to the existence or nonexistence of write barriers and a journal?

You wrote before:
| If track corruption occurs after the journal is written, it doesn't
| matter, since at boot the journal will be replayed and all operations
| will be performed once more.

> If you pound the disk with a hammer, it will most likely break, no matter what strategy you're using. That you cannot eliminate _all_ sources of error with a strategy doesn't mean that you shouldn't implement it to minimize the number of errors that could happen.

I'm not arguing for or against write barriers or a journal.

Nicolas
Re: dangerous situation with shutdown process
David Taylor wrote: No. I'm just asking if you know of ANY ata drives that will wait for the cache to be flushed before claiming the disable cache command has succeeded. I don't, but I haven't looked. I don't know either. I assume that they do. Does it matter? I mean, I'm not suggesting a frivolous new theory that is highly speculative and warrants a lengthy debate on its purported merits. What I described is common practice on Windows, Linux and probably a few other systems and I would think that they're not doing this for nothing. And, frankly, I'm a bit astonished that the FreeBSD (community) seems to be so ignorant of well-known measures for improving data safety on consumer-grade desktop hardware. Does that mean that FreeBSD is deemed generally unsuited for desktop and laptop use and should be reserved for servers with the appropriate (expensive) hardware? I hope not. mkb. ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to [EMAIL PROTECTED]
Re: dangerous situation with shutdown process
Bill Vermillion wrote: You can fsck a mounted file system and fsck will run in read-only mode. That way you can check for problems, and if there is something wrong you can shutdown and restart. FreeBSD will NOT run fsck in anything other than READ ONLY when the file system is mounted

I thought fsck on a live (read-write) filesystem almost always brings up errors (although only of a certain kind, like dangling inodes) unless the fs has been completely quiescent for a while. A quick check seems to confirm this:

** /dev/ad4s3a (NO WRITE)
** Last Mounted on /
** Root file system
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
UNREF FILE I=94257 OWNER=mkb MODE=100600 SIZE=2397 MTIME=Jul 16 16:25 2005
CLEAR? no

And in the old days when drives were smaller and slower and performance needed to be maximized, from about Version III through System V you could run fsck -S device from cron!! The -S flag was interesting in that it would actually re-write the freelist IF AND ONLY IF there was no corruption on the drive.

I'm amazed that this worked.. considering that the fsck would have to be atomic then (i.e., basically halt all filesystem i/o while it's running).

mkb.
Re: dangerous situation with shutdown process
At Sat, Jul 16, 2005 at 16:29 , our malformed and occasionally flatulent friend Matthias Buelow spewed forth this fount of brain juice:

Bill Vermillion wrote: You can fsck a mounted file system and fsck will run in read-only mode. That way you can check for problems, and if there is something wrong you can shutdown and restart. FreeBSD will NOT run fsck in anything other than READ ONLY when the file system is mounted

I thought fsck on a live (read-write) filesystem almost always brings up errors (although only of a certain kind, like dangling inodes) unless the fs has been completely quiescent for a while. A quick check seems to confirm this:

** /dev/ad4s3a (NO WRITE)
** Last Mounted on /
** Root file system
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
UNREF FILE I=94257 OWNER=mkb MODE=100600 SIZE=2397 MTIME=Jul 16 16:25 2005
CLEAR? no

The 'no' was supplied by the system, was it not? The first line says NO WRITE.

And in the old days when drives were smaller and slower and performance needed to be maximized, from about Version III through System V you could run fsck -S device from cron!! The -S flag was interesting in that it would actually re-write the freelist IF AND ONLY IF there was no corruption on the drive.

I'm amazed that this worked.. considering that the fsck would have to be atomic then (i.e., basically halt all filesystem i/o while it's running).

We'd run it from cron as noted. And this was done overnight - in systems where users were there only in the daytime. It did make a difference in keeping performance up longer than without it. Without that you'd basically have to back up the fs, remake the fs, and then reload to get back the originally installed performance. But as I noted this was for the S51 file system. It was really slow.

On my first Sys V.3 system, I made one file system with the old S51/Xenix layout, and everything else was an FFS that was slightly modified from the original BSD systems. That was probably about 1990. The performance on the S51 ON THE SAME DRIVE - was no better than 30% as fast as the FFS and most of the time it was only 10% as fast.

Once all the old customers moved to newer OS versions the old fsck -S was no longer needed [note that it is capital S and not 's' - and you'll have to find a Sys V manual to document the difference - and I don't have one handy at the moment]. And small businesses were very very reluctant to upgrade unless they were forced to. I did some Y2K patching on OSes that had been installed in the late 1980s. And about the latest anyone would be using a system there would be about 9PM - when the owner stayed late.

With current systems, in particular net connected systems with email, you could not hope to find a quiescent system. However the S flag was rewrite the freelist ONLY if the rest of the fsck gave no errors. If there were problems, such as the unreferenced file you showed in your example, the freelist would not be re-written. That's why it was OK to run it in cron.

Anyone who had not worked with Unix systems of 10-25 years ago can't begin to appreciate how good things are today.

Bill
--
Bill Vermillion - bv @ wjv . com
Re: dangerous situation with shutdown process
On Jul 15, 2005, at 11:08, Bill Vermillion wrote: If you only do huge copies and immediate shutdowns rarely, then maybe it's just a good idea to remember how softupdates work, and then fsck, then shutdown.

This may sound simplistic, but what about a triple sync(8)? (sync; sync; sync)
Re: dangerous situation with shutdown process
I know you'll find this hard to believe, but on Sat, Jul 16, 2005 at 10:52 , David Magda actually admitted to saying:

On Jul 15, 2005, at 11:08, Bill Vermillion wrote: If you only do huge copies and immediate shutdowns rarely, then maybe it's just a good idea to remember how softupdates work, and then fsck, then shutdown.

This may sound simplistic, but what about a triple sync(8)? (sync; sync; sync)

Actually I saw that documented a very very long time ago in an Intel Unix manual. And Intel got out of Unix in the mid to late 1980s. I don't recall if that was the one that was sold to Kodak - the picture people - which then was sold to Interactive ?? - and eventually wound up at Sun.

There were so many Unix variants in those days you had to have a chart to keep up with them. Each HW manufacturer had their own version and name, and at that time the only time you could call your OS Unix was if you compiled it directly from the AT&T tapes with no changes on a Vax [if I recall the scenario correctly]. But that was a long time ago.

Bill
--
Bill Vermillion - bv @ wjv . com
Re: dangerous situation with shutdown process
On Sat, 2005-07-16 at 16:16 +0200, Matthias Buelow wrote:

David Taylor wrote: No. I'm just asking if you know of ANY ata drives that will wait for the cache to be flushed before claiming the disable cache command has succeeded. I don't, but I haven't looked.

I don't know either. I assume that they do. Does it matter? I mean, I'm not suggesting a frivolous new theory that is highly speculative and warrants a lengthy debate on its purported merits. What I described is common practice on Windows, Linux and probably a few other systems and I would think that they're not doing this for nothing. And, frankly, I'm a bit astonished that the FreeBSD (community) seems to be so ignorant of well-known measures for improving data safety on consumer-grade desktop hardware. Does that mean that FreeBSD is deemed generally unsuited for desktop and laptop use and should be reserved for servers with the appropriate (expensive) hardware? I hope not.

I recall reading on freebsd-current that Scott Long is working on adding journalling support to FFS. Perhaps you'd like to direct your input to him instead of rehashing it repeatedly on here, which is the wrong outlet for such discussion anyway: by definition, CURRENT, not STABLE will get a new feature like journalling, and so discussing the ins and outs of it on freebsd-current would seem more apropos. (Plus, I'd hate to see him implement something only for you to declare it to be the wrong way to do things. Better to get your 2p in now.:)

Regarding your question of "does that mean that FreeBSD is deemed generally unsuited for desktop and laptop use," I can speak only from experience. I run 6-CURRENT on an ATA system using softupdates on my desktop, and have been doing so for quite some time. I've seen through quite a few periods of extreme growing pains for CURRENT, with seemingly random panics and mystery crashes at times. (Who can forget the dark days of the ULE + PREEMPTION instability?) Add to the mix that my neighbourhood has pretty flaky power, with a tendency for short interruptions at the first whiff of bad weather. This all adds up to a good smattering of reboots at inopportune times (like when doing buildworlds or large portupgrade sessions:).

Despite that, I have never EVER had a problem with data consistency on my file systems. (The only problem I have had is when I added an ATA controller card one time and forgot to disable its RAID BIOS, which promptly spammed over my geom_mirror metadata.:) If softupdates were as unsafe as you often hint, I'm surprised that I haven't lost a file system by now. (I would also expect to hear from the field a lot more clamour about how unsafe it is, and that, in fact, the sky was indeed falling.) I guess I must be amazingly lucky and should start playing the lottery right now. :-)

The main inconvenience I have with panics or outages is the fsck times on reboot. (Actually, what I find to be more inconvenient is the resynchronisation time needed for my geom_mirror, which takes a lot longer than a fsck.) I understand that fsck delays for large file systems are the major impetus behind the journalling work, not as a fix for a perceived data consistency problem.

Cheers,
Paul.

PS: I also use softupdates on my NetBSD systems, again without problems. I've also used LFS on NetBSD at times, but have always ended up abandoning it due to performance and severe data reliability problems. (To be clear, though, I'm not sure LFS was deemed to be for production use, at least not the times I tried it.)

--
e-mail: [EMAIL PROTECTED]
Without music to decorate it, time is just a bunch of boring production deadlines or dates by which bills must be paid. --- Frank Vincent Zappa
Re: dangerous situation with shutdown process
On Jul 16, 2005, at 11:03, Bill Vermillion wrote: Actually I saw that documented a very very long time ago in an Intel Unix manual.

It's present in more recent documentation. :)

The sync utility can be called to ensure that all disk writes have been completed before the processor is halted in a way not suitably done by reboot(8) or halt(8).

http://www.freebsd.org/cgi/man.cgi?query=sync&sektion=8
Re: dangerous situation with shutdown process
Bill Vermillion said: Actually I saw that documented a very very long time ago in an Intel Unix manual. And Intel got out of Unix in the mid to late 1980s. I don't recall if that was the one that was sold to Kodak - the picture people - which then was sold to Interactive ?? - and eventually wound up at Sun. There were so many Unix variants in those days you had to have a chart to keep up with them. Each HW manufacturer had their own version and name, and at that time the only time you could call your OS Unix was if you compiled it directly from the AT&T tapes with no changes on a Vax [if I recall the scenario correctly].

The main reason for sync;sync;sync on V7 UNIX was because you couldn't do a shutdown, only a halt to the hardware monitor, on the PDP11. You can verify that behavior with SIMH. :-)

--
Rick Kelly [EMAIL PROTECTED]
http://www.rmkhome.com/
http://rkba.rmkhome.com/ - the right to keep and bear arms
http://wolf.rmkhome.com/ - firearm forums
Re: dangerous situation with shutdown process
Paul Mather [EMAIL PROTECTED] writes: Despite that, I have never EVER had a problem with data consistency on my file systems. (The only problem I have had is when I added an ATA controller card one time and forgot to disable its RAID BIOS, which promptly spammed over my geom_mirror metadata.:) If softupdates were as unsafe as you often hint, I'm surprised that I haven't lost a file system by now. (I would also expect to hear from the field a lot more clamour about how unsafe it is, and that, in fact, the sky was indeed falling.) I guess I must be amazingly lucky and should start playing the lottery right now. :-) Well, break it down a little bit. If an ATA drive properly implements the cache flush command, then none of the ongoing discussion is relevant. Herr Buelow is worried about drives that do *not* do so, but which, when told to disable their cache, will reliably empty the cache before continuing with further operations. Such drives are the only ones to which any of the discussion applies. If such drives are reasonably common, then using such a hack would make sense. However, I would want some fairly solid evidence on the matter before I was willing to start coding it, and so far the most convincing evidence I have seen is that Microsoft engineers claim to have done it. Be well. ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to [EMAIL PROTECTED]
Re: dangerous situation with shutdown process
Rick Kelly wrote: The main reason for sync;sync;sync on V7 UNIX was because you couldn't do a shutdown, only a halt to the hardware monitor, on the PDP11. You can verify that behavior with SIMH. :-)

Uhm.. that's the same on the VAX.. in what way would that preclude a shutdown? NetBSD certainly shuts down on VAX (and drops into the monitor when it's done.)

mkb.
Re: dangerous situation with shutdown process
Paul Mather wrote: on reboot. (Actually, what I find to be more inconvenient is the resynchronisation time needed for my geom_mirror, which takes a lot longer than a fsck.) I understand that fsck delays for large file systems is the major impetus behind the journalling work, not as a fix for a perceived data consistency problem. Well... I have lost a few (ca. 3) UFS filesystems due to power loss or a kernel crash in the past but interestingly those were all on SCSI (and in the pre-softupdates era, so mounted with sync metadata updates, where this Shouldn't Happen[tm] either..) I've also seen ext2fs (which doesn't have safeguards against fs corruption) on Linux zapped often by power loss and haven't seen a statistically higher number of corrupted ext2fs than ufs. So the whole thing is a bit hard to quantify. However, I'm all for reducing the possibility of corruption when it could be done, programmatically. mkb. ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to [EMAIL PROTECTED]
Re: dangerous situation with shutdown process
Lowell Gilbert wrote: Well, break it down a little bit. If an ATA drive properly implements the cache flush command, then none of the ongoing discussion is Are you sure this is the case? Are there sequence points in softupdates where it issues a flush request and by this guarantees fs integrity? I've read thru McKusick's paper in search for an answer but haven't found any. All I've read so far on mailing lists and from googling was that softupdates doesn't work if the wb-cache is enabled. mkb. ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to [EMAIL PROTECTED]
Re: dangerous situation with shutdown process
No, it's at a level below softupdates that this must be done. Softupdates only understands when things have been marked completed with biodone()--the underlying scsi/ata/sata driver must make the determination as to when biodone should be called. The flush has to be done there. _IF_ the flush is being done there, then request barriers represent a performance enhancement, not an integrity enhancement.

-Jon

On Sat, 16 Jul 2005, Matthias Buelow wrote:

Lowell Gilbert wrote: Well, break it down a little bit. If an ATA drive properly implements the cache flush command, then none of the ongoing discussion is

Are you sure this is the case? Are there sequence points in softupdates where it issues a flush request and by this guarantees fs integrity? I've read thru McKusick's paper in search for an answer but haven't found any. All I've read so far on mailing lists and from googling was that softupdates doesn't work if the wb-cache is enabled.

mkb.
Re: dangerous situation with shutdown process
On Sat, 16 Jul 2005, Rick Kelly wrote: The main reason for sync;sync;sync on V7 UNIX was because you couldn't do a shutdown, only a halt to the hardware monitor, on the PDP11. You can verify that behavior with SIMH. :-)

And you weren't supposed to use sync;sync;sync but this:

# sync
# sync
# sync

They were *not* the same.

--
Dave
reducing shutdown time (was: Re: dangerous situation with shutdown process)
On Friday, 15.07.2005, at 11:15 +0600, Sergey N. Voronkov wrote:

On Thu, Jul 14, 2005 at 04:17:06PM -0400, asym wrote: At 15:19 7/14/2005, Wilko Bulte wrote: On Thu, Jul 14, 2005 at 12:14:49PM -0700, Kevin Oberman wrote.. Date: Thu, 14 Jul 2005 20:38:15 +0200 From: Anatoliy Dmytriyev [EMAIL PROTECTED] Sender: [EMAIL PROTECTED] [...]

If you can't increase shutdown timeout, decrease softupdates timers.

# tail -3 /etc/sysctl.conf
kern.metadelay=14
kern.dirdelay=15
kern.filedelay=17

That was my solution for shutdown wait timeout.

Interesting, I didn't know these knobs. Would it be okay to set them to zero on an embedded system with r/o file systems? And are there other variables tunable for reducing shutdown time (on 4-STABLE)?

TIA, Marc
Re: dangerous situation with shutdown process
On Thu, Jul 14, 2005 at 22:31 , [EMAIL PROTECTED] moved his mouse, rebooted for the change to take effect, and then said:

Message: 13 Date: Thu, 14 Jul 2005 20:38:15 +0200 From: Anatoliy Dmytriyev [EMAIL PROTECTED] Subject: dangerous situation with shutdown process To: freebsd-stable@freebsd.org, freebsd-questions@freebsd.org Message-ID: [EMAIL PROTECTED] Content-Type: text/plain; charset=UTF-8; format=flowed

Hello, everybody! I have found unusual and dangerous situation with shutdown process: I did a copy of 200 GB data on the 870 GB partition (softupdates is enabled) by cp command. It took a lot of time when I did umount for this partition exactly after cp, but procedure finished correctly. In case, if I did "shutdown -h(r)", also exactly after cp, the shutdown procedure waited for "sync" (umounting of the file system) but sync process was terminated by timeout, and fsck checked and did correction of the file system after boot. System 5.4-stable, RAM 4GB, processor P-IV 3GHz. How can I fix it on my system?

Copying very large files and then shutting down I hope is not a normal procedure for you. softupdates sometimes do take a long time when you are removing/copying very large files. Others have suggested different time-outs but you'd have to figure out the largest size you may ever encounter and set things for that, which is not going to help for everyday operation.

I've watched the amount of disk space increase slowly by performing 'df' and it can take a long time - up to a minute on some extremely large partitions I was cleaning. One way to force everything to be written I've found [by observation only] is to perform an fsck on that file system.

If you only do huge copies and immediate shutdowns rarely, then maybe it's just a good idea to remember how softupdates work, and then fsck, then shutdown. I'm always against changing default operations from typical operations to extremes.

Bill
--
Bill Vermillion - bv @ wjv . com
Re: dangerous situation with shutdown process
Matthias Buelow wrote this message on Thu, Jul 14, 2005 at 21:52 +0200:

The problem is that disks lie about whether they have actually written data. If the power goes off before the data is in cache, it's lost.

No, the problem is that FreeBSD doesn't implement request barriers and that softupdates is flawed by design and seemingly could not make use of them, even if they were available (because, as I understand it, it relies on a total ordering of all writes, unlike the partial ordering necessary for a journalled fs).

even request barriers will not save the fs in a power loss if the track that is getting flushed during a power loss... Some other FreeBSD folk has a reproducible case of where blocks that were not written to on ATA hardware got trashed after a power loss... With non-written to sectors getting trashed with the cache enabled, barriers don't mean squat...

--
John-Mark Gurney Voice: +1 415 225 5579
All that I will do, has been done, All that I have, has not.
Re: dangerous situation with shutdown process
John-Mark Gurney wrote: With non-written to sectors getting trashed with the cache enabled, barriers don't mean squat... Of course if you pound the disk with a hammer, then barriers also won't help. Just because with a few disks perhaps it won't work at all doesn't mean that one shouldn't at least try and get it working for perhaps the 90% where it would work in order to reduce the possibility of corruption by as much as possible. I mean, anything is better than the current situation where apparently nothing is done at all. Why am I arguing in an uphill battle here? Is data safety no longer important to the FreeBSD community? Such issues should not even have to be discussed at all! mkb. ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to [EMAIL PROTECTED]
Re: dangerous situation with shutdown process
John-Mark Gurney wrote: even request barriers will not save the fs in a power loss if the track that is getting flushed during a power loss... Some other FreeBSD folk has a reproducible case of where blocks that were not written to on ATA hardware got trashed after a power loss... With non-written to sectors getting trashed with the cache enabled, barriers don't mean squat...

One more thought.. they _do_ protect against power loss during writing a track -- when used in combination with a journalled fs. A corrupted journal can be detected. If it's corrupted, discard the whole thing, or only the relevant entry. The filesystem will remain consistent. If track corruption occurs after the journal is written, it doesn't matter, since at boot the journal will be replayed and all operations will be performed once more.

The combination barriers+journal really seems to be very resilient to filesystem corruption. When it's implemented without errors, and the hardware doesn't do things like change bits randomly, I can't think of a way this scheme can be corrupted at all.

mkb.
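The detect-and-replay idea in the message above can be sketched in a few lines. This is a minimal illustration only, assuming each journal entry carries a CRC32 over its target block number and payload, and that write barriers have already guaranteed the entry reached the disk before the in-place update began. The names `make_entry`, `entry_is_valid` and `replay` are hypothetical and do not correspond to any FreeBSD or Linux journal format.

```python
import zlib


def make_entry(block_no, data):
    """Build a journal entry: target block, payload, and a CRC32 over both."""
    payload = block_no.to_bytes(8, "big") + data
    return {"block": block_no, "data": data, "crc": zlib.crc32(payload)}


def entry_is_valid(entry):
    """Recompute the checksum; a torn or trashed entry will not match."""
    payload = entry["block"].to_bytes(8, "big") + entry["data"]
    return zlib.crc32(payload) == entry["crc"]


def replay(journal, disk):
    """Re-apply every intact entry and skip the ones whose checksum fails."""
    applied = 0
    for entry in journal:
        if not entry_is_valid(entry):
            continue  # corrupted journal entry: discard it
        disk[entry["block"]] = entry["data"]  # idempotent, safe to repeat
        applied += 1
    return applied


journal = [make_entry(1, b"inode"), make_entry(2, b"dir")]
journal[1]["data"] = b"d\x00r"      # simulate a write torn by power loss
disk = {1: b"garbage"}              # block 1 was trashed after journalling

print(replay(journal, disk), disk)
```

Because replay simply rewrites the journalled contents, a data block that was trashed after its journal entry was safely written is restored, while the damaged entry is dropped rather than applied. The sketch takes the "only the relevant entry" option from the message; a real journal might instead stop at the first bad entry.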
Re: dangerous situation with shutdown process
Matthias Buelow wrote: John-Mark Gurney wrote: [ ... ] Why am I arguing in an uphill battle here? Is data safety no longer important to the FreeBSD community? Such issues should not even have to be discussed at all! You ask a good question: so, just why are *you* arguing? [1] If sysctl hw.ata.wc=0 doesn't do what you want, please submit a PR containing something better. Or buy SCSI hardware and a real, battery-backed up RAID system, or fibre-channel, or Firewire, or whatever floats your boat. -- -Chuck [1]: After all, generally it takes at least two people to argue, although some people manage to argue even with themselves. :-) ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to [EMAIL PROTECTED]
Re: dangerous situation with shutdown process
Chuck Swiger wrote: If sysctl hw.ata.wc=0 doesn't do what you want, please submit a PR containing something better. Or buy SCSI hardware and a real, Well, if I had the time, I would. Also if instead of softupdates, a proper journalled filesystem implementation with kernel support for write barriers in the block drivers had been implemented, we wouldn't have this problem now. Ok, no point in arguing how things would be if one had made different decisions in the past. battery-backed up RAID system, or fibre-channel, or Firewire, or whatever floats your boat. I would think that a significant part of (FreeBSD-) users are running FreeBSD on desktop PCs, notebooks, etc., where a fibre-channel or SCSI solution isn't really feasible (either technically, or economically). mkb. ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to [EMAIL PROTECTED]
Re: dangerous situation with shutdown process
On Fri, Jul 15, 2005 at 10:11:02PM +0200, Matthias Buelow wrote.. Chuck Swiger wrote: If sysctl hw.ata.wc=0 doesn't do what you want, please submit a PR containing something better. Or buy SCSI hardware and a real, Well, if I had the time, I would. Also if instead of softupdates, a proper journalled filesystem implementation with kernel support for write barriers in the block drivers had been implemented, we wouldn't have this problem now. Ok, no point in arguing how things sigh Not If The Bloody PeeCee Style Crap ATA Drives Keep Lying To You.. Followups to /dev/null -- Wilko Bulte [EMAIL PROTECTED] ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to [EMAIL PROTECTED]
Re: dangerous situation with shutdown process
Matthias Buelow [EMAIL PROTECTED] writes: The combination barriers+journal really seems to be very resilient to filesystem corruption. When it's implemented without errors, and the hardware doesn't do things like change bits randomly, I can't think of a way this scheme can be corrupted at all. We keep trying to point out that barriers *can't* be enforced on the hardware with many (most, and apparently an increasing percentage of) ATA drives. There is no semantic on these drives that allows you to guarantee the journal block will be written before the corresponding data block. If you are sure that your drives do this properly, then you are safe, but in that case there's no reliability problem with softupdates, either. ___ freebsd-stable@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-stable To unsubscribe, send any mail to [EMAIL PROTECTED]
Re: dangerous situation with shutdown process
Wilko Bulte wrote: sigh Not If The Bloody PeeCee Style Crap ATA Drives Keep Lying To You.. Followups to /dev/null

Yes, makes no sense talking to a wall.

mkb.
Re: dangerous situation with shutdown process
Date: Fri, 15 Jul 2005 22:24:07 +0200 From: Matthias Buelow [EMAIL PROTECTED] Sender: [EMAIL PROTECTED]

Wilko Bulte wrote: sigh Not If The Bloody PeeCee Style Crap ATA Drives Keep Lying To You.. Followups to /dev/null

Yes, makes no sense talking to a wall.

You are right, but I don't think you get who the wall is...

When you try to get an ATA drive to flush its buffers and tell you when they are flushed, there is a high probability that the drive (if it supports the function at all) will tell you that it has flushed the cache immediately. There is simply no way to tell if your data or metadata is actually on the magnetic medium and no technique (journaling, barriers, soft updates) can assure that you will not have a corrupt disk, especially if the write cache is near full. Think about how long it takes to flush a 16 MB buffer to the hard drive and remember that the dump of the cache to the drive is in an order over which you have no control.

The ONLY way to be really safe is to turn off the write cache and that extracts a huge performance penalty. What you prefer is a matter of personal choice but the file system simply can't make things better.

I believe that the Windows solution to this problem is to put a really, really long delay between when the system is finished syncing and when the power is turned off. This might be the best solution for FreeBSD, as well, but it will irritate people.

--
R. Kevin Oberman, Network Engineer
Energy Sciences Network (ESnet)
Ernest O. Lawrence Berkeley National Laboratory (Berkeley Lab)
E-mail: [EMAIL PROTECTED] Phone: +1 510 486-8634
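To put rough numbers on "how long it takes to flush a 16 MB buffer", here is a back-of-the-envelope sketch. Only the 16 MB cache size comes from the message above; the 512-byte sector size, the 50 MB/s sequential rate and the 10 ms cost per scattered write are assumptions picked for illustration, and the worst case ignores any reordering the drive firmware might do to shorten seeks.

```python
CACHE_BYTES = 16 * 1024 * 1024        # 16 MB write cache (from the message)
SECTOR_BYTES = 512                    # assumed sector size
SEQ_BYTES_PER_SEC = 50 * 1000 * 1000  # assumed sequential write rate
RANDOM_WRITE_SEC = 0.010              # assumed seek + rotation per scattered write


def flush_time_bounds(cache_bytes=CACHE_BYTES,
                      sector_bytes=SECTOR_BYTES,
                      seq_rate=SEQ_BYTES_PER_SEC,
                      random_write_sec=RANDOM_WRITE_SEC):
    """Return (best, worst) flush time in seconds for a full write cache."""
    best = cache_bytes / seq_rate              # one contiguous streaming write
    sectors = cache_bytes // sector_bytes
    worst = sectors * random_write_sec         # every sector needs its own seek
    return best, worst


best, worst = flush_time_bounds()
print(f"best case:  {best:.2f} s")
print(f"worst case: {worst:.0f} s")
```

Under these assumptions the range runs from about a third of a second to several minutes, which suggests why a fixed shutdown delay is at best a heuristic.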
Re: dangerous situation with shutdown process
On Fri, 15 Jul 2005, Matthias Buelow wrote: Why am I arguing in an uphill battle here? Is data safety no longer important to the FreeBSD community? Such issues should not even have to be discussed at all!

I'm trying to tell you what you have to say to move forward on this issue:

1) tell people that they are mistaken about drives ignoring the FUA bit or flush cache
2) convince people that the performance benefit of request barriers is worth it

I think we all care, but when we actually care--when money depends on it--we adopt other measures: SCSI, battery-backed RAID, etc.

-Jon
Re: dangerous situation with shutdown process
Kevin Oberman [EMAIL PROTECTED] writes: I believe that the Windows solution to this problem is to put a really, really long delay between when the system is finished syncing and when the power is turned off. This might be the best solution for FreeBSD, as well, but it will irritate people. The Windows solution is, apparently, to disable and immediately re-enable the write-back cache around a barrier. This will ensure that the cache is written out to the platters, even if the drive ignores a flush command. Of course I don't know this for certain and have to rely on observations that others have made. See, for example: http://mail-index.netbsd.org/tech-kern/2002/12/09/0052.html The long delay at shutdown would simply be a final safeguard in case the drive also ignores disabling of the write cache. mkb.
Re: dangerous situation with shutdown process
Lowell Gilbert [EMAIL PROTECTED] writes: We keep trying to point out that barriers *can't* be enforced on the hardware with many (most, and apparently an increasing percentage of) ATA drives. There is no semantic on these drives that allows you to guarantee the journal block will be written before the corresponding data block. If you are sure that your drives do this properly, then you are safe, but in that case there's no reliability problem with softupdates, either. See my other mail(s) about other systems disabling and re-enabling the cache to make up for a drive that ignores (or does not implement) a flush command. Then the advice to "disable the wb-cache on disks to ensure data safety" doesn't make sense. Either
* the drive supports disabling the write-back cache, in which case this method can be used to flush data to the platters, or else
* the drive does not support disabling the write-back cache, or lies about it, in which case the advice to disable the write-back cache for softupdates is meaningless.
I know my drive allows disabling of the write cache, as, apparently, the majority of IDE/SATA drives do. mkb.
Re: dangerous situation with shutdown process
I know my drive allows disabling of the write cache, as, apparently, the majority of IDE/SATA drives do. Yes, fair enough. This command is in the specification as far back as ATA-1. I guess it yields reasonable? performance? You should, however, be telling sos@ this--if he doesn't already believe it. -Jon
Re: dangerous situation with shutdown process
On Fri, 15 Jul 2005, Matthias Buelow wrote: John-Mark Gurney wrote: even request barriers will not save the fs in a power loss if the track that is getting flushed during a power loss... Some other FreeBSD folk have a reproducible case where blocks that were not written to on ATA hardware got trashed after a power loss... With non-written-to sectors getting trashed with the cache enabled, barriers don't mean squat... One more thought.. they _do_ protect against power loss while writing a track -- when used in combination with a journalled fs. A corrupted journal can be detected. If it's corrupted, discard the whole thing, or only the relevant entry. The filesystem will remain consistent. If track corruption occurs after the journal is written, it doesn't matter, since at boot the journal will be replayed and all operations will be performed once more. The track which is corrupted could contain data that hasn't been written to in months. How would the journal help? The combination barriers+journal really seems to be very resilient to filesystem corruption. When it's implemented without errors, and the hardware doesn't do things like change bits randomly, I can't think of a way this scheme can be corrupted at all. I still don't trust ATA drives. Can you guarantee (or show any reason to believe) that disabling the write cache will actually wait for the cache to be flushed before returning? Otherwise a disable cache/enable cache sequence is exactly the same as a flush cache command. If the drive executes both immediately, without waiting for the cache to be flushed _before_ returning, what's the difference? -- David Taylor
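The claim quoted above that "a corrupted journal can be detected" can be illustrated with a small sketch. The entry layout used here (sequence number, length, payload, CRC32) is an assumption made for the example and is not taken from any particular journalled file system; the names `encode_entry` and `decode_entries` are hypothetical.

```python
import struct
import zlib


def encode_entry(seq, payload):
    """Serialize one journal entry as: seq, length, payload, CRC32."""
    header = struct.pack("<II", seq, len(payload))
    crc = zlib.crc32(header + payload)
    return header + payload + struct.pack("<I", crc)


def decode_entries(log):
    """Return the valid entries, stopping at the first torn or corrupt one."""
    entries, offset = [], 0
    while offset + 8 <= len(log):
        seq, length = struct.unpack_from("<II", log, offset)
        end = offset + 8 + length
        if end + 4 > len(log):
            break  # entry was cut short, e.g. by a power loss mid-write
        (stored,) = struct.unpack_from("<I", log, end)
        if zlib.crc32(log[offset:end]) != stored:
            break  # checksum mismatch: discard this entry and everything after it
        entries.append((seq, log[offset + 8:end]))
        offset = end + 4
    return entries


log = encode_entry(1, b"alloc inode 7") + encode_entry(2, b"link dir 3 -> 7")
torn = log[:-5]                 # simulate power loss in the middle of the last entry
damaged = bytearray(log)
damaged[10] ^= 0xFF             # flip bits inside the payload of entry 1
damaged = bytes(damaged)
```

Decoding `torn` keeps only entry 1, and decoding `damaged` keeps nothing. Note that this covers only the journal itself; it does not answer the objection in this message about a trashed track holding data that was last written months ago.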
Re: dangerous situation with shutdown process
David Taylor [EMAIL PROTECTED] writes: A corrupted journal can be detected. If it's corrupted, discard the whole thing, or only the relevant entry. The filesystem will remain consistent. If track corruption occurs after the journal is written, it doesn't matter, since at boot the journal will be replayed and all operations will be performed once more. The track which is corrupted could contain data that hasn't been written to in months. How would the journal help? I don't understand this question. I still don't trust ATA drives. Can you guarantee (or show any reason to believe) that disabling the write cache will actually wait for the cache to be flushed before returning? Otherwise a disable cache/enable cache sequence is exactly the same as a flush cache command. If the drive executes both immediately, without waiting for the cache to be flushed _before_ returning, what's the difference? Are you implying that, because there exists one drive for which it doesn't work, it won't work for any drive? Or what is your point? mkb.
Re: dangerous situation with shutdown process
Date: Thu, 14 Jul 2005 20:38:15 +0200 From: Anatoliy Dmytriyev [EMAIL PROTECTED] Sender: [EMAIL PROTECTED] Hello, everybody! I have found an unusual and dangerous situation with the shutdown process: I copied 200 GB of data onto an 870 GB partition (softupdates enabled) with cp. When I ran umount on this partition right after the cp, it took a long time but finished correctly. However, when I ran "shutdown -h(r)", also right after the cp, the shutdown procedure waited for "sync" (unmounting of the file system), but the sync process was terminated by a timeout, and fsck checked and corrected the file system after boot. System: 5.4-STABLE, 4 GB RAM, P-IV 3 GHz processor. How can I fix this on my system? SCSI or ATA? If it's ATA, turn off the write cache with atacontrol(8) or the sysctl. The problem is that disks lie about whether they have actually written data. If the power goes off while the data is still only in the cache, it's lost. I am not sure if the write cache can be turned off on SCSI, but SCSI drives seem less likely to lie about when the data is actually flushed to the drive. -- R. Kevin Oberman, Network Engineer Energy Sciences Network (ESnet) Ernest O. Lawrence Berkeley National Laboratory (Berkeley Lab) E-mail: [EMAIL PROTECTED] Phone: +1 510 486-8634
Re: dangerous situation with shutdown process
On Thu, Jul 14, 2005 at 12:14:49PM -0700, Kevin Oberman wrote.. Date: Thu, 14 Jul 2005 20:38:15 +0200 From: Anatoliy Dmytriyev [EMAIL PROTECTED] Sender: [EMAIL PROTECTED] Hello, everybody! I have found an unusual and dangerous situation with the shutdown process: I copied 200 GB of data onto an 870 GB partition (softupdates enabled) with cp. When I ran umount on this partition right after the cp, it took a long time but finished correctly. However, when I ran "shutdown -h(r)", also right after the cp, the shutdown procedure waited for "sync" (unmounting of the file system), but the sync process was terminated by a timeout, and fsck checked and corrected the file system after boot. System: 5.4-STABLE, 4 GB RAM, P-IV 3 GHz processor. How can I fix this on my system? SCSI or ATA? If it's ATA, turn off the write cache with atacontrol(8) or the sysctl. The problem is that disks lie about whether they have actually written data. If the power goes off while the data is still only in the cache, it's lost. I am not sure if the write cache can be turned off on SCSI, but SCSI drives seem less likely to lie about when the data is actually flushed to the drive. At least you can set FUA if you want to force the data onto the platter. -- Wilko Bulte [EMAIL PROTECTED]
Re: dangerous situation with shutdown process
Kevin Oberman wrote: SCSI or ATA? If it's ATA, turn off the write cache with atacontrol(8) or the sysctl. The problem is that disks lie about whether they have actually written data. If the power goes off while the data is still only in the cache, it's lost. I am not sure if the write cache can be turned off on SCSI, but SCSI drives seem less likely to lie about when the data is actually flushed to the drive. SCSI, Adaptec 2110S -- Anatoliy Dmytriyev [EMAIL PROTECTED]
Re: dangerous situation with shutdown process
Kevin Oberman wrote: How can I fix it on my system? SCSI or ATA? If it's ATA, turn off the write cache with atacontrol(8) or the sysctl. You do NOT want to do that. Not only will performance drop brutally (example: a drop to 1/5th of normal write speed for sequential writes, probably worse for random writes), but it will also significantly reduce the lifetime of your disk. Modern disks are designed to be used with the write-back cache enabled, so don't turn it off. The problem is that disks lie about whether they have actually written data. If the power goes off while the data is still only in the cache, it's lost. No, the problem is that FreeBSD doesn't implement request barriers and that softupdates is flawed by design and seemingly could not make use of them even if they were available (because, as I understand it, it relies on a total ordering of all writes, unlike the partial ordering necessary for a journalled fs). Until a journalled fs that uses write request barriers is available for FreeBSD, you had better have a reliable UPS. mkb.
Re: dangerous situation with shutdown process
On Thu, Jul 14, 2005 at 09:52:53PM +0200, Matthias Buelow wrote: Kevin Oberman wrote: The problem is that disks lie about whether they have actually written data. If the power goes off while the data is still only in the cache, it's lost. No, the problem is that FreeBSD doesn't implement request barriers and that softupdates is flawed by design and seemingly could not make use of them even if they were available (because, as I understand it, it relies on a total ordering of all writes, unlike the partial ordering necessary for a journalled fs). Until a journalled fs that uses write request barriers is available for FreeBSD, you had better have a reliable UPS. How do OS-level request barriers help if the disk reorders pending writes in its cache?
Re: dangerous situation with shutdown process
softupdates is perfectly safe with SCSI. it's well known that ide and sata w/wo ncq fail to provide suitable semantics for softupdates. however, journaling fares no better, and request barriers do nothing to solve the problem. Request Barriers under linux exist to prevent the low level kernel block device layer from reordering write operations from the upper file system layers. Request Barriers consist of nothing more than tagging internal queues within the Linux kernel itself. They do nothing to resolve the underlying failures of the hardware to provide proper semantics to the block device layer. but, Request Barriers are ultimately useless. They can't resolve the underlying problems with ide/sata and there are already exposed semantics for scsi. if you absolutely must use sata and have reliable writes, make use of sata with a battery-backed raid controller. On Thu, 14 Jul 2005, Matthias Buelow wrote: Kevin Oberman wrote: How can I fix it on my system? SCSI or ATA? If it's ATA, turn off the write cache with atacontrol(8) or the sysctl. You do NOT want to do that. Not only will performance drop brutally (example: a drop to 1/5th of normal write speed for sequential writes, probably worse for random writes), but it will also significantly reduce the lifetime of your disk. Modern disks are designed to be used with the write-back cache enabled, so don't turn it off. The problem is that disks lie about whether they have actually written data. If the power goes off while the data is still only in the cache, it's lost. No, the problem is that FreeBSD doesn't implement request barriers and that softupdates is flawed by design and seemingly could not make use of them even if they were available (because, as I understand it, it relies on a total ordering of all writes, unlike the partial ordering necessary for a journalled fs). Until a journalled fs that uses write request barriers is available for FreeBSD, you had better have a reliable UPS. mkb.
Re: dangerous situation with shutdown process
At 15:19 7/14/2005, Wilko Bulte wrote: On Thu, Jul 14, 2005 at 12:14:49PM -0700, Kevin Oberman wrote.. Date: Thu, 14 Jul 2005 20:38:15 +0200 From: Anatoliy Dmytriyev [EMAIL PROTECTED] Sender: [EMAIL PROTECTED] Hello, everybody! I have found an unusual and dangerous situation with the shutdown process: I copied 200 GB of data onto an 870 GB partition (softupdates enabled) with cp. When I ran umount on this partition right after the cp, it took a long time but finished correctly. However, when I ran "shutdown -h(r)", also right after the cp, the shutdown procedure waited for "sync" (unmounting of the file system), but the sync process was terminated by a timeout, and fsck checked and corrected the file system after boot. System: 5.4-STABLE, 4 GB RAM, P-IV 3 GHz processor. How can I fix this on my system? The funny thing about all the replies here.. is that this guy is not saying that sync doesn't work. He's saying that the timeout built into shutdown causes it to *terminate* the sync forcibly before it's done, and then reboot. All finger pointing about IDE, SCSI, softupdates, and journals aside.. I think all he wants/needs is a way to increase that timer.
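The point made above, that a fixed timer can cut the sync short when the backlog is large, can be shown with a toy model. This is only a sketch: `sync_with_timeout` is a hypothetical function, the tick budget and flush rate are made-up numbers, and it is not how the FreeBSD shutdown code is actually written.

```python
def sync_with_timeout(dirty_buffers, flushed_per_tick, max_ticks):
    """Flush until nothing is dirty or the tick budget runs out.

    Returns (finished, buffers_still_dirty).
    """
    for _ in range(max_ticks):
        if dirty_buffers == 0:
            break
        dirty_buffers = max(0, dirty_buffers - flushed_per_tick)
    return dirty_buffers == 0, dirty_buffers


# A small backlog finishes well inside the budget.
print(sync_with_timeout(100, 10, 20))     # (True, 0)
# A large backlog (think: right after a 200 GB cp) does not.
print(sync_with_timeout(1000, 10, 20))    # (False, 800)
# The same backlog finishes if the budget is raised.
print(sync_with_timeout(1000, 10, 200))   # (True, 0)
```

In this model there are two ways out: give the loop more ticks, or arrange for fewer dirty buffers to exist when shutdown starts.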
Re: dangerous situation with shutdown process
David Sze wrote: Until a journalled fs that uses write request barriers is available for FreeBSD, you had better have a reliable UPS. How do OS-level request barriers help if the disk reorders pending writes in its cache? By separating journal updates from the corresponding metadata (and/or data) actions, and by guaranteeing (by flushing the cache, or by a single disable/enable of the wb cache at the barrier) that the journal is updated on disk before the actions take place. This imposes an ordering on the journal vs. action requests, which is what a journalled fs needs for filesystem integrity. It doesn't really matter if the disk reorders writes within those two blocks; the only thing that really matters is that the journal update is completed before metadata (or data) updates take place. With softupdates, as far as I understand, that doesn't work, because there is no journal. All requests must be in the order that softupdates decrees. You'd have to issue a barrier request after every write request, which would be equivalent to disabling the wb cache. mkb.
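The ordering argument above can be checked with a toy model of a drive with a volatile write-back cache. Everything here is an assumption made for illustration: `CachingDisk` is a hypothetical class, the barrier is implemented as a plain cache flush, and, crucially, the model assumes the drive honours the flush, which is exactly what other posters in this thread dispute.

```python
import itertools


class CachingDisk:
    """Toy drive: writes sit in a volatile cache until flushed."""

    def __init__(self):
        self.platter = set()
        self.cache = []

    def write(self, block):
        self.cache.append(block)      # reported complete immediately

    def flush(self):
        self.platter.update(self.cache)
        self.cache.clear()

    def crash_states(self):
        """Every platter state possible if power fails now: the drive may
        have written out any subset of its cache, in any order."""
        for r in range(len(self.cache) + 1):
            for subset in itertools.combinations(self.cache, r):
                yield self.platter | set(subset)


def journal_always_first(use_barrier):
    """True if no crash state has metadata on the platter without its journal."""
    disk = CachingDisk()
    disk.write("journal")
    if use_barrier:
        disk.flush()                  # barrier implemented as a cache flush
    disk.write("metadata")
    return all("journal" in state or "metadata" not in state
               for state in disk.crash_states())


print(journal_always_first(use_barrier=True))    # True
print(journal_always_first(use_barrier=False))   # False
```

Without the flush, one reachable crash state has the metadata on the platter but not the journal entry describing it; with the flush, no such state exists in this model.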
Re: dangerous situation with shutdown process
Jon Dama wrote: Request Barriers under linux exist to prevent the low level kernel block device layer from reordering write operations from the upper file system layers. Request Barriers consist of nothing more than tagging internal queues within the Linux kernel itself. They do nothing to resolve the underlying failures of the hardware to provide proper semantics to the block device layer. but, Request Barriers are ultimately useless. They can't resolve the underlying problems with ide/sata and there are already exposed semantics for scsi. If you flush the cache at barriers, on-disk integrity of the journal vs. metadata updates is guaranteed. mkb.
Re: dangerous situation with shutdown process
if the FUA bit in the sata command header is properly respected.
if the flush cache command on an ata device is properly respected.
if the flush cache command on an ata device is implemented (it's optional).
if the flush cache command existed when the ata device was made (it isn't in the earlier versions of the ata spec).
anyways, your comments about softupdates needing total ordering versus journals needing partial ordering are wrong. softupdates only requires that you do not call 'biodone(x)' until 'x' has been committed to disk. this is 100% compatible with the specification feature set, IF those semantics are actually present in the hardware. please see the thread beginning with the following commit message for an extensive discussion of these topics: http://lists.freebsd.org/pipermail/cvs-src/2003-April/001002.html -Jon On Thu, 14 Jul 2005, Matthias Buelow wrote: Jon Dama wrote: Request Barriers under linux exist to prevent the low level kernel block device layer from reordering write operations from the upper file system layers. Request Barriers consist of nothing more than tagging internal queues within the Linux kernel itself. They do nothing to resolve the underlying failures of the hardware to provide proper semantics to the block device layer. but, Request Barriers are ultimately useless. They can't resolve the underlying problems with ide/sata and there are already exposed semantics for scsi. If you flush the cache at barriers, on-disk integrity of the journal vs. metadata updates is guaranteed. mkb.
Re: dangerous situation with shutdown process
Jon Dama [EMAIL PROTECTED] writes: if the FUA bit in the sata command header is properly respected. if the flush cache command on an ata device is properly respected. if the flush cache command on an ata device is implemented (it's optional). if the flush cache command existed when the ata device was made (it isn't in the earlier versions of the ata spec). or if the write-back cache can be disabled and re-enabled. anyways, your comments about softupdates needing total ordering versus journals needing partial ordering are wrong. softupdates only requires that you do not call 'biodone(x)' until 'x' has been committed to disk. Well. Can it group writes in such a way that flushing would be required only at larger intervals, or can't it? this is 100% compatible with the specification feature set, IF those semantics are actually present in the hardware. Apparently it is not compatible with the real-world feature set, and it should have been clear to the designer(s) of softupdates that write-back caches signal completion while the data is still in the cache. That's the whole purpose of these mechanisms (so they can delay and reorder the writes and write out whole tracks). In that case you should only assume that a separate flush command (or a workaround that amounts to a flush) exists. Any other design assumes an oversimplified black-box notion of a drive that does not correspond to reality. please see the thread beginning with the following commit message for an extensive discussion of these topics: http://lists.freebsd.org/pipermail/cvs-src/2003-April/001002.html I've seen nothing that contradicts what I've said. The point is that the request barrier design with flushing at barriers, as used in M$ Windows (and also completed in recent Linux kernels), allows safe use of disks with the write-back cache enabled, while FreeBSD with softupdates apparently doesn't.
I don't really care how it's implemented, or if journalling is used, or softupdates, or a quantum-tachyon-reverser mounted on the front antenna. I just want to have the same level of data safety on my hardware with FreeBSD that I would get with other systems. mkb.
Re: dangerous situation with shutdown process
Jon Dama [EMAIL PROTECTED] writes: softupdates is perfectly safe with SCSI. it's well known that ide and sata w/wo ncq fail to provide suitable semantics for softupdates. however, journaling fares no better, and request barriers do nothing to solve the problem. I had assumed that the sequence of operations in a journal would be idempotent. Is that a reasonable design criterion? [If it is, then it would make up for the fact that you can't build a reliable transaction gate. That is, you would just have to go back far enough that you *know* all of the needed journal is within the range you will replay. But even then, the journal would need to be on a separate medium, one that doesn't have the "lying to you about transaction completion" problem.] On Thu, 14 Jul 2005, Matthias Buelow wrote: Kevin Oberman wrote: SCSI or ATA? If it's ATA, turn off the write cache with atacontrol(8) or the sysctl. You do NOT want to do that. Not only will performance drop brutally (example: a drop to 1/5th of normal write speed for sequential writes, probably worse for random writes), but it will also significantly reduce the lifetime of your disk. Modern disks are designed to be used with the write-back cache enabled, so don't turn it off. I have no idea how "designed to be used with the write-back cache enabled" could affect the operating life of the disk.
Re: dangerous situation with shutdown process
Lowell Gilbert [EMAIL PROTECTED] writes: Jon Dama [EMAIL PROTECTED] writes: however, journaling fares no better, and request barriers do nothing to solve the problem. I had assumed that the sequence of operations in a journal would be idempotent. Is that a reasonable design criterion? [If it is, then it would make up for the fact that you can't build a reliable transaction gate. That is, you would just have to go back far enough that you *know* all of the needed journal is within the range you will replay. But even then, the journal would need to be on a separate medium, one that doesn't have the "lying to you about transaction completion" problem.] No, it needn't. It is sufficient that the journal entries for a block of updates that are to follow are on disk before the updates are made. That's all. This can be achieved by inserting a write barrier request between the journal writes and the actual data/metadata writes. The block driver will, when it sees the barrier, a) write out all requests in its queue that it got before the barrier, and b) flush the cache so that the drive will not intermix them with the following data writes. What could happen now when the power goes away at an inopportune moment? [Note that I'm only talking about filesystem integrity, not general data loss.]
* If power goes away before the journal is written, nothing happens.
* If the journal is partially written and power goes away, it will be partially replayed at boot, but the filesystem will be consistent.
* If power goes away when the journal is fully written but no metadata updates have been performed, they will be performed at boot and everything is as if the full request had completed before power went out.
* If power goes away when the journal is fully written and parts of the metadata updates have been written, those updates will be performed twice (once more at reboot), but that won't matter since these operations are idempotent. The remaining metadata updates are then performed once, at reboot.
So where is the need for the journal to be on a separate medium? The only thing that matters is that no metadata updates will be written before the journal has been written, and flushing the disk cache at a barrier will ensure this. Note that the disk doesn't even have to flush the cache when it receives that command; it only has to ensure that it performs all requests issued before the flush ahead of those that come afterwards. I have no idea how "designed to be used with the write-back cache enabled" could affect the operating life of the disk. If you disable the write cache, you get much higher wear and tear due to much more seeking. If I observe a 5x performance degradation for sequential writes (i.e., no cache overwriting effects) when the cache is disabled, I would think that I also have a factor >1 of increased seek operations in the drive; otherwise the performance degradation cannot be explained. [Besides, the disk gets really loud when the cache is disabled.] mkb.
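The fourth case in the list above (some metadata updates applied, then replayed again at boot) relies on the operations being idempotent. A minimal sketch of that property, assuming journal entries of the form "set this field to this absolute value"; the field names and the helper functions are made up for the example.

```python
def apply(state, op):
    """An idempotent operation: set a named field to an absolute value."""
    key, value = op
    new_state = dict(state)
    new_state[key] = value
    return new_state


def replay(state, journal):
    """Apply every journal entry in order and return the resulting state."""
    for op in journal:
        state = apply(state, op)
    return state


def recovered_states(initial, journal):
    """Crash after 0..n metadata updates hit the disk, then replay at boot."""
    results = []
    for done in range(len(journal) + 1):
        on_disk = replay(initial, journal[:done])   # partial updates before the crash
        results.append(replay(on_disk, journal))    # full journal replay at boot
    return results


initial = {"inode7.nlink": 0, "dir3.entry": None}
journal = [("inode7.nlink", 1), ("dir3.entry", "inode7")]
expected = replay(initial, journal)

print(all(state == expected for state in recovered_states(initial, journal)))  # True
```

The result is the same no matter where the crash falls. An entry such as "increment nlink by one" would not have this property, since replaying it after it had already been applied would count the link twice; that is why the design criterion asked about in this message matters.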
Re: dangerous situation with shutdown process
On Thu, Jul 14, 2005 at 04:17:06PM -0400, asym wrote: At 15:19 7/14/2005, Wilko Bulte wrote: On Thu, Jul 14, 2005 at 12:14:49PM -0700, Kevin Oberman wrote.. Date: Thu, 14 Jul 2005 20:38:15 +0200 From: Anatoliy Dmytriyev [EMAIL PROTECTED] Sender: [EMAIL PROTECTED] Hello, everybody! I have found an unusual and dangerous situation with the shutdown process: I copied 200 GB of data onto an 870 GB partition (softupdates enabled) with cp. When I ran umount on this partition right after the cp, it took a long time but finished correctly. However, when I ran "shutdown -h(r)", also right after the cp, the shutdown procedure waited for "sync" (unmounting of the file system), but the sync process was terminated by a timeout, and fsck checked and corrected the file system after boot. System: 5.4-STABLE, 4 GB RAM, P-IV 3 GHz processor. How can I fix this on my system? The funny thing about all the replies here.. is that this guy is not saying that sync doesn't work. He's saying that the timeout built into shutdown causes it to *terminate* the sync forcibly before it's done, and then reboot. All finger pointing about IDE, SCSI, softupdates, and journals aside.. I think all he wants/needs is a way to increase that timer. If you can't increase the shutdown timeout, decrease the softupdates timers.
# tail -3 /etc/sysctl.conf
kern.metadelay=14
kern.dirdelay=15
kern.filedelay=17
That was my solution for the shutdown wait timeout. Serg N. Voronkov, Sibitex JSC, Tyumen, Russia.