John Ekins wrote:
Hello,

I've a couple of questions about soft updates. I've Googled heavily on this but
not really found a satisfactory answer. The story:

I'm running on numerous FreeBSD 4.7 SMP machines as primary MX machines. The mail
is not stored on the FreeBSD machines but on NetApps via NFS. However the mail is
temporarily spooled on the FreeBSD machines during normal MTA handling and passing
to an anti-virus scanner. I have one large partition /var on each machine where
basically all the work and temporary/transient files for the MTA and AV scanner
takes place.

These machines are heavily utilised, running quite "hot" with a load average of
anything from 2 to 8. Many thousands of temporary files are thus created and
deleted a minute. I have no problem with this as nearly all email is delivered in
under 1 minute whatever.


I notice that after a while the amount of free space shown by df differs
considerably from what du reports for /var. I'm aware of why this happens with
soft updates, but
that's not the whole story. If I turn off incoming email on a machine, the space
does not seem to sync back to what it should be.  No matter how long I turn off
the MTA, the space is simply not returned, and df/du show differences of about
5:1. Nothing else is writing/holding open files on that partition (even turned
off syslog, cron, etc. and checked using lsof). In comparison, if, for example, on
my normal desktop machine I create a 500MB file, then delete it, the space shortly
afterwards is returned to me when I run df. The only way I've been able to recover
this space to what it should be is to reboot the machine.

I don't know what's wrong, but does unmounting and remounting the partition reclaim the lost space?
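
If it helps to put a number on the df/du gap while you test (before and after stopping the MTA, or before and after a remount), here is a rough Python sketch. It is only an illustration: the function names du_bytes and df_used_bytes are made up for this example, it assumes st_blocks is reported in 512-byte units (the usual POSIX convention), and its totals will not match df and du output exactly.

```python
import os


def du_bytes(root):
    """Approximate `du`: sum allocated bytes under root, staying on one filesystem."""
    root_dev = os.lstat(root).st_dev
    seen = set()   # (device, inode) pairs, so hard links are counted once
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into other filesystems mounted below root.
        kept = []
        for d in dirnames:
            try:
                if os.lstat(os.path.join(dirpath, d)).st_dev == root_dev:
                    kept.append(d)
            except OSError:
                pass   # directory vanished mid-walk (normal on a busy spool)
        dirnames[:] = kept
        # Count the directory itself plus every non-directory entry in it.
        for path in [dirpath] + [os.path.join(dirpath, f) for f in filenames]:
            try:
                st = os.lstat(path)
            except OSError:
                continue   # file vanished mid-walk
            key = (st.st_dev, st.st_ino)
            if key in seen:
                continue
            seen.add(key)
            total += st.st_blocks * 512   # assumption: 512-byte block units
    return total


def df_used_bytes(path):
    """Approximate `df`: bytes the filesystem itself reports as in use."""
    vfs = os.statvfs(path)
    return (vfs.f_blocks - vfs.f_bfree) * vfs.f_frsize


def unaccounted_bytes(path):
    """Space the filesystem counts as used that no visible file accounts for."""
    return df_used_bytes(path) - du_bytes(path)
```

Run as root, something like unaccounted_bytes("/var") would give the number of bytes df counts as used that du cannot find. If that figure stays large on an idle machine and only drops after a remount or reboot, that at least points at the filesystem rather than at some process holding files open.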

As an example, here is a snippet from the console from when I rebooted an affected
machine:

  boot() called on cpu#2
  Waiting (max 60 seconds) for system process `vnlru' to stop...stopped
  Waiting (max 60 seconds) for system process `bufdaemon' to stop...stopped
  Waiting (max 60 seconds) for system process `syncer' to stop...timed out

syncing disks... 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 22 giving up on 22 buffers
Uptime: 27d23h1m27s
Rebooting...


As you can see the file system is unable to sync. When the machine reboots it
literally takes hours to fsck the /var partition (only about 15GB). And the fsck
output is full of messages like this:

UNEXPECTED SOFT UPDATE INCONSISTENCY

Well, this sure isn't good.


Now, is there a problem here with soft updates "losing track" of what is going on
on this busy partition? It would appear to be so as quietening the machine does
not lead to a proper sync. Secondly, why does the fsck take such an inordinate
amount of time for a smallish partition?

If there are a LOT of inodes with problems, it could easily take a while to fix. Also, if you run fsck without specifying a filesystem to fix, it exhaustively checks all filesystems. So even if the problem is on /var, it might spend a long time checking /usr as well. You can work around this by calling fsck with the filesystem to check (for example, "fsck /var").

I really like the performance benefits of soft updates, but it seems that I'm
going to have to turn it off on /var because of the problems that eventually
occur.

If these are production boxes, I'd recommend turning it off until you resolve the problem.

If anyone has some advice I'd be grateful.

I don't know if this would qualify as "advice", but since nobody else seems to have any suggestions, I figured I'd throw my thoughts in. Are you using ATA or SCSI drives? Does issuing a manual "sync" once you've stopped the spooling process help any? Are these all identical mobos, and is there possibly a BIOS update available? These aren't IBM ATA drives, are they? I had one of those give me grief for months (if you look in the archives, you should be able to find details on which drives caused problems). Have you tried updating one of the machines to 4.8 to see if the problem has been fixed? Like I said, not good advice, just some ideas for you.

--
Bill Moran
Potential Technologies
http://www.potentialtech.com
