Just an overview of events in case someone benefits from them. Last week my IMGate locked up, with no issues recorded.
Today it locked again, and this time there were HD I/O errors on the screen. (No wonder nothing got logged.) I used my handy-dandy tech tool known as "The Human Ear" and detected a faint "Brrrrr-TICK" (an HD head reset, very common in drive failures).

Since the machine came back up after a reboot, I refreshed my backed-up configs. They were mostly current, but not 100%, so it was worth it. At the same time I was downloading the latest FreeBSD 4.9 mini-CD. Since the download was going to take a little time, I did a little more prep:

1) I stopped the smtpd service in master.cf and reloaded Postfix. This prevented any more inbound mail, so the chance of loss was low.
2) I went through the queue files with mailq and postsuper to kill off any spam bounces or others (it turned out that was all I had left).
3) I looked over the notes on the latest Postfix. Never skip this part or you may get bitten.

Once I had the CD and a replacement drive, I powered down the IMGate, put it on a work rack, replaced the drive and did a clean install. Yes, I could have put in the drive as a secondary, moved files over, and so on and so forth. I did not, for two reasons:

1) The failing drive was having read errors, and I did not want to corrupt files in transfer.
2) Both FreeBSD and Postfix were due for upgrading.

The FreeBSD install went quickly because I install only the minimum. I killed sendmail ASAP because I knew its defaults would be a problem. I would rather have no connection than a bunch of mail bounces.

I then copied the assorted configuration files back to the system, but put them into a temporary directory. My rc.conf and loader.conf were then checked for issues with the new version of FreeBSD. I made the /etc/postfix directory and restored the proper config and map files to it. The scripts also went where they are supposed to go. Then I downloaded the latest Postfix, built it with PCRE, added the users and settings as required, and finished with a make upgrade.
That way the new Postfix used the settings in main.cf to decide what needed to go where. It fixed a few entries in main.cf and master.cf.

At this point I could have postmapped my files... but I forgot to. Instead I turned on "sendmail" in rc.conf, which would start Postfix with the machine, and rebooted to see what would happen. It yelled about the missing .db files, so I postmapped everything, then found I needed to make a /usr/local/etc/postfix for my SAV/RAV, etc. With postfix stop and start I was able to keep prodding at things. Once it stopped yelling I ran tests both inbound and outbound. They all went through.

Total downtime? About an hour. It all seems to work great now. I still need to add some cron jobs, but that is nothing much.

So, want to be ready for the flood? Keep a copy of your configs and scripts elsewhere. If I had had a 100% drive failure, all I would have lost were some changes to RBLs I made recently. Why? Because I do keep a copy of my configs backed up.

--Eric
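
For illustration only: the advice above (keep a copy of your configs and scripts elsewhere) could be sketched roughly as follows. This is not the script used in this post. The function name backup_configs, its parameters, and the timestamped directory layout are all assumptions made for the example.

```python
import shutil
import time
from pathlib import Path


def backup_configs(sources, dest_root, stamp=None):
    """Copy each existing file or directory in `sources` into a new
    timestamped directory under `dest_root`; return that directory.

    Hypothetical helper: the name, arguments, and layout are assumptions.
    """
    stamp = stamp or time.strftime("%Y%m%d-%H%M%S")
    target = Path(dest_root) / stamp
    # Refuse to overwrite an earlier backup that has the same stamp.
    target.mkdir(parents=True, exist_ok=False)

    for src in sources:
        src = Path(src).resolve()
        if not src.exists():
            continue  # skip paths that are absent on this machine
        # Keep the original layout, e.g. /etc/rc.conf -> <target>/etc/rc.conf
        dst = target / src.relative_to(src.anchor)
        if src.is_dir():
            # dirs_exist_ok needs Python 3.8+; it tolerates overlapping sources.
            shutil.copytree(src, dst, dirs_exist_ok=True)
        else:
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)  # copy2 also preserves modification times
    return target
```

Under these assumptions, calling backup_configs(["/etc/rc.conf", "/boot/loader.conf", "/etc/postfix"], "/mnt/backup/imgate") would leave a dated copy of those files under the (hypothetical) /mnt/backup/imgate directory, which should live on a different disk or machine than the one being backed up.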
