Folks,
I've read the man page, the vinum debugging page at
<http://www.vinumvm.org/vinum/how-to-debug.html>, searched the
archives of the stable mailing list, and still haven't found anything
that would appear to address the situation we've just started
experiencing.
Many months ago, we had a Diablo USENET news spool/reader server
that was up and operating fine, taking a slave feed from another news
spool server, until such time as I was able to get separate
reader-only servers working.
I had separate /usr/news/history and /usr/news/spool filesystems
set up on their own vinum RAID 1+0 volumes (striped and mirrored
across slices on each of the four logical units that were exported
from the drive array), as well as a set of four spool directories on
their own "raw" slices of those same logical units.
The hardware configuration is a Dell PowerEdge 1300 with dual
450Mhz processors with 1GB of ECC RAM, running 4.1-STABLE (I last did
a make update on Aug 8 16:27, according to the logfiles I keep),
with an IBM DDRS-39130D DC2A drive as the boot disk. No changes
(that I know of) have been made to the system sources since this
time, including vinum.
Unfortunately, we ran into problems with the mail system, and had
to "borrow" half of the drives in the external drive array that was
attached to the news spool server. I copied the data from the spool
directories on the two logical units that would be disappearing to
the other pair of spool directories that would be remaining, and then
I broke the mirrors and pulled out half the drives and moved them
over to the other drive array we have on the mail system.
I was very careful to pull out drives associated with only the
two logical units I had copied the data from, and that this would not
cause any stripes to be broken.
Fast forward to the next millenium, and we finally have the
replacements for the disks in place, and we're starting the process
of trying to revive this machine. Even though we created the slices
and the partitions correctly, we still weren't able to get vinum to
recognize the restored disk devices correctly, and I wasn't aware of
the page at <http://www.vinumvm.org/vinum/replacing-drive.html>.
So, instead, I just copied all the data from the two filesystems
that had been mirrored and striped, and then did a "resetconfig". I
then re-created the vinum configuration with exactly the same
parameters as I had used the first time, and was able to get vinum to
init and bring online the "restored" disk devices and rebuild the
mirrors, etc....
We then put filesystems on everything, and copied all the files
back to their correct locations. So far as we could tell, this
seemed to have worked perfectly, and we were done.
However, at some point within the next couple of hours (we're not
quite sure when), the machine crashed, and vinum failed to come up on
the reboot. We had to comment out the /etc/fstab entries that
referred to the vinum volumes, and since then have been trying to
debug the problems as to why vinum can't find or start any devices at
all.
Following the instructions at
<http://www.vinumvm.org/vinum/how-to-debug.html#config-problems>, I
tried pulling the file log information as specified. The exact
command I used was (line-wrapped for readability):
$ for i in /dev/da2s1e /dev/da2s2e /dev/da3s1e \
/dev/da3s2e /dev/da4s1e /dev/da4s2e /dev/da5s1e \
/dev/da5s2e; do (dd if=$i skip=8 count=6|tr -d \
'\000-\011\200-\377'; echo) >> brad.log; done
The first two disk devices (da2 & da3) produced gibberish. The
last two disk devices (da4 & da5) produced reasonably intelligible
output, but instead of "IN VINO" as the first seven characters, I got
"NO VINO" instead.
A compressed version of the log file available is via my personal
anonymous ftp space -- see
<ftp://ftp.shub-internet.org/pub/shub/brad.log.gz>. Yes, I know this
file is trivially small, but if I compress it then I can make sure
that no one attempts to transfer it in ASCII mode, and that if they
do then they get what they deserve. ;-)
From the time when I did the resetconfig until now, here is the
contents of /var/log/vinum_history:
5 Jan 2001 12:36:58.476142 resetconfig
5 Jan 2001 12:37:04.663644 *** Created devices ***
5 Jan 2001 12:37:10.622895 quit
5 Jan 2001 12:37:19.576839 *** vinum started ***
5 Jan 2001 12:37:19.577725 create -f /etc/vinum.conf
drive d1 device /dev/da2s1e
drive d2 device /dev/da3s1e
drive d3 device /dev/da4s1e
drive d4 device /dev/da5s1e
drive d5 device /dev/da2s2e
drive d6 device /dev/da3s2e
drive d7 device /dev/da4s2e
drive d8 device /dev/da5s2e
volume history
plex org striped 32m
sd length 0 drive d1
sd length 0 drive d2
plex org striped 32m
sd length 0 drive d3
sd length 0 drive d4
volume spool
plex org striped 32m
sd length 0 drive d8
sd length 0 drive d7
plex org striped 32m
sd length 0 drive d6
sd length 0 drive d5
5 Jan 2001 12:37:19.659734 *** Created devices ***
5 Jan 2001 12:37:42.812629 *** vinum started ***
5 Jan 2001 12:37:44.711438 ls -v
5 Jan 2001 12:37:53.114029 start d6
5 Jan 2001 12:37:59.311274 start d5
5 Jan 2001 12:38:07.602677 ls -v
5 Jan 2001 12:38:13.014144 start spool.p1.s1
5 Jan 2001 12:38:26.887609 exit
5 Jan 2001 12:38:28.403601 quit
5 Jan 2001 12:41:32.279534 *** vinum started ***
5 Jan 2001 12:41:33.542050 ls -v
5 Jan 2001 12:41:46.965537 initialize spool.p1.s0
5 Jan 2001 12:41:49.561185 init spool.p1.s0
5 Jan 2001 12:41:54.304397 ls -v
5 Jan 2001 12:42:11.275776 init history.p1.s0
5 Jan 2001 12:42:12.599953 init history.p1.s1
5 Jan 2001 12:42:16.010292 ls -v
5 Jan 2001 12:48:07.540908 *** vinum started ***
5 Jan 2001 12:48:08.955230 ls -v
5 Jan 2001 12:52:45.324897 ls -v
5 Jan 2001 12:38:13.014144 start spool.p1.s1
5 Jan 2001 13:00:18.560459 *** vinum started ***
5 Jan 2001 13:00:20.138779 ls -v
5 Jan 2001 13:00:47.388334 *** vinum started ***
5 Jan 2001 13:00:48.383848 ls -v
5 Jan 2001 16:32:55.861137 *** vinum started ***
5 Jan 2001 16:32:55.869648 start
5 Jan 2001 15:35:02.271176 *** vinum started ***
5 Jan 2001 15:35:03.369950 ls
5 Jan 2001 15:35:04.178968 ls -v
5 Jan 2001 15:36:18.528618 *** vinum started ***
5 Jan 2001 15:36:21.619565 start
5 Jan 2001 15:36:46.720739 ls -v
5 Jan 2001 15:36:49.123128 ls -V
5 Jan 2001 15:37:00.724413 ls -p
5 Jan 2001 15:37:03.958826 ls -P
5 Jan 2001 15:39:24.212310 *** vinum started ***
5 Jan 2001 15:39:24.213161 start
5 Jan 2001 15:53:52.092618 *** vinum started ***
5 Jan 2001 15:53:53.778854 start
5 Jan 2001 15:54:01.699490 quit
5 Jan 2001 16:07:19.846860 *** vinum started ***
5 Jan 2001 16:07:19.847727 makedev
5 Jan 2001 16:07:19.848473 *** Created devices ***
5 Jan 2001 16:07:25.074358 *** vinum started ***
5 Jan 2001 16:07:26.330688 start
5 Jan 2001 16:08:27.987415 *** vinum started ***
5 Jan 2001 16:08:27.988301 read /dev/da2s1e
From the time of the reboot ending with the time vinum was unable
to read the last device, in /var/log/messages I find:
Jan 5 15:53:34 baloo /BALOO: SMP: AP CPU #1 Launched!
Jan 5 15:53:34 baloo /BALOO: Waiting 5 seconds for SCSI devices to settle
Jan 5 15:53:34 baloo /BALOO: da2 at ahc0 bus 0 target 6 lun 0
Jan 5 15:53:34 baloo /BALOO: da2: <HITACHI DF400 0000> Fixed Direct
Access SCSI-2 device
Jan 5 15:53:34 baloo /BALOO: da2: 80.000MB/s transfers (40.000MHz,
offset 15, 16bit), Tagged Queueing Enabled
Jan 5 15:53:34 baloo /BALOO: da2: 68235MB (139745280 512 byte
sectors: 255H 63S/T 8698C)
Jan 5 15:53:34 baloo /BALOO: da4 at ahc1 bus 0 target 5 lun 0
Jan 5 15:53:34 baloo /BALOO: da4: <HITACHI DF400 0000> Fixed Direct
Access SCSI-2 device
Jan 5 15:53:34 baloo /BALOO: da4: 80.000MB/s transfers (40.000MHz,
offset 15, 16bit), Tagged Queueing Enabled
Jan 5 15:53:34 baloo /BALOO: da4: 68235MB (139745280 512 byte
sectors: 255H 63S/T 8698C)
Jan 5 15:53:34 baloo /BALOO: da3 at ahc0 bus 0 target 6 lun 1
Jan 5 15:53:34 baloo /BALOO: da3: <HITACHI DF400 0000> Fixed Direct
Access SCSI-2 device
Jan 5 15:53:34 baloo /BALOO: da3: 80.000MB/s transfers (40.000MHz,
offset 15, 16bit), Tagged Queueing Enabled
Jan 5 15:53:34 baloo /BALOO: da3: 68235MB (139745280 512 byte
sectors: 255H 63S/T 8698C)
Jan 5 15:53:34 baloo /BALOO: da5 at ahc1 bus 0 target 5 lun 1
Jan 5 15:53:34 baloo /BALOO: da5: <HITACHI DF400 0000> Fixed Direct
Access SCSI-2 device
Jan 5 15:53:34 baloo /BALOO: da5: 80.000MB/s transfers (40.000MHz,
offset 15, 16bit), Tagged Queueing Enabled
Jan 5 15:53:34 baloo /BALOO: da5: 68235MB (139745280 512 byte
sectors: 255H 63S/T 8698C)
Jan 5 15:53:34 baloo /BALOO: Mounting root from ufs:/dev/da1s1a
Jan 5 15:53:34 baloo /BALOO: da1 at ahc2 bus 0 target 1 lun 0
Jan 5 15:53:34 baloo /BALOO: da1: <IBM DDRS-39130D DC2A> Fixed
Direct Access SCSI-2 device
Jan 5 15:53:34 baloo /BALOO: da1: 80.000MB/s transfers (40.000MHz,
offset 15, 16bit), Tagged Queueing Enabled
Jan 5 15:53:34 baloo /BALOO: da1: 8715MB (17850000 512 byte sectors:
255H 63S/T 1111C)
Jan 5 15:53:34 baloo /BALOO: vinum: loaded
Jan 5 15:53:34 baloo /BALOO: vinum: reading configuration from /dev/da5s2e
Jan 5 15:53:34 baloo /BALOO: vinum_scandisk: /dev/da5s2e is down
Jan 5 15:53:34 baloo /BALOO: vinum: Can't read device /dev/da5s2e, error 5
Jan 5 15:53:34 baloo /BALOO: vinum: updating configuration from /dev/da2s2e
Jan 5 15:53:34 baloo /BALOO: vinum_scandisk: /dev/da2s2e is down
Jan 5 15:53:34 baloo /BALOO: vinum: Can't read device /dev/da2s2e, error 5
Jan 5 15:53:34 baloo /BALOO: vinum: updating configuration from /dev/da3s1e
Jan 5 15:53:34 baloo /BALOO: vinum_scandisk: /dev/da3s1e is down
Jan 5 15:53:34 baloo /BALOO: vinum: Can't read device /dev/da3s1e, error 5
Jan 5 15:53:34 baloo /BALOO: vinum: updating configuration from /dev/da3s2e
Jan 5 15:53:35 baloo /BALOO: vinum_scandisk: /dev/da3s2e is down
Jan 5 15:53:35 baloo /BALOO: vinum: Can't read device /dev/da3s2e, error 5
Jan 5 15:53:35 baloo /BALOO: vinum: updating configuration from /dev/da4s1e
Jan 5 15:53:35 baloo /BALOO: vinum_scandisk: /dev/da4s1e is down
Jan 5 15:53:35 baloo /BALOO: vinum: Can't read device /dev/da4s1e, error 5
Jan 5 15:53:35 baloo /BALOO: vinum: updating configuration from /dev/da4s2e
Jan 5 15:53:35 baloo /BALOO: vinum_scandisk: /dev/da4s2e is down
Jan 5 15:53:35 baloo /BALOO: vinum: Can't read device /dev/da4s2e, error 5
Jan 5 15:53:35 baloo /BALOO: vinum: updating configuration from /dev/da5s1e
Jan 5 15:53:35 baloo /BALOO: vinum_scandisk: /dev/da5s1e is down
Jan 5 15:53:35 baloo /BALOO: vinum: Can't read device /dev/da5s1e, error 5
Jan 5 15:53:35 baloo /BALOO: vinum: updating configuration from /dev/da2s1e
Jan 5 15:53:35 baloo /BALOO: vinum_scandisk: /dev/da2s1e is down
Jan 5 15:53:35 baloo /BALOO: vinum: Can't read device /dev/da2s1e, error 5
Jan 5 15:53:35 baloo /BALOO: vinum: couldn't read configuration
Unfortunately, I'm at a loss as to understand where to go next in
terms of debugging this matter. Any and all help you can provide is
appreciated.
Thanks!
--
These are my opinions -- not to be taken as official Skynet policy
======================================================================
Brad Knowles, <[EMAIL PROTECTED]> || Belgacom Skynet SA/NV
Systems Architect, Mail/News/FTP/Proxy Admin || Rue Colonel Bourg, 124
Phone/Fax: +32-2-706.13.11/12.49 || B-1140 Brussels
http://www.skynet.be || Belgium
"They that can give up essential liberty to obtain a little temporary
safety deserve neither liberty nor safety."
-Benjamin Franklin, Historical Review of Pennsylvania.
To Unsubscribe: send mail to [EMAIL PROTECTED]
with "unsubscribe freebsd-stable" in the body of the message