Thomas Roth wrote:
Hi all,

After the failure of a server contributing two OSTs to our Lustre file
system, I'm having trouble either getting rid of these OSTs for good or
re-introducing them. (It's a test system; the data on it may be thrown
away at any time if necessary.) The system is running Debian Etch, kernel
2.6.20, Lustre 1.6.0.1.

Trying to mount the OSTs invariably gives me

 kernel: LustreError: Trying to start OBD testfs1-OST000b_UUID using the
wrong disk . Were the /dev/ assignments rearranged?
 kernel: LustreError: 7792:0:(filter.c:1008:filter_prep()) cannot read
last_rcvd: rc = -22
...
"the wrong disk ." -- the missing disk name implies the last_rcvd file has been corrupted. (The -22 EINVAL is a consequence of that.) You could try mounting the disk as type ldiskfs, then erasing the last_rcvd file - this should cause the OST to regenerate it.

The log messages that follow these lines are a consequence of them, I guess.
Although I'm not sure what is meant by the /dev/ assignments here, nothing
has been changed on this machine - just a reboot (and maybe a damaged
partition, of course).
I also haven't found out what the error code -22 means.

I went on to try to unregister these OSTs on the MGS. The Lustre manual
says
$ mgs> lctl conf_param testfs-OST0001.osc.active=0
This doesn't work, and neither do most of the examples given in
http://manual.lustre.org/manual/LustreManual16_HTML - for which Lustre
version was this manual written? 'man lctl' tells me that the --device
option may be missing. On the MGS, I got
$ mgs> lctl dl
...
 19 UP osc testfs1-OST000a-osc testfs1-mdtlov_UUID 5
 20 UP osc testfs1-OST000b-osc testfs1-mdtlov_UUID 5

(Something else that I'm painfully missing in all the Lustre
documentation: explanations of the output of these commands!)
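(As far as I can work out, the columns of 'lctl dl' are: device number,
status, device type, device name, a UUID, and a reference count - e.g.
in line 20 above, 'testfs1-OST000b-osc' would be the device name. That
is only my reading of the output, though.)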
My guess was that the correct name for my OSTs is given in the fourth
field, so I tried
$ mgs> lctl --device testfs1-OST000a-osc conf_param
testfs1-OST000b.osc.active=0

This at least didn't give me an error. The output of 'lctl dl' did not
change, however; devices 19 and 20 are still there and UP.

$ mgs> lctl --device testfs1-OST000a-osc deactivate
had the same result.
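(Rereading the manual and 'man lctl', I wonder whether I am mixing up
two different operations here: 'conf_param' seems to be meant to be run
on the MGS without any --device option, using the target name rather
than the osc name, i.e.

$ mgs> lctl conf_param testfs1-OST000b.osc.active=0

for a permanent deactivation, whereas the temporary 'deactivate' seems
to want the osc device itself - its number from 'lctl dl' or its full
name - on the MDS node, e.g.

$ mgs> lctl --device 20 deactivate

But that is only my interpretation of the documentation.)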

Still, I went on to the OSS and tried
$ oss> tunefs.lustre --erase-params --fsname=testfs1  --ost
[EMAIL PROTECTED]   /dev/sdb1
which fails with
tunefs.lustre: cannot change the name of a registered target
tunefs.lustre: exiting with 1 (Operation not permitted)

$ oss> tunefs.lustre --writeconf --erase-params --fsname=testfs1  --ost
[EMAIL PROTECTED] /dev/sdb1
works fine, but mounting the partition results in exactly the same error
messages in the syslog as before.
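(If I understand the --writeconf procedure in the manual correctly, it
is not meant to be run on a single OST in isolation: the whole file
system is stopped, the configuration logs are regenerated on the MDT and
on every OST, and everything is remounted, MDT first. Assuming the MGS
and MDT are on the same node, as the 'lctl dl' output above suggests,
that would be roughly:

$ mgs> umount /mnt/testfs1/mdt    # after stopping all clients and OSTs
$ mgs> tunefs.lustre --writeconf /dev/<mdt-device>
$ oss> tunefs.lustre --writeconf /dev/sdb1    # and every other OST
$ mgs> mount -t lustre /dev/<mdt-device> /mnt/testfs1/mdt
$ oss> mount -t lustre /dev/sdb1 /mnt/testfs1/ost

Device and mount point names are placeholders, of course. Would that be
the right approach here, before resorting to a reformat?)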

So far I have not tried reformatting these partitions. But I think I
should ask the experts here about all the mistakes I made.
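(If reformatting turns out to be the only option, I assume it would be
something like

$ oss> mkfs.lustre --reformat --fsname=testfs1 --ost --mgsnode=<MGS NID> /dev/sdb1

with the real MGS NID filled in, followed by mounting the partition as
type lustre again - but I'd rather understand what went wrong first.)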

Many thanks.
Thomas



