When an OST first starts, it registers with the MGS and is assigned an
index number. The initial startup order determines the OST indices.
I'm guessing you started an OST that was assigned index 3, but then
lost/reformatted/something bad happened to that disk, and it
re-registered, getting a new index (OST0005).
There are a few things you can do:
1. You can tell Lustre to forever ignore the "missing" OST0003 by doing
this on the MGS:
> lctl conf_param lustre1-OST0003.osc.active=0
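To confirm the change took effect, you can look at the device list on the
MDS; as a rough sketch (the exact state strings can vary by Lustre
version), the deactivated OSC should no longer show as UP:

> lctl dl

The lustre1-OST0003-osc line should then report an inactive (IN) state
instead of UP.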
2. If you don't care about your data, you can just reformat everybody
(including the MDT) and start over. This is the only way you'll be
able to get your index numbers back to what you want them to be - you
cannot change an index number once assigned, because Lustre expects to
find certain file objects on certain OSTs. You can use the --index flag
to mkfs.lustre to force a particular index if you want.
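For example, formatting a fresh OST with a forced index might look
roughly like this (a sketch only - the MGS NID and device path below are
placeholders for your own values):

> mkfs.lustre --ost --fsname=lustre1 --index=3 --mgsnode=<mgs-nid>@tcp /dev/<ostdev>

Note that this only works on a freshly formatted target; you cannot
renumber an OST that has already registered.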
3. I lied. If you are sure no files are on OST0005 (use 'lfs find'),
you can reformat just that disk, and use "tunefs.lustre --writeconf" on
the MDT to force regeneration of the configuration files. (See the docs).
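The check-then-writeconf sequence might look roughly like this (a sketch
with placeholder mount point and device names; check the manual before
running --writeconf, since it erases the saved configuration logs):

# 1. verify no files have objects on OST0005
> lfs find /mnt/lustre --obd lustre1-OST0005_UUID
# 2. stop all clients and servers, then on the MDT node:
> tunefs.lustre --writeconf /dev/<mdtdev>
# 3. restart the MDT and the OSTs; the configuration logs are
#    regenerated as the targets mount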
Roger L. Smith wrote:
Hmmm, all of the systems are running, but something isn't right.
/proc/fs/lustre/devices on the MDS shows 6 OSTs, but there are only 5
of them. Furthermore, the ordering of the OSTs is out of whack.
Any ideas on how to correct this?
On MDS:
Lustre-01-01$ cat /proc/fs/lustre/devices
0 UP mgs MGS MGS 15
1 UP mgc [EMAIL PROTECTED] 45b90bd7-b51f-fcc9-9610-4640debeaf74 5
2 UP mdt MDS MDS_uuid 3
3 UP lov lustre1-mdtlov lustre1-mdtlov_UUID 4
4 UP mds lustre1-MDT0000 lustre1-MDT0000_UUID 6
5 UP osc lustre1-OST0000-osc lustre1-mdtlov_UUID 5
6 UP osc lustre1-OST0001-osc lustre1-mdtlov_UUID 5
7 UP osc lustre1-OST0002-osc lustre1-mdtlov_UUID 5
8 UP osc lustre1-OST0003-osc lustre1-mdtlov_UUID 5
9 UP osc lustre1-OST0004-osc lustre1-mdtlov_UUID 5
10 UP osc lustre1-OST0005-osc lustre1-mdtlov_UUID 5
11 UP lov lustre1-clilov-000001000c224400
a5a32bfb-05ba-d0cc-f4dd-6afbc5d45224 4
12 UP mdc lustre1-MDT0000-mdc-000001000c224400
a5a32bfb-05ba-d0cc-f4dd-6afbc5d45224 5
13 UP osc lustre1-OST0000-osc-000001000c224400
a5a32bfb-05ba-d0cc-f4dd-6afbc5d45224 5
14 UP osc lustre1-OST0001-osc-000001000c224400
a5a32bfb-05ba-d0cc-f4dd-6afbc5d45224 5
15 UP osc lustre1-OST0002-osc-000001000c224400
a5a32bfb-05ba-d0cc-f4dd-6afbc5d45224 5
16 UP osc lustre1-OST0003-osc-000001000c224400
a5a32bfb-05ba-d0cc-f4dd-6afbc5d45224 5
17 UP osc lustre1-OST0004-osc-000001000c224400
a5a32bfb-05ba-d0cc-f4dd-6afbc5d45224 5
18 UP osc lustre1-OST0005-osc-000001000c224400
a5a32bfb-05ba-d0cc-f4dd-6afbc5d45224 5
On the OSSs:
Lustre-01-02$ cat /proc/fs/lustre/devices
0 UP mgc [EMAIL PROTECTED] 788e3733-c847-fc92-bd8b-4f75891520b0 5
1 UP ost OSS OSS_uuid 3
2 UP obdfilter lustre1-OST0000 lustre1-OST0000_UUID 7
Lustre-01-03$ cat /proc/fs/lustre/devices
0 UP mgc [EMAIL PROTECTED] e324df13-4042-f0e0-a0b1-aefd20cbfcb2 5
1 UP ost OSS OSS_uuid 3
2 UP obdfilter lustre1-OST0001 lustre1-OST0001_UUID 7
Lustre-01-04$ cat /proc/fs/lustre/devices
0 UP mgc [EMAIL PROTECTED] 496fbabb-0c40-f069-8e35-fe4bf54ca2bf 5
1 UP ost OSS OSS_uuid 3
2 UP obdfilter lustre1-OST0002 lustre1-OST0002_UUID 7
Lustre-01-05$ cat /proc/fs/lustre/devices
0 UP mgc [EMAIL PROTECTED] 8b44355c-c5d7-f1f8-be11-dcc7a620e4d9 5
1 UP ost OSS OSS_uuid 3
2 UP obdfilter lustre1-OST0005 lustre1-OST0005_UUID 7
Lustre-01-06$ cat /proc/fs/lustre/devices
0 UP mgc [EMAIL PROTECTED] 59201360-8fe6-f7da-ba49-81a4dc56a705 5
1 UP ost OSS OSS_uuid 3
2 UP obdfilter lustre1-OST0004 lustre1-OST0004_UUID 7
Lustre-01-05 should be lustre1-OST0003 and Lustre-01-06 should be
lustre1-OST0004. Instead, I don't have an 0003, and I've got an 0005,
and the MDS sees more machines than exist.
Nathaniel Rutman wrote:
errno 19 = ENODEV -- did the server lustre-OST0003 successfully start?
Roger L. Smith wrote:
Nathan,
Thanks for the help. That solved one problem, but after booting all
of the servers (no clients at all), I'm getting this in the syslog
on the MDS:
May 11 16:57:11 Lustre-01-01 kernel: LustreError:
6359:0:(client.c:574:ptlrpc_check_status()) @@@ type ==
PTL_RPC_MSG_ERR, err == -19 [EMAIL PROTECTED] x1450/t0
o8->[EMAIL PROTECTED]@tcp:6 lens 240/272 ref 1 fl
Rpc:R/0/0 rc 0/-19
May 11 16:57:11 Lustre-01-01 kernel: LustreError:
6359:0:(client.c:574:ptlrpc_check_status()) Skipped 24 previous
similar messages
_______________________________________________
Lustre-discuss mailing list
[email protected]
https://mail.clusterfs.com/mailman/listinfo/lustre-discuss