When an OST first starts, it registers with the MGS and is assigned an index number; the initial startup order determines the OST indices. My guess is that you started an OST that was assigned index 3, but then lost or reformatted that disk (or something else bad happened to it), and on restart it re-registered and was given a new index (0005).
There are a few things you can do:
1. You can tell Lustre to forever ignore the "missing" OST0003 by doing this on the MGS:
> lctl conf_param lustre1-OST0003.osc.active=0
2. If you don't care about your data, you can simply reformat everything (including the MDT) and start over. This is the only way to get the index numbers back to what you want them to be - you cannot change an index number once assigned, because Lustre expects to find certain file objects on certain OSTs. You can use the --index flag to mkfs.lustre to force a particular index if you want.
3. I lied - there is a third option. If you are sure no files are on OST0005 (use 'lfs find'), you can reformat just that disk and use "tunefs.lustre --writeconf" on the MDT to force regeneration of the configuration files. (See the docs.)
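A quick sketch of the commands involved in option 3 (the device paths, MGS NID, and /mnt/lustre mount point below are placeholders - check the man pages for your Lustre version). First make sure no files have objects on OST0005, e.g. from a client:

> lfs find --obd lustre1-OST0005_UUID /mnt/lustre

If nothing is found, reformat just that disk, forcing the index you want:

> mkfs.lustre --ost --fsname=lustre1 --index=3 --mgsnode=<mgs-nid> /dev/sdX

then regenerate the configuration logs from the MDT:

> tunefs.lustre --writeconf /dev/<mdt-device>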
Roger L. Smith wrote:
Hmmm, all of the systems are running, but something isn't right. /proc/fs/lustre/devices on the MDS shows six OSTs, but only five of them exist. Furthermore, the ordering of the OSTs is out of whack.

Any ideas on how to correct this?

On MDS:

Lustre-01-01$ cat /proc/fs/lustre/devices
  0 UP mgs MGS MGS 15
  1 UP mgc [EMAIL PROTECTED] 45b90bd7-b51f-fcc9-9610-4640debeaf74 5
  2 UP mdt MDS MDS_uuid 3
  3 UP lov lustre1-mdtlov lustre1-mdtlov_UUID 4
  4 UP mds lustre1-MDT0000 lustre1-MDT0000_UUID 6
  5 UP osc lustre1-OST0000-osc lustre1-mdtlov_UUID 5
  6 UP osc lustre1-OST0001-osc lustre1-mdtlov_UUID 5
  7 UP osc lustre1-OST0002-osc lustre1-mdtlov_UUID 5
  8 UP osc lustre1-OST0003-osc lustre1-mdtlov_UUID 5
  9 UP osc lustre1-OST0004-osc lustre1-mdtlov_UUID 5
 10 UP osc lustre1-OST0005-osc lustre1-mdtlov_UUID 5
 11 UP lov lustre1-clilov-000001000c224400 a5a32bfb-05ba-d0cc-f4dd-6afbc5d45224 4
 12 UP mdc lustre1-MDT0000-mdc-000001000c224400 a5a32bfb-05ba-d0cc-f4dd-6afbc5d45224 5
 13 UP osc lustre1-OST0000-osc-000001000c224400 a5a32bfb-05ba-d0cc-f4dd-6afbc5d45224 5
 14 UP osc lustre1-OST0001-osc-000001000c224400 a5a32bfb-05ba-d0cc-f4dd-6afbc5d45224 5
 15 UP osc lustre1-OST0002-osc-000001000c224400 a5a32bfb-05ba-d0cc-f4dd-6afbc5d45224 5
 16 UP osc lustre1-OST0003-osc-000001000c224400 a5a32bfb-05ba-d0cc-f4dd-6afbc5d45224 5
 17 UP osc lustre1-OST0004-osc-000001000c224400 a5a32bfb-05ba-d0cc-f4dd-6afbc5d45224 5
 18 UP osc lustre1-OST0005-osc-000001000c224400 a5a32bfb-05ba-d0cc-f4dd-6afbc5d45224 5


On OSS's:

Lustre-01-02$ cat /proc/fs/lustre/devices
  0 UP mgc [EMAIL PROTECTED] 788e3733-c847-fc92-bd8b-4f75891520b0 5
  1 UP ost OSS OSS_uuid 3
  2 UP obdfilter lustre1-OST0000 lustre1-OST0000_UUID 7

Lustre-01-03$ cat /proc/fs/lustre/devices
  0 UP mgc [EMAIL PROTECTED] e324df13-4042-f0e0-a0b1-aefd20cbfcb2 5
  1 UP ost OSS OSS_uuid 3
  2 UP obdfilter lustre1-OST0001 lustre1-OST0001_UUID 7

Lustre-01-04$ cat /proc/fs/lustre/devices
  0 UP mgc [EMAIL PROTECTED] 496fbabb-0c40-f069-8e35-fe4bf54ca2bf 5
  1 UP ost OSS OSS_uuid 3
  2 UP obdfilter lustre1-OST0002 lustre1-OST0002_UUID 7

Lustre-01-05$ cat /proc/fs/lustre/devices
  0 UP mgc [EMAIL PROTECTED] 8b44355c-c5d7-f1f8-be11-dcc7a620e4d9 5
  1 UP ost OSS OSS_uuid 3
  2 UP obdfilter lustre1-OST0005 lustre1-OST0005_UUID 7

Lustre-01-06$ cat /proc/fs/lustre/devices
  0 UP mgc [EMAIL PROTECTED] 59201360-8fe6-f7da-ba49-81a4dc56a705 5
  1 UP ost OSS OSS_uuid 3
  2 UP obdfilter lustre1-OST0004 lustre1-OST0004_UUID 7

Lustre-01-05 should be lustre1-OST0003 and Lustre-01-06 should be lustre1-OST0004. Instead, I don't have an OST0003, I do have an OST0005, and the MDS sees more OSTs than actually exist.



Nathaniel Rutman wrote:
errno 19 = ENODEV -- did the server lustre1-OST0003 successfully start?
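(As a quick way to double-check what a numeric errno means, assuming a node with Python available - Python's errno table mirrors the kernel's values on Linux:)

```shell
# Map errno 19 from the LustreError line to its symbolic name.
python3 -c 'import errno, os; print(errno.errorcode[19], "-", os.strerror(19))'
# -> ENODEV - No such device
```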


Roger L. Smith wrote:
Nathan,

Thanks for the help. That solved one problem, but after booting all of the servers (no clients at all), I'm getting this in the syslog on the MDS:



May 11 16:57:11 Lustre-01-01 kernel: LustreError: 6359:0:(client.c:574:ptlrpc_check_status()) @@@ type == PTL_RPC_MSG_ERR, err == -19 [EMAIL PROTECTED] x1450/t0 o8->[EMAIL PROTECTED]@tcp:6 lens 240/272 ref 1 fl Rpc:R/0/0 rc 0/-19
May 11 16:57:11 Lustre-01-01 kernel: LustreError: 6359:0:(client.c:574:ptlrpc_check_status()) Skipped 24 previous similar messages


_______________________________________________
Lustre-discuss mailing list
[email protected]
https://mail.clusterfs.com/mailman/listinfo/lustre-discuss
