Jeremy Mann wrote:
Nathaniel Rutman wrote:
You have to deactivate the OSCs that reference that OST. From
https://mail.clusterfs.com/wikis/lustre/MountConf:
As of beta7, an OST can be permanently removed from a filesystem. Note
that any files that have stripes on the removed OST will henceforth
return EIO.
mgs> lctl conf_param testfs-OST0001.osc.active=0
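For what it's worth, the reverse also works if you ever need the OST back. A rough sketch (substitute your own fsname and OST index, and verify against the wiki since this is from memory):

mgs> lctl conf_param testfs-OST0001.osc.active=0    # deactivate the OSC on all nodes
mgs> lctl conf_param testfs-OST0001.osc.active=1    # re-enable it later if needed
mgs> lctl dl                                        # list local OBD devices and their status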
Thanks, Nathaniel. I found it shortly after posting the message. However,
maybe I didn't do it right, because I still get error messages about this node.
The steps I took were:
1. unmounted /lustre from the frontend
2. unmounted /lustre-storage on node17
3. on frontend, ran lctl conf_param bcffs-OST0002.osc.active=0
4. on frontend, remounted bcffs with:
mount -o exclude=bcffs-OST0002 -t lustre [EMAIL PROTECTED]:/bcffs /lustre
All you need is step 3 on the MGS. You don't have to remount clients or
use -o exclude. But that won't hurt anything.
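If you want to confirm it took effect without remounting anything, something like this on a client or the MDS should show the OSC device and its status (rough sketch; the exact output differs between versions):

client> lctl dl | grep OST0002    # the bcffs-OST0002 OSC device and its current state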
dmesg shows:
Lustre: 31233:0:(quota_master.c:1105:mds_quota_recovery()) Not all osts
are active, abort quota recovery
Apparently quotas don't work with deactivated OSTs. I don't know much
about this area, but I suspect that to get rid of these messages, you'll
need to have everything active.
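If you end up doing the writeconf cleanup mentioned at the end, rerunning quota setup afterwards may clear this. A hedged sketch, assuming quotas were enabled with the usual lfs commands:

client> lfs quotacheck -ug /lustre    # rebuild quota files after the set of OSTs changes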
Lustre: MDS bcffs-MDT0000: bcffs-OST000c_UUID now active, resetting orphans
This looks like you restarted the MDT.
LustreError: 32542:0:(file.c:1012:ll_glimpse_size()) obd_enqueue returned
rc -5, returning -EIO
Lustre: client 000001017ac3fc00 umount complete
And here the client stopped
Lustre: 1257:0:(obd_mount.c:1675:lustre_check_exclusion()) Excluding
bcffs-OST0002-osc (on exclusion list)
Lustre: 1257:0:(recover.c:231:ptlrpc_set_import_active()) setting import
bcffs-OST0002_UUID INACTIVE by administrator request
That's the -o exclude option.
Lustre: osc.: set active=0 to 0
LustreError: 1257:0:(lov_obd.c:139:lov_connect_obd()) not connecting OSC
bcffs-OST0002_UUID; administratively disabled
And that's the osc.active=0. You only need one or the other, but both
won't break anything.
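So the two approaches, side by side (the MGS NID is a placeholder since I don't know your address):

# permanent, set once on the MGS, affects every client:
mgs> lctl conf_param bcffs-OST0002.osc.active=0
# per-mount, has to be repeated on each client:
client> mount -o exclude=bcffs-OST0002 -t lustre <mgs_nid>:/bcffs /lustre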
Lustre: Client bcffs-client has started
And here the client restarted
Lustre: 2484:0:(quota_master.c:1105:mds_quota_recovery()) Not all osts are
active, abort quota recovery
Lustre: MDS bcffs-MDT0000: bcffs-OST000d_UUID now active, resetting orphans
Lustre: MGS: haven't heard from client
454dd520-82b9-e3e6-8fcb-800a75807121 (at [EMAIL PROTECTED]) in 228
seconds. I think it's dead, and I am evicting it.
The MGS eviction of an MGC is a non-destructive event, and I should turn
off this scary message. The MGC will re-acquire an MGS lock later.
This happened here because (I theorize) you had a combined MGS/MDT that
you restarted while other Lustre devices were still mounted on the same
node. A restart of the MGS means that all live MGCs must get kicked out
and reconnect.
LustreError: 4834:0:(file.c:1012:ll_glimpse_size()) obd_enqueue returned
rc -5, returning -EIO
This is the only potentially worrying error message, but depending on what
generated it, it might be fine.
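If it came from touching a file that had stripes on the removed OST, it's the expected EIO mentioned in the wiki quote above. If I remember the syntax right, lfs find can list the affected files (check lfs help find on your version):

client> lfs find --obd bcffs-OST0002_UUID /lustre    # files with objects on the removed OST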
Are these normal error messages? I'm asking because I'm about to copy all
of the NCBI databases to the Lustre filesystem. I don't want to start it,
then have Lustre crash and have to rebuild everything all over again minus
this node.
You can use the "writeconf" procedure described on the wiki to remove
all traces of the removed OST. This does not require reformatting
anything and will likely fix the quota message.
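Roughly, and only as a sketch from memory (device paths are placeholders; follow the wiki for the exact procedure):

# 1. unmount all clients, then the OSTs, then the MDT/MGS
# 2. regenerate the configuration logs on each remaining server:
mds> tunefs.lustre --writeconf /dev/<mdt_device>
oss> tunefs.lustre --writeconf /dev/<ost_device>     # on every remaining OST
# 3. remount in order: MGS/MDT first, then the OSTs, then the clients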