Hi,

If you refer to my previous message, you will see that I have two multihomed clusters, each with Lustre servers and clients. The clients mount Lustre partitions over both o2ib and tcp. I am now implementing failover. I tried it this morning without success, so I went back to the manual, where I read:

Note -- If you have an MGS or MDT configured for failover, perform these steps:
1. On the OST, list the NIDs of all MGS nodes at mkfs time.
OST# mkfs.lustre --fsname sunfs --ost --mgsnode=10.0.0.1
--mgsnode=10.0.0.2 /dev/{device}
2. On the client, mount the file system.
client# mount -t lustre 10.0.0.1:10.0.0.2:/sunfs /cfs/client/

So I extended the logic from:

mkfs.lustre --mgs --mdt --fsname=sata --failnode=ib3-st02s@o2ib3 --reformat /dev/mpath/emcssd-1
mkfs.lustre --fsname sata --reformat --ost --mgsnode=ib3-st01s@o2ib3 --mgsnode=ib3-st01e@tcp --failnode=ib3-st02s@o2ib3 /dev/mpath/colosse4-lun54-sata

to:

mkfs.lustre --mgs --mdt --fsname=sata --failnode=ib3-st02s@o2ib3,ib3-st02e@tcp --reformat /dev/mpath/emcssd-1
mkfs.lustre --fsname sata --reformat --ost --mgsnode=ib3-st01s@o2ib3,ib3-st01e@tcp --mgsnode=ib3-st02s@o2ib3,ib3-st02e@tcp --failnode=ib3-st02s@o2ib3,ib3-st02e@tcp /dev/mpath/colosse4-lun53-sata

And so on for the other disks.
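
By the same logic, I assume the matching client mount would list both NIDs of each MGS node separated by commas, with a colon between the primary and the failover node. This is only a sketch of my understanding and has not been tested; the mount point /mnt/sata and the variable names are placeholders I made up for illustration.

```shell
# Sketch only: builds the client mount source for a multihomed MGS pair.
# Assumption: commas separate the NIDs of one node, colons separate failover nodes.
primary="ib3-st01s@o2ib3,ib3-st01e@tcp"    # NIDs of the primary MGS node
failover="ib3-st02s@o2ib3,ib3-st02e@tcp"   # NIDs of the failover MGS node
fsname="sata"

mount_src="${primary}:${failover}:/${fsname}"

# /mnt/sata is a hypothetical mount point; print the command instead of running it.
echo "mount -t lustre ${mount_src} /mnt/sata"
```

Each client would then pick whichever NID matches its own network (o2ib3 or tcp) when contacting the MGS.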

The partitions mount fine on the MDS/MGS/OSS server, but on the OSS-only server I get:

[root@ib3-st03 ~]# mount -t lustre /dev/mpath/colosse4-lun55-sata /mnt/data/clun55
mount.lustre: mount /dev/mpath/colosse4-lun55-sata at /mnt/data/clun55 failed: Interrupted system call

The messages file contains:

Dec 21 15:18:52 ib3-st03 kernel: Lustre: 9464:0:(client.c:1487:ptlrpc_expire_one_request()) @@@ Request x1388814699331655 sent from MGC10.10.135.115@o2ib3 to NID 10.10.135.116@o2ib3 5s ago has timed out (5s prior to deadline).
Dec 21 15:18:52 ib3-st03 kernel: req@ffff810116fff800 x1388814699331655/t0 o250->[email protected]@o2ib3_1:26/25 lens 368/584 e 0 to 1 dl 1324480732 ref 1 fl Rpc:N/0/0 rc 0/0
Dec 21 15:18:52 ib3-st03 kernel: LustreError: 23519:0:(obd_mount.c:1112:server_start_targets()) Required registration failed for sata-OSTffff: -4
Dec 21 15:18:52 ib3-st03 kernel: LustreError: 23519:0:(obd_mount.c:1670:server_fill_super()) Unable to start targets: -4
Dec 21 15:18:52 ib3-st03 kernel: LustreError: 23519:0:(obd_mount.c:1453:server_put_super()) no obd sata-OSTffff
Dec 21 15:18:52 ib3-st03 kernel: LustreError: 23519:0:(obd_mount.c:147:server_deregister_mount()) sata-OSTffff not registered
Dec 21 15:18:52 ib3-st03 kernel: Lustre: server umount sata-OSTffff complete
Dec 21 15:18:52 ib3-st03 kernel: LustreError: 23519:0:(obd_mount.c:2065:lustre_fill_super()) Unable to mount (-4)


So my question is: what would be the correct syntax to make sure failover works for the o2ib clients as well as the tcp clients?

Thanks




--
Patrice Hamelin
Specialiste sénior en systèmes d'exploitation | Senior OS specialist
Environnement Canada | Environment Canada
2121, route Transcanadienne | 2121 Transcanada Highway
Dorval, QC H9P 1J3
Téléphone | Telephone 514-421-5303
Télécopieur | Facsimile 514-421-7231
Gouvernement du Canada | Government of Canada

_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss
