To follow up on this matter: I have now set up HA/DRBD as suggested, formatted the OSTs with two mgsnode directives, and mounted on the clients with both addresses, as [EMAIL PROTECTED]:[EMAIL PROTECTED]:/fsname. However, if I fail MGS/MDT 1, it does not recover in a reasonable time. What kinds of tuning or settings affect this?
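
For reference, the setup described above amounts to roughly the following. This is only a dry-run sketch that prints the commands instead of running them; the two NIDs, the OST device and the mount point are placeholders I made up for illustration, not the real values from this cluster.

```shell
#!/bin/sh
# Dry run: only prints the commands; nothing is formatted or mounted.
# All values below are hypothetical placeholders, not taken from the thread.
MGS_A="10.4.21.51@tcp0"   # assumed fixed NID of MGS/MDT node 1
MGS_B="10.4.21.52@tcp0"   # assumed fixed NID of MGS/MDT node 2
FSNAME="fsname"
OST_DEV="/dev/sdX"        # hypothetical OST block device
MNT="/mnt/lustre"         # hypothetical client mount point

# One --mgsnode per MGS node, so the OST knows both failover addresses.
MKFS_CMD="mkfs.lustre --fsname=$FSNAME --ost --mgsnode=$MGS_A --mgsnode=$MGS_B $OST_DEV"

# Clients list the MGS NIDs separated by colons, followed by the fsname.
MOUNT_SRC="$MGS_A:$MGS_B:/$FSNAME"
MOUNT_CMD="mount -t lustre $MOUNT_SRC $MNT"

echo "$MKFS_CMD"
echo "$MOUNT_CMD"
```

The NIDs here are the nodes' own fixed addresses, not the heartbeat aliases, following Kevin's advice quoted below.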
//Timh

2008/9/23 Timh Bergström <[EMAIL PROTECTED]>:
> Thank you, that's the path i've taken from the last message on this
> list, as I misunderstood some of the drbd/ha setups before. However,
> using 4 mgsnode "paths", is that recommended or should I use one
> mgspath per node and use the other as some sort of manual failover?
>
> Regards,
> Timh
>
> 2008/9/23 Kevin Van Maren <[EMAIL PROTECTED]>:
>> Note that you do not normally use IP takeover with Lustre/Heartbeat: you set
>> the failover IP addresses with the mkfs.lustre command, and Lustre
>> reconnects to the _other_ address when it is disconnected.
>>
>> In your case, you would have 2 fixed addresses for each node (w/o heartbeat
>> - do NOT use the heartbeat virtual IP addresses), and specify both those
>> failover NIDs (rather than just 1).
>>
>> Lustre1.6 is a bit different from a lot of HA/Heartbeat users: Lustre
>> _knows_ about the multiple paths/addresses, and simply requires Heartbeat to
>> ensure it is mounted on exactly one node in the failover pair: it does NOT
>> rely on IP takeover for HA.
>>
>> Kevin Van Maren
>>
>> Timh Bergström wrote:
>>>
>>> 2008/9/23 Brian J. Murrell <[EMAIL PROTECTED]>:
>>>>
>>>> On Tue, 2008-09-23 at 15:06 +0200, Timh Bergström wrote:
>>>>>
>>>>> Hi,
>>>>
>>>> Hi,
>>>
>>> Hi again, and thanks for the quick reply!
>>>
>>>>> My (current) modprobe:
>>>>>
>>>>> options lnet networks=tcp0(eth0)10.4.21.50,tcp1(eth1)10.4.22.50
>>>>
>>>> This syntax is incorrect. For some examples of multi-homed
>>>> configurations see the manual at
>>>>
>>>> http://manual.lustre.org/manual/LustreManual16_HTML/MoreComplicatedConfigurations.html#50642998_20213
>>>
>>> Yes that's the link i've been consulting, perhaps im not looking hard
>>> enough.
>>>
>>>>> This is the errors i get:
>>>>> LustreError: 10f-e: Error parsing
>>>>> 'networks="tcp0(eth0)10.4.21.50,tcp1(eth1)10.4.22.50"'
>>>>
>>>> When you specify "networks" because you specify the interfaces to use,
>>>> you don't need to specify the ip address. I think you are confusing the
>>>> networks and ipnets options.
>>>
>>> The problem here exactly is that the physical interfaces is there, but
>>> not with the ip-addresses i want the mdt to "listen" on - the "NIDs",
>>> they are added later through heartbeat as aliases (IPaddr2::10.4.21.50
>>> IPaddr2::10.4.22.50), but before mounting the mdt-resource (drbd).
>>>
>>>>> LustreError: 110-0: here...............................|---------|
>>>>> LustreError: 4527:0:(events.c:707:ptlrpc_init_portals()) network
>>>>> initialisation failed
>>>>> (along with a bunch of errors since this module does not load)
>>>>> I've tried with tcp0(eth0:0) which fails with about the same error,
>>>>> i've tried tcp0(eth0,eth1) which gives me the wrong addresses (machine
>>>>> ones) but works.
>>>>
>>>> What is the topology exactly? Are there two nics or one nic with two
>>>> addresses? Are the two nics on the same physical network or separate
>>>> physical networks?
>>>
>>> eth0 and eth1 are physical interfaces, they have statically assigned
>>> ip's (for management, supervision etc), heartbeat then adds addresses
>>> to theese two interfaces if the node is "primary".
>>>
>>> If it matters - eth0 and eth1 has separated physical paths to
>>> everything, this is because we want to survive a physical fail on the
>>> network before failing over to another physical server.
>>>
>>> As I read the manual, i format my OST's with more than one --mgsnode
>>> option, which in turn will make the OST "know" about both path's to
>>> the MDS/MGS server(s). As in, if first MGS does not work (physical
>>> network failure on side A) - try second (Physical side B).
>>>
>>> What we healthcheck on is the data/disks/server hardware which will
>>> tell heartbeat to fail over to server 2 which takes over network path
>>> A and network path B (on 10.4.[21,22].50), and the OST's/clients
>>> should continue working without noticing.
>>>
>>>> b.
>>>>
>>>> _______________________________________________
>>>> Lustre-discuss mailing list
>>>> [email protected]
>>>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>
> --
> Timh Bergström
> System Administrator
> Diino AB - www.diino.com
> :wq

--
Timh Bergström
System Administrator
Diino AB - www.diino.com
:wq
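
As a footnote to the modprobe exchange quoted above: Brian's point is that a "networks" entry names interfaces only, with no address after the closing parenthesis. A minimal sketch of that form, assuming eth0 should carry tcp0 and eth1 should carry tcp1 as in the original attempt, could look like the following. The helper name no_trailing_address is made up for this example and only checks for text after a ")"; it is not the real LNET parser.

```shell
#!/bin/sh
# Dry run: prints the modprobe line rather than editing any config file.
# Interface-to-network pairing assumed from the original attempt in the thread.
NETWORKS="tcp0(eth0),tcp1(eth1)"
LNET_LINE="options lnet networks=$NETWORKS"

# Hypothetical helper (not part of Lustre): succeeds unless something other
# than a comma follows a closing parenthesis, e.g. tcp0(eth0)10.4.21.50.
no_trailing_address() {
  case "$1" in
    *\)[!,]*) return 1 ;;
    *) return 0 ;;
  esac
}

if no_trailing_address "$NETWORKS"; then
  echo "$LNET_LINE"
else
  echo "address found after an interface name in: $NETWORKS" >&2
fi
```

With this form LNET takes the NID from whatever address the interface already has, which matches the behaviour reported for tcp0(eth0,eth1) in the thread.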
