Note that you do not normally use IP takeover with Lustre/Heartbeat: you set the failover IP addresses with the mkfs.lustre command, and Lustre reconnects to the _other_ address when it is disconnected.
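
As a rough sketch of what that looks like for a two-node MDS pair (every address, device name and the fsname below is a made-up placeholder, not taken from the poster's setup, and the option spellings should be checked against the 1.6 manual before use):

```text
# /etc/modprobe.conf on both servers: name the interfaces only, no IP addresses
options lnet networks=tcp0(eth0),tcp1(eth1)

# Hypothetical fixed (non-heartbeat) addresses, assumed for illustration:
#   mds1: 10.4.21.51 (eth0), 10.4.22.51 (eth1)
#   mds2: 10.4.21.52 (eth0), 10.4.22.52 (eth1)

# Combined MGS/MDT, formatted on mds1, listing mds2's two NIDs as the failover node
mkfs.lustre --fsname=testfs --mgs --mdt \
    --failnode=10.4.21.52@tcp0,10.4.22.52@tcp1 /dev/drbd0

# Each OST is told about both servers, one --mgsnode option per server
mkfs.lustre --fsname=testfs --ost \
    --mgsnode=10.4.21.51@tcp0,10.4.22.51@tcp1 \
    --mgsnode=10.4.21.52@tcp0,10.4.22.52@tcp1 /dev/sdb
```

Within a single --failnode or --mgsnode value, the comma-separated NIDs are alternative addresses of the same server.
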
In your case, you would have 2 fixed addresses for each node (without heartbeat: do NOT use the heartbeat virtual IP addresses), and specify both of those failover NIDs (rather than just 1).

Lustre 1.6 is a bit different from what a lot of HA/Heartbeat users expect: Lustre _knows_ about the multiple paths/addresses, and simply requires Heartbeat to ensure that the target is mounted on exactly one node in the failover pair. It does NOT rely on IP takeover for HA.

Kevin Van Maren

Timh Bergström wrote:
> 2008/9/23 Brian J. Murrell <[EMAIL PROTECTED]>:
>
>> On Tue, 2008-09-23 at 15:06 +0200, Timh Bergström wrote:
>>
>>> Hi,
>>>
>> Hi,
>>
> Hi again, and thanks for the quick reply!
>
>>> My (current) modprobe:
>>>
>>> options lnet networks=tcp0(eth0)10.4.21.50,tcp1(eth1)10.4.22.50
>>>
>> This syntax is incorrect.  For some examples of multi-homed
>> configurations see the manual at
>> http://manual.lustre.org/manual/LustreManual16_HTML/MoreComplicatedConfigurations.html#50642998_20213
>>
> Yes that's the link i've been consulting, perhaps im not looking hard enough.
>
>>> This is the errors i get:
>>> LustreError: 10f-e: Error parsing
>>> 'networks="tcp0(eth0)10.4.21.50,tcp1(eth1)10.4.22.50"'
>>>
>> When you specify "networks" because you specify the interfaces to use,
>> you don't need to specify the ip address.  I think you are confusing the
>> networks and ipnets options.
>>
> The problem here exactly is that the physical interfaces is there, but
> not with the ip-addresses i want the mdt to "listen" on - the "NIDs",
> they are added later through heartbeat as aliases (IPaddr2::10.4.21.50
> IPaddr2::10.4.22.50), but before mounting the mdt-resource (drbd).
>
>>> LustreError: 110-0: here...............................|---------|
>>> LustreError: 4527:0:(events.c:707:ptlrpc_init_portals()) network
>>> initialisation failed
>>> (along with a bunch of errors since this module does not load)
>>>
>>> I've tried with tcp0(eth0:0) which fails with about the same error,
>>> i've tried tcp0(eth0,eth1) which gives me the wrong addresses (machine
>>> ones) but works.
>>>
>> What is the topology exactly?  Are there two nics or one nic with two
>> addresses?  Are the two nics on the same physical network or separate
>> physical networks?
>>
> eth0 and eth1 are physical interfaces, they have statically assigned
> ip's (for management, supervision etc), heartbeat then adds addresses
> to theese two interfaces if the node is "primary".
>
> If it matters - eth0 and eth1 has separated physical paths to
> everything, this is because we want to survive a physical fail on the
> network before failing over to another physical server.
>
> As I read the manual, i format my OST's with more than one --mgsnode
> option, which in turn will make the OST "know" about both path's to
> the MDS/MGS server(s). As in, if first MGS does not work (physical
> network failure on side A) - try second (Physical side B).
>
> What we healthcheck on is the data/disks/server hardware which will
> tell heartbeat to fail over to server 2 which takes over network path
> A and network path B (on 10.4.[21,22].50), and the OST's/clients
> should continue working without noticing.
>
>> b.
>>
>>
>> _______________________________________________
>> Lustre-discuss mailing list
>> [email protected]
>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>>
