I have a Lustre test environment and am currently testing network failover. Failover works fine on subnet 1, but when I turn off subnet 1 on the Lustre servers, the clients can't recover onto subnet 2.

All the servers and clients are on the same two subnets. I tried mounting the Lustre file systems with the command below, but the failover to network 2 still failed:

mount -t lustre -o flock 10.244.1.120@tcp0:10.244.1.121@tcp0:10.244.2.120@tcp1:10.244.2.121@tcp1:/webfs /imatrix

Any ideas?

Ed

Network
-----------
Subnet1 - 10.244.1.0/24
Subnet2 - 10.244.2.0/24

Server1 - 10.244.1.120, 10.244.2.120
Server2 - 10.244.1.121, 10.244.2.121
Server3 - 10.244.1.100, 10.244.2.100
Client1 - 10.244.1.101, 10.244.2.101
Client2 - 10.244.1.102, 10.244.2.102
Client3 - 10.244.1.122, 10.244.2.122
Client4 - 10.244.1.123, 10.244.2.123
Client5 - 10.244.1.250, 10.244.2.250

Lustre Configuration
-------------------------
Server1 - mgs webmdt webost1 mailost2
Server2 - mailmdt mailost1 webost2
Server3 - devmdt devost1

# MGS node on server1
tunefs.lustre --erase-param --failnode=10.244.1.121@tcp0 --writeconf /dev/mapper/lustremgs

# MDT node on server1
tunefs.lustre --erase-param --mgsnode=10.244.1.120@tcp0 --mgsnode=10.244.1.121@tcp0 --failnode=10.244.1.121@tcp0 --writeconf /dev/mapper/webmdt

# MDT node on server2
tunefs.lustre --erase-param --mgsnode=10.244.1.120@tcp0 --mgsnode=10.244.1.121@tcp0 --failnode=10.244.1.120@tcp0 --writeconf /dev/mapper/mailmdt

# MDT node on server3
tunefs.lustre --erase-param --mgsnode=10.244.1.120@tcp0 --mgsnode=10.244.1.121@tcp0 --writeconf /dev/mapper/devmdt

# OST nodes on server1
tunefs.lustre --erase-param --mgsnode=10.244.1.120@tcp0 --mgsnode=10.244.1.121@tcp0 --failnode=10.244.1.121@tcp0 --param="failover.mode=failout" --writeconf /dev/mapper/webost1
tunefs.lustre --erase-param --mgsnode=10.244.1.120@tcp0 --mgsnode=10.244.1.121@tcp0 --failnode=10.244.1.121@tcp0 --param="failover.mode=failout" --writeconf /dev/mapper/mailost2

# OST nodes on server2
tunefs.lustre --erase-param --mgsnode=10.244.1.120@tcp0 --mgsnode=10.244.1.121@tcp0 --failnode=10.244.1.120@tcp0 --param="failover.mode=failout" --writeconf
/dev/mapper/webost2
tunefs.lustre --erase-param --mgsnode=10.244.1.120@tcp0 --mgsnode=10.244.1.121@tcp0 --failnode=10.244.1.120@tcp0 --param="failover.mode=failout" --writeconf /dev/mapper/mailost1

# OST node on server3
tunefs.lustre --erase-param --mgsnode=10.244.1.120@tcp0 --mgsnode=10.244.1.121@tcp0 --failnode=10.244.1.121@tcp0 --param="failover.mode=failout" --writeconf /dev/mapper/devost1

LNET entry in modprobe.d/lustre.conf
-------------------------
Server1 - options lnet networks=tcp0(bond0),tcp1(bond1)
Server2 - options lnet networks=tcp0(bond0),tcp1(bond1)
Server3 - options lnet networks=tcp0(eth0),tcp1(eth1)

Five clients:
Client1 - options lnet networks=tcp0(eth0),tcp1(eth1)
Client2 - options lnet networks=tcp0(eth0),tcp1(eth1)
Client3 - options lnet networks=tcp0(eth0),tcp1(eth1)
Client4 - options lnet networks=tcp0(eth0),tcp1(eth1)
Client5 - options lnet networks=tcp0(eth0),tcp1(eth1)

Mount Commands
----------------------
# Mounts on server1
mount -t lustre -o abort_recov /dev/mapper/lustremgs /lustremgs
mount -t lustre -o abort_recov /dev/mapper/webmdt /webmst
mount -t lustre -o abort_recov /dev/mapper/webost1 /webost1
mount -t lustre -o abort_recov /dev/mapper/mailost2 /mailost2

# Mounts on server2
mount -t lustre -o abort_recov /dev/mapper/webost2 /webost2
mount -t lustre -o abort_recov /dev/mapper/mailmdt /mailmst
mount -t lustre -o abort_recov /dev/mapper/mailost1 /mailost1

# Mounts on server3
mount -t lustre -o abort_recov /dev/mapper/devmdt /homemst
mount -t lustre -o abort_recov /dev/mapper/devost1 /homeost1

# Client mounts
mount -t lustre -o flock 10.244.1.120@tcp0:10.244.1.121@tcp0:/webfs /imatrix
mount -t lustre -o flock 10.244.1.120@tcp0:10.244.1.121@tcp0:/mailfs /var/qmail
mount -t lustre -o flock 10.244.1.120@tcp0:10.244.1.121@tcp0:/devfs /home
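One thing I'm not sure about is the NID list syntax in the client mount command. My understanding is that commas separate multiple NIDs belonging to the same node, while colons separate distinct (failover) nodes, so the failover mount attempt above should perhaps group each server's tcp0 and tcp1 NIDs together rather than listing them as four separate nodes, e.g.:

# Untested variant - comma-joined NIDs per node, colon between failover nodes
mount -t lustre -o flock 10.244.1.120@tcp0,10.244.2.120@tcp1:10.244.1.121@tcp0,10.244.2.121@tcp1:/webfs /imatrix

I haven't confirmed whether that is the correct form, so corrections welcome.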
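If it helps with diagnosis, I can gather LNET-level information from any of the clients. These are the checks I'd run (NIDs taken from the table above):

# Show which NIDs this client actually brought up
lctl list_nids
# Confirm Server1 is reachable over each network
lctl ping 10.244.1.120@tcp0
lctl ping 10.244.2.120@tcp1

Let me know if any other output would be useful.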
_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss
