You will have to recompile lustre with the patch in LU-8397. The key for us
was to look at the contents of /proc/fs/lustre/mgc/*/import. Before the patch,
failover_nids from that file was only showing one NID, despite
mkfs.lustre/tunefs.lustre showing multiple service nodes configured. See the
mailing list thread and the LU for more details.
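As a rough sketch of that check, the snippet below prints the failover_nids entry from each MGC import file and counts the NIDs it lists. The helper name count_failover_nids is made up for illustration, and the parsing assumes each NID appears as an address@network token on the failover_nids line. Running lctl get_param mgc.*.import should show the same data.

```shell
# Hypothetical helper (not part of Lustre): count the NIDs listed on the
# first failover_nids line of an import file. Assumes each NID looks like
# <address>@<network>, e.g. 10.0.0.1@tcp.
count_failover_nids() {
    grep -m1 'failover_nids' "$1" |
        grep -o '[^][ ,]*@[^][ ,]*' |
        wc -l
}

# Walk every MGC import file on this host. On a machine without a mounted
# Lustre target or client the glob matches nothing and the loop is skipped.
for f in /proc/fs/lustre/mgc/*/import; do
    [ -r "$f" ] || continue
    printf '%s: %s NID(s)\n' "$f" "$(count_failover_nids "$f")"
    grep 'failover_nids' "$f"
done
```

If the count is 1 while tunefs.lustre reports several service nodes, that matches the symptom described above.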
Looking back at this, our problems were related to multirail (using both IB and
TCP). Based on the mkfs.lustre commands you sent in your original email, that
probably isn’t your issue. Just for reference, this is what the mkfs.lustre
command looks like for us.
mkfs.lustre \
--mgsnode=192.52.98.30@tcp0,10.148.0.30@o2ib0 \
--mgsnode=192.52.98.31@tcp0,10.148.0.31@o2ib0 \
--fsname=testfs \
--backfstype=zfs \
--reformat \
--verbose \
--mdt --index=0 \
--servicenode=${LUSTRE_LOCAL_TCP_IP}@tcp0,${LUSTRE_LOCAL_IB_IP}@o2ib0 \
--servicenode=${LUSTRE_PEER_TCP_IP}@tcp0,${LUSTRE_PEER_IB_IP}@o2ib0 \
metadata/meta-test
Looking at this, you used a single --failnode instead of multiple
--servicenode options. The admin manual indicates --servicenode is preferred. You
might try that. I still think looking at the import file I pointed you to
above would be instructive regardless.
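For illustration only, here is roughly what the --servicenode form could look like for the MGS in the original message. This is an untested sketch: print_mgs_mkfs is a hypothetical helper, the NIDs and device path are copied from that message, and the command is only printed, not executed, because mkfs.lustre wipes the target device.

```shell
# Hypothetical helper (not part of Lustre): build an mkfs.lustre command
# line for a standalone MGS with one --servicenode per NID and print it.
# Usage: print_mgs_mkfs <device> <nid> [<nid> ...]
print_mgs_mkfs() {
    device=$1
    shift
    cmd="mkfs.lustre --mgs --backfstype=ldiskfs"
    for nid in "$@"; do
        cmd="$cmd --servicenode=$nid"
    done
    printf '%s %s\n' "$cmd" "$device"
}

# NIDs and device path taken from the original message.
print_mgs_mkfs /dev/mapper/mpathd 192.168.0.50@o2ib 192.168.0.51@o2ib
```

Review the printed command before running it by hand on the node that owns the device.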
From: Ravi Konila <[email protected]>
Reply-To: Ravi Konila <[email protected]>
Date: Thursday, October 26, 2017 at 1:31 AM
To: Darby Vicker <[email protected]>, "Mannthey, Keith"
<[email protected]>, Lustre Discuss <[email protected]>
Subject: Re: [lustre-discuss] MGS is not working in HA
Hi
I am using Lustre 2.8 on RHEL 6.7.
As my application requires RHEL 6.7, I had to use Lustre 2.8.
Any suggestions?
Regards
Ravi Konila
From: Vicker, Darby (JSC-EG311)
Sent: Wednesday, October 25, 2017 11:51 PM
To: Mannthey, Keith ; Ravi Konila ; Lustre Discuss
Subject: Re: [lustre-discuss] MGS is not working in HA
Sorry – I also meant to say that the resolution went off the mailing list and
was continued in LU-8397. You can find the patch there.
From: lustre-discuss <[email protected]> on behalf of
Darby Vicker <[email protected]>
Date: Wednesday, October 25, 2017 at 1:17 PM
To: "Mannthey, Keith" <[email protected]>, Ravi Konila
<[email protected]>, Lustre Discuss <[email protected]>
Subject: Re: [lustre-discuss] MGS is not working in HA
Which version of lustre are you using? We initially had problems with this too
when using failover with lustre 2.8 and 2.9. We got a patch that fixed it and
recent versions work fine for us. We have a combined MGS/MDS so our scenario
is a little different but this sounds very similar to our issue.
http://lists.lustre.org/pipermail/lustre-discuss-lustre.org/2017-January/014125.html
From: lustre-discuss <[email protected]> on behalf of
"Mannthey, Keith" <[email protected]>
Date: Wednesday, October 25, 2017 at 11:30 AM
To: Ravi Konila <[email protected]>, Lustre Discuss
<[email protected]>
Subject: Re: [lustre-discuss] MGS is not working in HA
Ravi,
You may want to open a Jira ticket with this error. It looks like the mount
command is trying only the first NID given on the command line.
Jira is https://jira.hpdd.intel.com “LU” project.
I have seen a Lustre server's first mount behave like this, but not client mounts.
It should try the first server, timeout and try the 2nd server.
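To check by hand which MGS NID is reachable from the client, something like the following might help. It is a diagnostic sketch, not how the Lustre client implements failover internally: first_reachable_nid and the PROBE variable are hypothetical names, and the default probe assumes lctl ping works from the client with LNet already up.

```shell
# Hypothetical helper (not part of Lustre): print the first NID from the
# argument list that answers the probe command, trying them in order.
# PROBE defaults to "lctl ping" and can be overridden for testing.
first_reachable_nid() {
    for nid in "$@"; do
        if ${PROBE:-lctl ping} "$nid" >/dev/null 2>&1; then
            printf '%s\n' "$nid"
            return 0
        fi
    done
    return 1   # no NID answered
}
```

Calling it as first_reachable_nid 192.168.0.50@o2ib 192.168.0.51@o2ib with the first server down should print the second NID if LNet can reach it. If it does and the mount still fails, the problem is more likely in the MGC failover list than in the network.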
Thanks,
Keith
From: lustre-discuss [mailto:[email protected]] On Behalf
Of Ravi Konila
Sent: Wednesday, October 25, 2017 5:07 AM
To: Lustre Discuss <[email protected]>
Subject: [lustre-discuss] MGS is not working in HA
Hi
I have two servers for MGS/MDS and have configured them with Pacemaker for HA.
The command which I gave on first MGS/MDS mds01 is
mkfs.lustre --mgs --failnode 192.168.0.51@o2ib --backfstype=ldiskfs \
/dev/mapper/mpathd
Next I created the Lustre filesystem for the MDT:
mkfs.lustre --mdt --fsname lhome --index 0 --mgsnode 192.168.0.50@o2ib \
--mgsnode 192.168.0.51@o2ib --servicenode 192.168.0.50@o2ib --servicenode \
192.168.0.51@o2ib --backfstype=ldiskfs /dev/mapper/mpathb
Now, on my client, if I run
mount -t lustre 192.168.0.50@o2ib:192.168.0.51@o2ib:/lhome /home
it does not work and asks if the MGS is running.
But if I run
mount -t lustre 192.168.0.50@o2ib:/lhome /home
it works fine.
Also, when my first MDS (mds01) is down, my client does not mount Lustre from
the second MGS.
It again asks me to check whether the MGS is running.
Any help will be highly appreciated.
Regards
Ravi Konila
_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org