Hi folks,
I've just upgraded a 2.7.0 cluster to 2.10.3 and thought I'd take advantage of the new HA resource agents. Sadly, I find that the resource agent successfully mounts the OSDs, but the resource then stops (leaving the OSDs mounted). Here's an example case, the management OSD, created with the following:

# pcs resource create MGT ocf:lustre:Lustre target=/dev/disk/by-label/MGS mountpoint=/mnt/MGT; pcs constraint location MGT prefers hpctestmds1=100

This results in the following, leaving the resource stopped but the MGT mounted:

Mar 07 13:28:22 hpctestmds1.our.domain Lustre(MGT)[32115]: ERROR: /dev/disk/by-label/MGS is not mounted
Mar 07 13:28:22 hpctestmds1.our.domain crmd[11459]: notice: Result of probe operation for MGT on hpctestmds1: 7 (not running)
Mar 07 13:28:22 hpctestmds1.our.domain Lustre(MGT)[32128]: INFO: Starting to mount /dev/disk/by-label/MGS
Mar 07 13:28:22 hpctestmds1.our.domain kernel: LDISKFS-fs (sde): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc
Mar 07 13:28:22 hpctestmds1.our.domain kernel: Lustre: MGS: Connection restored to 9eb39832-a281-1088-d816-410b918b5813 (at 0@lo)
Mar 07 13:28:22 hpctestmds1.our.domain kernel: Lustre: Skipped 6 previous similar messages
Mar 07 13:28:22 hpctestmds1.our.domain Lustre(MGT)[32173]: INFO: /dev/disk/by-label/MGS mounted successfully
Mar 07 13:28:22 hpctestmds1.our.domain crmd[11459]: notice: Result of start operation for MGT on hpctestmds1: 0 (ok)
Mar 07 13:28:22 hpctestmds1.our.domain Lustre(MGT)[32189]: ERROR: /dev/disk/by-label/MGS is not mounted
Mar 07 13:28:22 hpctestmds1.our.domain crmd[11459]: notice: Result of stop operation for MGT on hpctestmds1: 0 (ok)
Mar 07 13:28:23 hpctestmds1.our.domain Lustre(MGT)[32207]: INFO: Starting to mount /dev/disk/by-label/MGS
Mar 07 13:28:23 hpctestmds1.our.domain Lustre(MGT)[32215]: ERROR: mount failed
Mar 07 13:28:23 hpctestmds1.our.domain Lustre(MGT)[32221]: ERROR: /dev/disk/by-label/MGS can not be mounted with this error: 1
Mar 07 13:28:23 hpctestmds1.our.domain lrmd[11456]: notice: MGT_start_0:32200:stderr [ mount.lustre: according to /etc/mtab /dev/sde is already mounted on /mnt/MGT ]
Mar 07 13:28:23 hpctestmds1.our.domain crmd[11459]: notice: Result of start operation for MGT on hpctestmds1: 1 (unknown error)
Mar 07 13:28:23 hpctestmds1.our.domain crmd[11459]: notice: hpctestmds1-MGT_start_0:558 [ mount.lustre: according to /etc/mtab /dev/sde is already mounted on /mnt/MGT\n ]
Mar 07 13:28:23 hpctestmds1.our.domain crmd[11459]: notice: Result of stop operation for MGT on hpctestmds1: 0 (ok)
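For what it's worth, my (unverified) hunch is that the agent's mount check looks for the literal target= path in the mount table, while the kernel records the resolved device (/dev/sde in the log above), so the monitor decides the MGT is not mounted and Pacemaker promptly stops the resource. A quick way to see the mismatch I mean, run on the node while the MGT is still mounted (paths as in my config):

# grep ' /mnt/MGT ' /proc/mounts
  (the device column shows /dev/sde, not the by-label symlink)
# readlink -f /dev/disk/by-label/MGS
  (resolves to /dev/sde)
# grep '/dev/disk/by-label/MGS' /proc/mounts || echo "by-label path not in /proc/mounts"
  (no match for the literal path the resource was configured with)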
I then delete the resource, unmount the MGT, and make a new resource with the old ocf:heartbeat:Filesystem agent, setting the options to match the defaults from the ocf:lustre:Lustre agent, as follows:

# pcs resource create MGT Filesystem device=/dev/disk/by-label/MGS directory=/mnt/MGT fstype="lustre" meta op monitor interval="20" timeout="300" op start interval="0" timeout="300" op stop interval="0" timeout="300"; pcs constraint location MGT prefers hpctestmds1=100

This results in a happier resource start: the Pacemaker resource stays "Started" and the mount persists. From journalctl:

Mar 07 13:35:07 hpctestmds1.our.domain crmd[11459]: notice: Result of probe operation for MGT on hpctestmds1: 7 (not running)
Mar 07 13:35:07 hpctestmds1.our.domain Filesystem(MGT)[744]: INFO: Running start for /dev/disk/by-label/MGS on /mnt/MGT
Mar 07 13:35:07 hpctestmds1.our.domain kernel: LDISKFS-fs (sde): mounted filesystem with ordered data mode. Opts: user_xattr,errors=remount-ro,no_mbcache,nodelalloc
Mar 07 13:35:07 hpctestmds1.our.domain kernel: Lustre: MGS: Connection restored to 9eb39832-a281-1088-d816-410b918b5813 (at 0@lo)
Mar 07 13:35:07 hpctestmds1.our.domain crmd[11459]: notice: Result of start operation for MGT on hpctestmds1: 0 (ok)

Has anyone experienced similar results? Any tips?

Cheers,
CanWood
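P.S. In case anyone wants to reproduce the two attempts back to back, the cleanup in between was roughly the obvious pair of commands (resource name and mount point as above):

# pcs resource delete MGT
# umount /mnt/MGT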