Thank you. The problem was at the Pacemaker level. I solved it by following the article http://blog.clusterlabs.org/blog/2009/why-wont-the-cluster-start-my-services, that is, by cleaning up the recorded resource state with a command like:

    crm_resource --cleanup --node nagios-clu2
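A minimal shell sketch of that cleanup step, for anyone who wants to try it carefully. The helper names cleanup_cmd and run_cleanup and the DRY_RUN variable are made up for illustration; only crm_resource with its --cleanup and --node options comes from Pacemaker. By default the sketch only prints the command, so nothing is changed on the cluster.

```shell
#!/bin/sh
# Sketch only: build (and optionally run) the Pacemaker cleanup command.
# cleanup_cmd, run_cleanup and DRY_RUN are hypothetical names, not part of Pacemaker.

cleanup_cmd() {
    # $1 = name of the node whose recorded resource history should be cleared
    printf 'crm_resource --cleanup --node %s\n' "$1"
}

run_cleanup() {
    cmd=$(cleanup_cmd "$1")
    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Default: show what would be executed, change nothing.
        echo "would run: $cmd"
    else
        # Unquoted on purpose so the string splits into command and arguments;
        # this assumes a node name without spaces.
        $cmd
    fi
}
```

Calling run_cleanup with a node name prints the command; setting DRY_RUN=0 first makes it execute crm_resource for real, which should be done on a cluster node and only after the underlying fault has been fixed.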
2018-03-27 9:35 GMT+04:00 Igor Cicimov <[email protected]>:

> Hi,
>
> On Fri, Mar 23, 2018 at 9:01 AM, Lozenkov Sergei <[email protected]>
> wrote:
>
>> Hello.
>> I have two Debian 9 servers configured with Corosync, Pacemaker and DRBD.
>> Everything worked well for a month.
>> After some server issues (with reboots), Pacemaker can no longer switch
>> the DRBD node and logs errors like these:
>>
>> Mar 16 06:25:11 [877] nfs01-az-eus.tech-corps.com lrmd: notice:
>> operation_finished: drbd_nfs_stop_0:3667:stderr [ 1: State change
>> failed: (-12) Device is held open by someone ]
>>
>> Mar 16 06:25:11 [877] nfs01-az-eus.tech-corps.com lrmd: notice:
>> operation_finished: drbd_nfs_stop_0:3667:stderr [ Command 'drbdsetup-84
>> secondary 1' terminated with exit code 11 ]
>>
>> Mar 16 06:25:11 [877] nfs01-az-eus.tech-corps.com lrmd: info:
>> log_finished: finished - rsc:drbd_nfs action:stop call_id:47 pid:3667
>> exit-code:1 exec-time:20002ms queue-time:0ms
>>
>> Mar 16 06:25:11 [880] nfs01-az-eus.tech-corps.com crmd: error:
>> process_lrm_event: Result of stop operation for drbd_nfs on
>> nfs01-az-eus.tech-corps.com: Timed Out | call=47 key=drbd_nfs_stop_0
>> timeout=20000ms
>>
>> Mar 16 06:25:11 [880] nfs01-az-eus.tech-corps.com crmd: notice:
>> process_lrm_event: nfs01-az-eus.tech-corps.com-drbd_nfs_stop_0:47 [
>> 1: State change failed: (-12) Device is held open by someone\nCommand
>> 'drbdsetup-84 secondary 1' terminated with exit code 11\n1: State change
>> failed: (-12) Device is held open by someone\nCommand 'drbdsetup-84
>> secondary 1' terminated with exit code 11\n1: State change failed: (-12)
>> Device is held open by someone\nCommand 'drbdsetup-84 secondary 1'
>> terminated with exit
>>
>> I tried to resolve the issue with many recipes found via Google, but all
>> attempts were unsuccessful.
>> I also have another two-node cluster with exactly the same
>> configuration, and it works without any issues.
>>
>> Right now I have put the nodes into standby mode and brought up all
>> services manually.
>> Could you please help me analyze and solve the problem?
>> Thanks
>>
>> Here are my configuration files:
>> --- CRM CONFIG ---
>> crm configure show
>> node 171049224: nfs01-az-eus.tech-corps.com \
>>         attributes standby=off
>> node 171049225: nfs02-az-eus.tech-corps.com \
>>         attributes standby=on
>> primitive drbd_nfs ocf:linbit:drbd \
>>         params drbd_resource=nfs \
>>         op monitor interval=29s role=Master \
>>         op monitor interval=31s role=Slave
>> primitive fs_nfs Filesystem \
>>         params device="/dev/drbd1" directory="/data" fstype=ext4 \
>>         meta is-managed=true
>> primitive nfs lsb:nfs-kernel-server \
>>         op monitor interval=5s
>> primitive nmbd lsb:nmbd \
>>         op monitor interval=5s
>> primitive smbd lsb:smbd \
>>         op monitor interval=5s
>> group NFS fs_nfs nfs nmbd smbd
>> ms ms_drbd_nfs drbd_nfs \
>>         meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
>> order fs-nfs-before-nfs inf: fs_nfs:start nfs:start
>> order fs-nfs-before-nmbd inf: fs_nfs:start nmbd:start
>> order fs-nfs-before-smbd inf: fs_nfs:start smbd:start
>> order ms-drbd-nfs-before-fs-nfs inf: ms_drbd_nfs:promote fs_nfs:start
>> colocation ms-drbd-nfs-with-ha inf: ms_drbd_nfs:Master NFS
>> order nmbd-before-smbd inf: nmbd:start smbd:start
>> property cib-bootstrap-options: \
>>         have-watchdog=false \
>>         dc-version=1.1.16-94ff4df \
>>         cluster-infrastructure=corosync \
>>         cluster-name=debian \
>>         stonith-enabled=false \
>>         no-quorum-policy=ignore
>>
>>
>> --- DRBD GLOBAL ---
>> cat /etc/drbd.d/global_common.conf | grep -v '#'
>>
>> global {
>>     usage-count no;
>> }
>>
>> common {
>>     protocol C;
>>
>>     handlers {
>>
>>     }
>>
>>     startup {
>>     }
>>
>>     options {
>>     }
>>
>>     disk {
>>     }
>>
>>     net {
>>     }
>> }
>>
>>
>> --- DRBD RESOURCE ---
>> cat /etc/drbd.d/nfs.res | grep -v '#'
>> resource nfs{
>>     meta-disk internal;
>>     device /dev/drbd1;
>>     syncer {
>>         verify-alg sha1;
>>         rate 100M;
>>     }
>>
>>     net{
>>         max-buffers 8000;
>>         max-epoch-size 8000;
>>         unplug-watermark 16;
>>         sndbuf-size 0;
>>     }
>>
>>     disk{
>>         disk-barrier no;
>>         disk-flushes no;
>>     }
>>
>>     on nfs01-az-eus.tech-corps.com{
>>         disk /dev/sdc1;
>>         address 10.50.1.8:7789;
>>     }
>>
>>     on nfs02-az-eus.tech-corps.com{
>>         disk /dev/sdc1;
>>         address 10.50.1.9:7789;
>>     }
>> }
>>
>>
>> --
>> Segey L
>>
>> _______________________________________________
>> drbd-user mailing list
>> [email protected]
>> http://lists.linbit.com/mailman/listinfo/drbd-user
>>
>
> Did you check with fuser what is holding the device/filesystem busy?
>

-- 
Лозенков Сергей
