Hi all,
I'm having a problem with my setup and I'm trying to understand whether
I am missing something or whether it is a bug. Here is my configuration,
followed by the steps to reproduce the error:
node debian-lenny-nodo1
node debian-lenny-nodo2
primitive drbd0 ocf:linbit:drbd \
params drbd_resource="r0" \
op monitor interval="20s" timeout="40s" \
op start interval="0" timeout="240s" \
op stop interval="0" timeout="100s"
primitive nfs-common lsb:nfs-common
primitive nfs-kernel-server lsb:nfs-kernel-server
primitive ping ocf:pacemaker:ping \
params host_list="192.168.1.1" name="ping" \
op monitor interval="60s" timeout="60s" \
op start interval="0" timeout="60s"
primitive portmap lsb:portmap
primitive store-LVM ocf:heartbeat:LVM \
params volgrpname="vg_drbd" \
op monitor interval="10s" timeout="30s" \
op start interval="0" timeout="30s" \
op stop interval="0" timeout="30s"
primitive store-exportfs ocf:heartbeat:exportfs \
params directory="/store/share" clientspec="192.168.1.0/24" options="rw,sync,no_subtree_check,no_root_squash" fsid="1" \
op monitor interval="10s" timeout="30s" \
op start interval="0" timeout="40s" \
op stop interval="0" timeout="40s" \
meta target-role="Started"
primitive store-fs ocf:heartbeat:Filesystem \
params device="/dev/vg_drbd/lv_store" directory="/store" fstype="ext3" \
op monitor interval="20s" timeout="40s" \
op start interval="0" timeout="60s" \
op stop interval="0" timeout="60s" \
meta is-managed="true"
primitive store-ip ocf:heartbeat:IPaddr2 \
params ip="192.168.1.53" nic="bond0" \
op monitor interval="20s" timeout="40s"
group nfs portmap nfs-common nfs-kernel-server
group store store-ip store-LVM store-fs store-exportfs
ms ms-drbd0 drbd0 \
meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
clone nfs_clone nfs \
meta globally-unique="false"
clone ping_clone ping \
meta globally-unique="false"
location cli-prefer-store store \
rule $id="cli-prefer-rule-store" inf: #uname eq debian-lenny-nodo1
location store_on_connected_node store \
rule $id="store_on_connected_node-rule" -inf: not_defined ping or ping lte 0
colocation store_on_ms-drbd0 inf: store ms-drbd0:Master
order store_after_ms-drbd0 inf: ms-drbd0:promote store:start
property $id="cib-bootstrap-options" \
dc-version="1.0.8-2c98138c2f070fcb6ddeab1084154cffbf44ba75" \
no-quorum-policy="ignore" \
stonith-enabled="false" \
cluster-infrastructure="openais" \
expected-quorum-votes="2" \
last-lrm-refresh="1274949951"
Everything comes up smoothly:
Online: [ debian-lenny-nodo1 debian-lenny-nodo2 ]
Clone Set: ping_clone
Started: [ debian-lenny-nodo1 debian-lenny-nodo2 ]
Master/Slave Set: ms-drbd0
Masters: [ debian-lenny-nodo1 ]
Slaves: [ debian-lenny-nodo2 ]
Resource Group: store
store-ip (ocf::heartbeat:IPaddr2): Started debian-lenny-nodo1
store-LVM (ocf::heartbeat:LVM): Started debian-lenny-nodo1
store-fs (ocf::heartbeat:Filesystem): Started debian-lenny-nodo1
store-exportfs (ocf::heartbeat:exportfs): Started debian-lenny-nodo1
Clone Set: nfs_clone
Started: [ debian-lenny-nodo2 debian-lenny-nodo1 ]
I mount the share on a network client with default options and start
copying data to it with the cp command.
While the copy is running, I migrate the store group to the second
node:
crm resource migrate store debian-lenny-nodo2
Everything goes smoothly: on the client the copy hangs for a minute or
two and then resumes.
After that, from the client, I copy something else to the NFS storage,
this time with the rsync command.
The copy starts and, after a while, I launch the migration command again.
This time the cluster hangs and reports a failure on the filesystem
resource:
store-fs (ocf::heartbeat:Filesystem): Started debian-lenny-nodo2 (unmanaged) FAILED
The only way to make things work again is to clean up the nfs_clone
resource (or restart the nfs-kernel-server daemon) and then clean up the
store group. It seems that the filesystem is kept open by the NFS daemon.
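
For reference, the manual recovery just described can be sketched as a
small script. The resource names (nfs_clone, store) are the ones from the
configuration in this post; the run and recover_store helpers and the
DRY_RUN switch are hypothetical names added only for illustration, so
that the commands are printed by default rather than executed.

```shell
#!/bin/sh
# Sketch of the manual recovery, under the assumptions stated above.
# DRY_RUN defaults to 1 (print only); set DRY_RUN= (empty) to really run.
DRY_RUN="${DRY_RUN-1}"

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ -n "$DRY_RUN" ]; then
        echo "$*"
    else
        "$@"
    fi
}

recover_store() {
    # Step 1: clean up the NFS clone (the alternative mentioned in the
    # post is restarting the nfs-kernel-server daemon).
    run crm resource cleanup nfs_clone || return 1
    # Step 2: clean up the store group to clear the failed store-fs stop.
    run crm resource cleanup store
}

recover_store
```

Run as is, this only prints the two crm commands, in the order they have
to be issued.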
So, what is the difference between a simple copy and an rsync? Why, with
rsync, is the Filesystem resource unable to unmount the filesystem? Am I
missing something, or could this be a bug in the Filesystem resource agent?
Here are the logs:
May 27 11:20:41 debian-lenny-nodo1 Filesystem[28197]: INFO: Running stop for /dev/vg_drbd/lv_store on /store
May 27 11:20:41 debian-lenny-nodo1 Filesystem[28197]: INFO: Trying to unmount /store
May 27 11:20:41 debian-lenny-nodo1 lrmd: [2589]: info: RA output: (store-fs:stop:stderr) umount: /store: device is busy#012umount: /store: device is busy
May 27 11:20:41 debian-lenny-nodo1 Filesystem[28197]: ERROR: Couldn't unmount /store; trying cleanup with SIGTERM
May 27 11:20:41 debian-lenny-nodo1 Filesystem[28197]: INFO: No processes on /store were signalled
May 27 11:20:42 debian-lenny-nodo1 lrmd: [2589]: info: RA output: (store-fs:stop:stderr) umount: /store: device is busy#012umount: /store: device is busy
May 27 11:20:42 debian-lenny-nodo1 Filesystem[28197]: ERROR: Couldn't unmount /store; trying cleanup with SIGTERM
May 27 11:20:42 debian-lenny-nodo1 Filesystem[28197]: INFO: No processes on /store were signalled
May 27 11:20:43 debian-lenny-nodo1 lrmd: [2589]: info: RA output: (store-fs:stop:stderr) umount: /store: device is busy
May 27 11:20:43 debian-lenny-nodo1 lrmd: [2589]: info: RA output: (store-fs:stop:stderr)
May 27 11:20:43 debian-lenny-nodo1 lrmd: [2589]: info: RA output: (store-fs:stop:stderr) umount: /store: device is busy
May 27 11:20:43 debian-lenny-nodo1 lrmd: [2589]: info: RA output: (store-fs:stop:stderr)
May 27 11:20:43 debian-lenny-nodo1 Filesystem[28197]: ERROR: Couldn't unmount /store; trying cleanup with SIGTERM
May 27 11:20:43 debian-lenny-nodo1 Filesystem[28197]: INFO: No processes on /store were signalled
May 27 11:20:44 debian-lenny-nodo1 lrmd: [2589]: info: RA output: (store-fs:stop:stderr) umount: /store: device is busy#012umount: /store: device is busy
May 27 11:20:44 debian-lenny-nodo1 Filesystem[28197]: ERROR: Couldn't unmount /store; trying cleanup with SIGKILL
May 27 11:20:44 debian-lenny-nodo1 Filesystem[28197]: INFO: No processes on /store were signalled
May 27 11:20:45 debian-lenny-nodo1 lrmd: [2589]: info: RA output: (store-fs:stop:stderr) umount: /store: device is busy#012umount: /store: device is busy
May 27 11:20:45 debian-lenny-nodo1 Filesystem[28197]: ERROR: Couldn't unmount /store; trying cleanup with SIGKILL
May 27 11:20:45 debian-lenny-nodo1 Filesystem[28197]: INFO: No processes on /store were signalled
May 27 11:20:46 debian-lenny-nodo1 lrmd: [2589]: info: RA output: (store-fs:stop:stderr) umount: /store: device is busy#012umount: /store: device is busy
May 27 11:20:46 debian-lenny-nodo1 Filesystem[28197]: ERROR: Couldn't unmount /store; trying cleanup with SIGKILL
May 27 11:20:46 debian-lenny-nodo1 Filesystem[28197]: INFO: No processes on /store were signalled
May 27 11:20:47 debian-lenny-nodo1 Filesystem[28197]: ERROR: Couldn't unmount /store, giving up!
May 27 11:20:48 debian-lenny-nodo1 crmd: [2592]: info: process_lrm_event: LRM operation store-fs_stop_0 (call=188, rc=1, cib-update=389, confirmed=true) unknown error
May 27 11:20:48 debian-lenny-nodo1 crmd: [2592]: WARN: status_from_rc: Action 58 (store-fs_stop_0) on debian-lenny-nodo1 failed (target: 0 vs. rc: 1): Error
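
To make the stop sequence in these logs easier to read at a glance, the
retries can be counted with a few grep calls. This is only a rough sketch:
summarize_stop is a hypothetical helper name, the log file path is an
assumption (pass whatever file your syslog writes to), and the message
texts are copied from the Filesystem agent lines above.

```shell
#!/bin/sh
# Count the unmount retries logged by the Filesystem agent for a mount point.
# Assumes each log entry is on a single line, as in the syslog file itself.
summarize_stop() {
    log="$1"    # syslog file to scan (path is an assumption)
    mnt="$2"    # mount point, e.g. /store
    # grep -c exits non-zero when the count is 0, hence the "|| true".
    nterm=$(grep -cF "Couldn't unmount $mnt; trying cleanup with SIGTERM" "$log" || true)
    nkill=$(grep -cF "Couldn't unmount $mnt; trying cleanup with SIGKILL" "$log" || true)
    ngave=$(grep -cF "Couldn't unmount $mnt, giving up" "$log" || true)
    echo "SIGTERM retries: $nterm, SIGKILL retries: $nkill, gave up: $ngave"
}

# Example call (file name is an assumption):
#   summarize_stop /var/log/syslog /store
```

For the stop attempt logged above this corresponds to three SIGTERM
retries, three SIGKILL retries and one final give-up, with no process
ever being signalled on /store.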
Thanks for your help!
--
RaSca
Mia Mamma Usa Linux: Niente è impossibile da capire, se lo spieghi bene!
http://www.miamammausalinux.org