On Fri, 28 May 2010 12:34:06 CET, RaSca wrote:
[...]
> Note that the nfs-kernel-server isn't connected to the exportfs, but is
> only a cloned resource, so it isn't touched by the migration process.
[...]
Ok Dejan,
I've patched the Filesystem RA, and here are the configuration changes:
primitive share-a-fs ocf:heartbeat:Filesystem \
        params device="/dev/drbd0" directory="/share-a" fstype="ext3" fast_stop="no" \
        op monitor interval="20s" timeout="40s" \
        op start interval="0" timeout="60s" \
        op stop interval="0" timeout="60s"
I ran the same test and the problem remains. In the log I can see many
umount attempts by the RA, all of them unsuccessful:
...
...
May 28 14:09:51 ubuntu-nodo1 lrmd: [704]: info: RA output:
(share-a-fs:stop:stderr)
May 28 14:09:51 ubuntu-nodo1 Filesystem[9651]: ERROR: Couldn't unmount
/share-a; trying cleanup with KILL
May 28 14:09:51 ubuntu-nodo1 Filesystem[9651]: INFO: No processes on
/share-a were signalled
May 28 14:09:52 ubuntu-nodo1 lrmd: [704]: info: RA output:
(share-a-fs:stop:stderr) umount: /share-a: device is busy.#012
(In some cases useful info about processes that use#012
the device is found by lsof(8) or fuser(1))
...
...
And then:
May 28 14:10:10 ubuntu-nodo1 lrmd: [704]: WARN: share-a-fs:stop process
(PID 9651) timed out (try 1). Killing with signal SIGTERM (15).
May 28 14:10:10 ubuntu-nodo1 lrmd: [704]: WARN: operation stop[191] on
ocf::Filesystem::share-a-fs for client 707, its parameters:
CRM_meta_name=[stop] crm_feature_set=[3.0.1] device=[/dev/drbd0]
CRM_meta_timeout=[60000] directory=[/share-a] fstype=[ext3]
fast_stop=[no] : pid [9651] timed out
May 28 14:10:10 ubuntu-nodo1 crmd: [707]: ERROR: process_lrm_event: LRM
operation share-a-fs_stop_0 (191) Timed Out (timeout=60000ms)
May 28 14:10:10 ubuntu-nodo1 crmd: [707]: WARN: status_from_rc: Action
16 (share-a-fs_stop_0) on ubuntu-nodo1 failed (target: 0 vs. rc: -2): Error
May 28 14:10:10 ubuntu-nodo1 crmd: [707]: WARN: update_failcount:
Updating failcount for share-a-fs on ubuntu-nodo1 after failed stop:
rc=-2 (update=INFINITY, time=1275048610)
May 28 14:10:10 ubuntu-nodo1 crmd: [707]: info: abort_transition_graph:
match_graph_event:272 - Triggered transition abort (complete=0,
tag=lrm_rsc_op, id=share-a-fs_stop_0,
magic=2:-2;16:105:0:bd1ff2a9-427b-49a1-9845-5e3e0b91d824, cib=0.579.6) :
Event failed
In the end, the situation is the same as before:
...
...
Resource Group: share-a
share-a-ip (ocf::heartbeat:IPaddr2): Started ubuntu-nodo1
share-a-fs (ocf::heartbeat:Filesystem): Started ubuntu-nodo1
(unmanaged) FAILED
share-a-exportfs (ocf::heartbeat:exportfs): Stopped
...
...
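
The umount output above points to lsof(8) and fuser(1), while the RA
reports that no processes on /share-a were signalled. A minimal sketch of
a check that could be run on the node before the stop is shown below. The
helper names is_mounted and report_holders are hypothetical, and the idea
that a kernel NFS export (which fuser and lsof do not list) may be what
keeps the filesystem busy is only an assumption, not something the logs
confirm.

```shell
#!/bin/sh
# Sketch only: report what may be keeping a mount point busy.

# is_mounted DIR [MOUNTS_FILE]: succeed if DIR is listed as a mount point.
# MOUNTS_FILE defaults to /proc/mounts; the optional second argument only
# exists so the function can be tried against a sample file.
is_mounted() {
    awk -v dir="$1" '$2 == dir { found = 1 } END { exit !found }' \
        "${2:-/proc/mounts}"
}

# report_holders DIR: print userspace users and NFS exports of DIR.
report_holders() {
    dir=$1
    if ! is_mounted "$dir"; then
        echo "$dir is not mounted"
        return 0
    fi
    # Userspace processes with files open on the mounted filesystem.
    if command -v fuser >/dev/null 2>&1; then
        fuser -vm "$dir" || echo "fuser: no userspace process is using $dir"
    fi
    # Assumption: an active kernel NFS export can keep the filesystem busy
    # without appearing in fuser or lsof output.
    if command -v exportfs >/dev/null 2>&1; then
        exportfs -v | grep -F "$dir" || echo "exportfs: $dir is not exported"
    fi
}
```

Running `report_holders /share-a` as root on ubuntu-nodo1 while the
resource is still started would show whether anything in userspace, or an
export, still references the mount.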
What else can I try?
Thanks a lot,
--
RaSca
Mia Mamma Usa Linux: Niente è impossibile da capire, se lo spieghi bene!
[email protected]
http://www.miamammausalinux.org
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems