Hi,

On Fri, May 28, 2010 at 02:16:22PM +0200, RaSca wrote:
> On Fri 28 May 2010 12:34:06 CET, RaSca wrote:
> [...]
> >Note that the nfs-kernel-server isn't connected to the exportfs, but is
> >only a cloned resource, so it isn't touched by the migration process.
> [...]
>
> Ok Dejan,
> I've patched the Filesystem RA, and here are the configuration changes:
>
> primitive share-a-fs ocf:heartbeat:Filesystem \
>     params device="/dev/drbd0" directory="/share-a" fstype="ext3" fast_stop="no" \
>     op monitor interval="20s" timeout="40s" \
>     op start interval="0" timeout="60s" \
>     op stop interval="0" timeout="60s"
>
> I ran the same test and the problem remains. From the log I can see
> many umount attempts by the RA, all of them unsuccessful:
>
> ...
> ...
> May 28 14:09:51 ubuntu-nodo1 lrmd: [704]: info: RA output:
> (share-a-fs:stop:stderr)
> May 28 14:09:51 ubuntu-nodo1 Filesystem[9651]: ERROR: Couldn't
> unmount /share-a; trying cleanup with KILL
> May 28 14:09:51 ubuntu-nodo1 Filesystem[9651]: INFO: No processes on
> /share-a were signalled
> May 28 14:09:52 ubuntu-nodo1 lrmd: [704]: info: RA output:
> (share-a-fs:stop:stderr) umount: /share-a: device is busy.#012 (In
> some cases useful info about processes that use#012
> the device is found by lsof(8) or fuser(1))
> ...
> ...
>
> And then:
>
> May 28 14:10:10 ubuntu-nodo1 lrmd: [704]: WARN: share-a-fs:stop
> process (PID 9651) timed out (try 1). Killing with signal SIGTERM
> (15).

My guess is that the timeout you set is too short. I'm not sure, but I
think somebody mentioned that it takes at least 80 seconds for nfsd v4
to really stop. Was nfsd being stopped here at all?

Thanks,

Dejan

> May 28 14:10:10 ubuntu-nodo1 lrmd: [704]: WARN: operation stop[191]
> on ocf::Filesystem::share-a-fs for client 707, its parameters:
> CRM_meta_name=[stop] crm_feature_set=[3.0.1] device=[/dev/drbd0]
> CRM_meta_timeout=[60000] directory=[/share-a] fstype=[ext3]
> fast_stop=[no] : pid [9651] timed out
> May 28 14:10:10 ubuntu-nodo1 crmd: [707]: ERROR: process_lrm_event:
> LRM operation share-a-fs_stop_0 (191) Timed Out (timeout=60000ms)
> May 28 14:10:10 ubuntu-nodo1 crmd: [707]: WARN: status_from_rc:
> Action 16 (share-a-fs_stop_0) on ubuntu-nodo1 failed (target: 0 vs.
> rc: -2): Error
> May 28 14:10:10 ubuntu-nodo1 crmd: [707]: WARN: update_failcount:
> Updating failcount for share-a-fs on ubuntu-nodo1 after failed stop:
> rc=-2 (update=INFINITY, time=1275048610)
> May 28 14:10:10 ubuntu-nodo1 crmd: [707]: info:
> abort_transition_graph: match_graph_event:272 - Triggered transition
> abort (complete=0, tag=lrm_rsc_op, id=share-a-fs_stop_0,
> magic=2:-2;16:105:0:bd1ff2a9-427b-49a1-9845-5e3e0b91d824,
> cib=0.579.6) : Event failed
>
> In the end the situation is the same as before:
>
> ...
> ...
> Resource Group: share-a
>     share-a-ip (ocf::heartbeat:IPaddr2): Started ubuntu-nodo1
>     share-a-fs (ocf::heartbeat:Filesystem): Started ubuntu-nodo1 (unmanaged) FAILED
>     share-a-exportfs (ocf::heartbeat:exportfs): Stopped
> ...
> ...
>
> What else can I try?
>
> Thanks a lot,
>
> --
> RaSca
> Mia Mamma Usa Linux: Nothing is impossible to understand, if you explain it well!
> [email protected]
> http://www.miamammausalinux.org
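
To try the longer stop timeout suggested in the reply above, the primitive could be changed along these lines. This is only a sketch: the 120s value is an assumption, picked to leave some headroom over the roughly 80 seconds mentioned for nfsd v4, and has not been tested. Everything else is copied unchanged from the configuration quoted above, including fast_stop, which comes from the patched Filesystem RA.

```
# Assumption: 120s is an illustrative value, not a verified one.
primitive share-a-fs ocf:heartbeat:Filesystem \
    params device="/dev/drbd0" directory="/share-a" fstype="ext3" fast_stop="no" \
    op monitor interval="20s" timeout="40s" \
    op start interval="0" timeout="60s" \
    op stop interval="0" timeout="120s"
```

Since share-a-fs is currently shown as FAILED and unmanaged, the failed stop would probably have to be cleared first (for example with "crm resource cleanup share-a-fs") before repeating the migration test.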
