Re: [Linux-HA] Problem with migration on a nfs/exportfs setup while copying via rsync

Dejan Muhamedagic Mon, 31 May 2010 04:14:22 -0700

Hi,

On Fri, May 28, 2010 at 02:16:22PM +0200, RaSca wrote:
> On Fri, 28 May 2010 12:34:06 CET, RaSca wrote:
> [...]
> >Note that the nfs-kernel-server isn't connected to the exportfs, but is
> >only a cloned resource, so it isn't touched by the migration process.
> [...]
> 
> Ok Dejan,
> I've patched the Filesystem RA, and here are the configuration changes:
> 
> primitive share-a-fs ocf:heartbeat:Filesystem \
>         params device="/dev/drbd0" directory="/share-a" fstype="ext3" fast_stop="no" \
>         op monitor interval="20s" timeout="40s" \
>         op start interval="0" timeout="60s" \
>         op stop interval="0" timeout="60s"
> 
> I ran the same test and the problem remains. From the log I can see
> many umount attempts by the RA, all of them unsuccessful:
> 
> ...
> ...
> May 28 14:09:51 ubuntu-nodo1 lrmd: [704]: info: RA output: (share-a-fs:stop:stderr)
> May 28 14:09:51 ubuntu-nodo1 Filesystem[9651]: ERROR: Couldn't unmount /share-a; trying cleanup with KILL
> May 28 14:09:51 ubuntu-nodo1 Filesystem[9651]: INFO: No processes on /share-a were signalled
> May 28 14:09:52 ubuntu-nodo1 lrmd: [704]: info: RA output: (share-a-fs:stop:stderr) umount: /share-a: device is busy.#012 (In some cases useful info about processes that use#012   the device is found by lsof(8) or fuser(1))
> ...
> ...
> 
> And then:
> 
> May 28 14:10:10 ubuntu-nodo1 lrmd: [704]: WARN: share-a-fs:stop process (PID 9651) timed out (try 1).  Killing with signal SIGTERM (15).


My guess is that the stop timeout you set (60s) is too short. I'm
not sure, but I think somebody mentioned that nfsd v4 takes at
least 80 seconds to really stop. Was nfsd being stopped here at
all?
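
For illustration only, here is a sketch of the same primitive with a
longer stop timeout. The 120s value is an assumption, chosen simply to
exceed the roughly 80 seconds mentioned above; it is not a tested
recommendation, and everything else is copied from your configuration:

    # assumption: stop timeout raised from 60s to 120s (hypothetical value)
    primitive share-a-fs ocf:heartbeat:Filesystem \
            params device="/dev/drbd0" directory="/share-a" fstype="ext3" fast_stop="no" \
            op monitor interval="20s" timeout="40s" \
            op start interval="0" timeout="60s" \
            op stop interval="0" timeout="120s"

If the unmount still fails with a longer timeout, then the timeout
alone is probably not the cause.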

Thanks,

Dejan

> May 28 14:10:10 ubuntu-nodo1 lrmd: [704]: WARN: operation stop[191] on ocf::Filesystem::share-a-fs for client 707, its parameters: CRM_meta_name=[stop] crm_feature_set=[3.0.1] device=[/dev/drbd0] CRM_meta_timeout=[60000] directory=[/share-a] fstype=[ext3] fast_stop=[no] : pid [9651] timed out
> May 28 14:10:10 ubuntu-nodo1 crmd: [707]: ERROR: process_lrm_event: LRM operation share-a-fs_stop_0 (191) Timed Out (timeout=60000ms)
> May 28 14:10:10 ubuntu-nodo1 crmd: [707]: WARN: status_from_rc: Action 16 (share-a-fs_stop_0) on ubuntu-nodo1 failed (target: 0 vs. rc: -2): Error
> May 28 14:10:10 ubuntu-nodo1 crmd: [707]: WARN: update_failcount: Updating failcount for share-a-fs on ubuntu-nodo1 after failed stop: rc=-2 (update=INFINITY, time=1275048610)
> May 28 14:10:10 ubuntu-nodo1 crmd: [707]: info: abort_transition_graph: match_graph_event:272 - Triggered transition abort (complete=0, tag=lrm_rsc_op, id=share-a-fs_stop_0, magic=2:-2;16:105:0:bd1ff2a9-427b-49a1-9845-5e3e0b91d824, cib=0.579.6) : Event failed
> 
> In the end, the situation is the same as before:
> 
> ...
> ...
>  Resource Group: share-a
>      share-a-ip       (ocf::heartbeat:IPaddr2):       Started ubuntu-nodo1
>      share-a-fs       (ocf::heartbeat:Filesystem):    Started ubuntu-nodo1 (unmanaged) FAILED
>      share-a-exportfs (ocf::heartbeat:exportfs):      Stopped
> ...
> ...
> 
> What else can I try?
> 
> Thanks a lot,
> 
> -- 
> RaSca
> Mia Mamma Usa Linux: Nothing is impossible to understand, if you explain it well!
> [email protected]
> http://www.miamammausalinux.org
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
