Re: [Linux-HA] debugging resource configuration

Eric Schoeller Thu, 28 Oct 2010 17:38:43 -0700

This is a bit of a shot in the dark, but when I had this type of problem
with a stonith device, it was timeout related. You could try raising
your timeouts all around, or even check what

# time /usr/sbin/ldirectord /etc/ha.d/ldirectord.cf start

reports back.
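
A rough sketch of that timing check, assuming a POSIX shell. The helper
name within_timeout is made up for illustration; it is not part of
ldirectord or Pacemaker. It runs a command, prints how long it took, and
succeeds only if the command exited 0 in fewer seconds than the limit
you pass in:

```shell
# within_timeout LIMIT_SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND, prints its exit code and elapsed
# wall-clock time (whole seconds), and returns success only if COMMAND
# exited 0 and finished in fewer than LIMIT_SECONDS seconds.
within_timeout() {
    limit=$1
    shift
    begin=$(date +%s)
    "$@"
    status=$?
    finish=$(date +%s)
    elapsed=$((finish - begin))
    echo "exit=$status elapsed=${elapsed}s limit=${limit}s"
    [ "$status" -eq 0 ] && [ "$elapsed" -lt "$limit" ]
}
```

With the 15-second start timeout from the configuration quoted below,
that would be run as
"within_timeout 15 /usr/sbin/ldirectord /etc/ha.d/ldirectord.cf start".
Resolution is one second, so treat anything close to the limit as
suspect.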

If timeouts aren't the problem, I would start removing parts of the
cluster config (the colocation and order constraints) and retrying until
it works, then add those parts back one at a time until you figure out
what's breaking it.
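
One way that could look, as a dry-run sketch only. The constraint ids
are the ones from the configuration quoted below; the helper names run,
isolate_steps and restore_constraint are made up for illustration. By
default it only prints the crm commands; it runs them only when APPLY=1
is set. Check "crm configure show" for the real ids on your cluster
before applying anything.

```shell
# Print each command instead of running it, unless APPLY=1 is set.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

# Strip the resource down to the bare primitive and retry it.
isolate_steps() {
    run crm resource stop ldirectord
    # Remove the two constraints (ids taken from the quoted config).
    run crm configure delete vdir-ipi-with-ldirectord
    run crm configure delete vdir-ipi-before-ldirectord
    run crm resource start ldirectord
    # One-shot status display to see whether the resource stays up.
    run crm_mon -1
}

# Add one constraint back: $1 is "colocation" or "order", $2 is its id.
restore_constraint() {
    run crm configure "$1" "$2" inf: vdir-ipi ldirectord
}
```

If the bare primitive stays up after isolate_steps, add the constraints
back one at a time, for example
"restore_constraint order vdir-ipi-before-ldirectord", and watch for the
restart loop to return after each one.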

--Eric

Greg Woods wrote:
> This is a continuation of trying to get ldirectord working under
> pacemaker. I have a working installation of ldirectord. I know this
> because if I manually configure the eth0:0 pseudo-interface with the
> virtual server address, and manually start ldirectord with
>
> # /usr/sbin/ldirectord /etc/ha.d/ldirectord.cf start
>
> ...then everything works. I can connect to the virtual service address
> and port, and I get properly redirected to one of the real servers.
> ipvsadm shows normal output. All looks good.
>
> However, if I try to start the ldirectord resource, it starts, then
> fails, then starts, then fails, etc. This will continue until I issue a
> "resource stop ldirectord" command in the CRM shell.
>
> So it has to be something with how I configured it, but I'm damned if I
> can figure it out. Here is what I have that involves this resource:
>
> primitive ldirectord ocf:heartbeat:ldirectord \
>         op start interval="20" timeout="15" \
>         op stop interval="20" timeout="15" \
>         op monitor interval="20" timeout="20" \
> colocation vdir-ipi-with-ldirectord inf: vdir-ipi ldirectord
> order vdir-ipi-before-ldirectord inf: vdir-ipi ldirectord
>
> The vdir-ipi is an IPaddr resource that will start fine and results in
> the eth0:0 alias interface being configured and brought up.
>
> When I issue a "resource start ldirectord" command from the crm shell,
> what I get from lrmd is repeats of this sequence:
>
> Oct 28 18:12:24 vmx1.ucar.edu lrmd: [4842]: info: rsc:vdir-ipi:5464:
> start
> Oct 28 18:12:24 vmx1.ucar.edu lrmd: [4842]: info: Managed vdir-ipi:start
> process 4923 exited with return code 0.
>
> Oct 28 18:12:25 vmx1.ucar.edu lrmd: [4842]: info: rsc:ldirectord:5466:
> start
> Oct 28 18:12:25 vmx1.ucar.edu lrmd: [4842]: info: RA output:
> (ldirectord:start:stdout) /usr/sbin/ldirectord /etc/ha.d/ldirectord.cf
> start
> Oct 28 18:12:26 vmx1.ucar.edu lrmd: [4842]: info: Managed
> ldirectord:start process 5103 exited with return code 0.
> Oct 28 18:12:27 vmx1.ucar.edu lrmd: [4842]: info: rsc:ldirectord:5467:
> start
> Oct 28 18:12:27 vmx1.ucar.edu lrmd: [4842]: info: perform_op:2906:
> operation start[5467] on ocf::ldirectord::ldirectord for client 4845,
> its parameters: CRM_meta_interval=[20000] CRM_meta_timeout=[15000]
> crm_feature_set=[3.0.1] CRM_meta_name=[start]  for rsc is already
> running.
> Oct 28 18:12:27 vmx1.ucar.edu lrmd: [4842]: info: perform_op:2916:
> postponing all ops on resource ldirectord by 1000 ms
> Oct 28 18:12:27 vmx1.ucar.edu lrmd: [4842]: info: perform_op:2906:
> operation start[5467] on ocf::ldirectord::ldirectord for client 4845,
> its parameters: CRM_meta_interval=[20000] CRM_meta_timeout=[15000]
> crm_feature_set=[3.0.1] CRM_meta_name=[start]  for rsc is already
> running.
> Oct 28 18:12:27 vmx1.ucar.edu lrmd: [4842]: info: perform_op:2910:
> operations on resource ldirectord already delayed
> Oct 28 18:12:27 vmx1.ucar.edu lrmd: [4842]: info: Managed
> ldirectord:start process 5221 exited with return code 0.
> Oct 28 18:12:27 vmx1.ucar.edu lrmd: [4842]: info: rsc:ldirectord:5468:
> stop
> Oct 28 18:12:27 vmx1.ucar.edu lrmd: [4842]: info: Managed
> ldirectord:stop process 5226 exited with return code 0.
> Oct 28 18:12:28 vmx1.ucar.edu lrmd: [4842]: WARN: Managed
> ldirectord:monitor process 5265 exited with return code 7.
> Oct 28 18:12:29 vmx1.ucar.edu lrmd: [4842]: info: cancel_op: operation
> monitor[5469] on ocf::ldirectord::ldirectord for client 4845, its
> parameters: CRM_meta_interval=[20000] CRM_meta_timeout=[20000]
> crm_feature_set=[3.0.1] CRM_meta_name=[monitor]  cancelled
> Oct 28 18:12:29 vmx1.ucar.edu lrmd: [4842]: info: rsc:ldirectord:5470:
> stop
> Oct 28 18:12:29 vmx1.ucar.edu lrmd: [4842]: info: Managed
> ldirectord:stop process 5296 exited with return code 0.
>
> And then it repeats:
>
> Oct 28 18:12:31 vmx1.ucar.edu lrmd: [4842]: info: rsc:ldirectord:5471:
> start
>
> etc.
>
> How can I figure out what I have done wrong here?
>
> Thanks,
> --Greg
>
>
>
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
>   
