Re: [Linux-HA] slave's drbd resource doesn't get promote when master dies

Andreas Kurz Thu, 20 Mar 2008 14:00:11 -0700

On Thu, Mar 20, 2008 at 6:43 PM, Jean-Francois Malouin
<[EMAIL PROTECTED]> wrote:
> Hi Dominik,
>
>  * Dominik Klein <[EMAIL PROTECTED]> [20080320 02:23]:
>
> > Jean-Francois Malouin wrote:
>  > >I thought I had it nailed but still no go.
>
>  [...]
>
>  I'm replying late. It's that kind of a day:
>  network failure at home and power failure at work :)
>
>
>  > >
>  > The xml looks good to me.
>
>  glad to know, I'm quite new at this :)
>
>
>  >
>  > >The log shows this after attempting a crm_standby:
>  > >
>  > >pengine[5003]: 2008/03/19_16:55:58 info: unpack_nodes: Node feeble-1 is in
>  > >standby-mode
>  > >pengine[5003]: 2008/03/19_16:55:58 info: determine_online_status: Node
>  > >feeble-1 is standby
>  > >pengine[5003]: 2008/03/19_16:55:58 info: determine_online_status: Node
>  > >feeble-0 is online
>  > >pengine[5003]: 2008/03/19_16:55:58 WARN: unpack_rsc_op: Processing failed
>  > >op drbd_id:0_promote_0 on feeble-0: Error
>  >
>  > Find out why this failed.
>
>  Can't see why and how by just looking at the debug logs...
>  Is there any way to increase verbosity in there?


To increase verbosity, add 'debug 1' to your ha.cf. As a side note:
when using a serial heartbeat channel in combination with the CRM, set
the baud rate as high as possible, e.g. 115200 or higher.
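
As a rough sketch, the relevant ha.cf lines might then look like this
(the device name is taken from your posted config; 115200 is only an
example value, so pick whatever rate both serial ports handle reliably,
and keep baud ahead of the serial line as you already have it):

    debug 1
    baud 115200
    serial /dev/ttyS0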

Regards,
Andreas

>
>
>
>  >
>  > >pengine[5003]: 2008/03/19_16:55:58 notice: clone_print: Master/Slave Set: ms-drbd_id
>  > >pengine[5003]: 2008/03/19_16:55:58 notice: native_print:     drbd_id:0 (heartbeat::ocf:drbd):  Master feeble-0 FAILED
>  > >pengine[5003]: 2008/03/19_16:55:58 notice: native_print:     drbd_id:1 (heartbeat::ocf:drbd):  Stopped
>  > >pengine[5003]: 2008/03/19_16:55:58 notice: native_print: fs_id (heartbeat::ocf:Filesystem):    Stopped
>  > >pengine[5003]: 2008/03/19_16:55:58 notice: native_print: ip_id (heartbeat::ocf:IPaddr): Stopped
>  > >pengine[5003]: 2008/03/19_16:55:58 notice: native_print: mysql_id (heartbeat::ocf:mysql): Stopped
>  > >pengine[5003]: 2008/03/19_16:55:58 notice: native_print: apache_id (heartbeat::ocf:apache):        Stopped
>  > >pengine[5003]: 2008/03/19_16:55:58 notice: native_print: email_id (heartbeat::ocf:MailTo):        Stopped
>  > >pengine[5003]: 2008/03/19_16:55:58 WARN: native_color: Resource drbd_id:1 cannot run anywhere
>  >
>  > 2 node cluster, one node in standby, failed start on the other node,
>  > that means the resource cannot run anywhere.
>  >
>  > >cib.xml resources and constraints sections:
>  > >
>  > ><resources>
>  > >  <master_slave id="ms-drbd_id">
>  > >    <meta_attributes id="ma-ms-drbd1_id">
>  > >      <attributes>
>  > >        <nvpair id="ma-ms-drbd-1_id" name="clone_max" value="2"/>
>  > >        <nvpair id="ma-ms-drbd-2_id" name="clone_node_max" value="1"/>
>  > >        <nvpair id="ma-ms-drbd-3_id" name="master_max" value="1"/>
>  > >        <nvpair id="ma-ms-drbd-4_id" name="master_node_max" value="1"/>
>  > >        <nvpair id="ma-ms-drbd-5_id" name="notify" value="yes"/>
>  > >        <nvpair id="ma-ms-drbd-6_id" name="globally_unique" value="false"/>
>  > >        <nvpair id="ma-ms-drbd-7_id" name="target_role" value="started"/>
>  > >      </attributes>
>  > >    </meta_attributes>
>  > >    <primitive id="drbd_id" class="ocf" provider="heartbeat" type="drbd">
>  > >      <operations>
>  > >        <op id="drbd-monitoring" interval="30s" name="monitor"
>  > >        timeout="15s"/>
>  >
>  > You might want to monitor both the slave and the master here.
>  >
>  >           <operations>
>  >             <op id="op1" name="monitor" interval="5s" timeout="5s"
>  > role="Master"/>
>  >             <op id="op2" name="monitor" interval="6s" timeout="5s"
>  > role="Slave"/>
>  >           </operations>
>  >
>  > Make sure you use different intervals, because multiple monitor
>  > operations with the same interval on one resource are not supported.
>
>  [...]
>
>
>  >
>  > >/etc/heartbeat/ha.cf
>  > >
>  > >mcast eth0 239.0.0.1 694 1 0
>  > >bcast eth1
>  > >ping 132.206.178.1
>  > >baud 19200
>  > >serial /dev/ttyS0
>  > >node feeble-0 feeble-1
>  > >auto_failback off
>  > >use_logd on
>  > >respawn hacluster /usr/lib/heartbeat/dopd
>  > >apiauth dopd gid=haclient uid=hacluster
>  > >respawn root /usr/lib/heartbeat/pingd -m 100 -d 5s
>  >
>  > Is this complete? Where's "crm on|yes|respawn"?
>
>  got cut while I copy-pasted:
>
>  "crm on" is in ha.cf
>
>  Still no go with the 2 new ops you suggested.
>  Always getting:
>
>  WARN: native_color: Resource drbd_id:0 cannot run anywhere
>
>  thank you for your time, much appreciated.
>  jf
>
>
>  >
>  > >TIA
>  > >jf
>  >
>  > Regards
>  > Dominik
>  > _______________________________________________
>  > Linux-HA mailing list
>  > [email protected]
>  > http://lists.linux-ha.org/mailman/listinfo/linux-ha
>  > See also: http://linux-ha.org/ReportingProblems
>
>  --
>  <° ><
>
>
