Hi Lars,

Thank you for the tools to look at things with. However, since DRBD was 
looking fine in that scenario, on a whim I decided, before digging into 
them, to just run through the install on a different pair of VMs, this 
time making sure I used the gitco.de repository for drbd83 and the 
clusterlabs repo for pacemaker (heartbeat and everything else comes 
along with it once the libesmtp requirement is satisfied, in this case 
by installing a later EPEL release: rpm -ivh epel-release-5-4.noarch.rpm).
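
For the record, the working install boiled down to roughly the following 
(a sketch from memory; the exact package names depend on what the 
gitco.de and clusterlabs repositories actually provide, so adjust as needed):

Satisfy the libesmtp dependency with the later EPEL release:
# rpm -ivh epel-release-5-4.noarch.rpm

drbd83 userland plus the matching kernel module, from the gitco.de repo:
# yum install drbd83 kmod-drbd83

pacemaker (which pulls in heartbeat) from the clusterlabs repo:
# yum install pacemaker heartbeat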

Using the exact same crm configuration (except that standby is "off" on 
both VMs, of course), when I do the same crm node standby on one node, 
the other takes over, and then back again, no problem. I am going to go 
back and either reinstall the original pair and/or compare each and 
every rpm and source to see which one is broken, or just keep a record 
of this working install procedure.
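
The test sequence, for reference (a sketch; the node names are of course my VMs):

Put ha1 into standby and watch the group move to ha2:
# crm node standby ha1.iohost.com
# crm_mon -1

Then bring ha1 back online (in my setup the resources fail back as well):
# crm node online ha1.iohost.com
# crm_mon -1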

Now I'm off to learn about crm resource move (and unmove), as you suggested. Thanks again.
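
If I understand it correctly, the equivalent manual failover would be 
something along these lines (using my WebServices group as the example):

Move the group away from its current node (or to a named node):
# crm resource move WebServices ha2.iohost.com

And afterwards drop the constraint that the move created:
# crm resource unmove WebServices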

Regards,
Randy


On 5/20/2011 1:03 AM, Lars Ellenberg wrote:
> On Thu, May 19, 2011 at 11:53:24PM -0700, Randy Katz wrote:
>> Lars,
>>
>> Thank you much for the answer on the "standby" issue.
>> It seems that that was the tip of my real issue. So now I have both nodes
>> coming online. And it seems ha1 starts fine with all the resources starting.
>>
>> With them both online, if I issue: crm node standby ha1.iohost.com
> Why.
>
> Learn about "crm resource move".
> (and unmove, for that matter).
>
>> Then I see IP Takeover on ha2 but the other resources do not start,
>> ever, it remains:
>>
>> Node ha1.iohost.com (b159178d-c19b-4473-aa8e-13e487b65e33): standby
>> Online: [ ha2.iohost.com ]
>>
>>    Resource Group: WebServices
>>        ip1        (ocf::heartbeat:IPaddr2):       Started ha2.iohost.com
>>        ip1arp     (ocf::heartbeat:SendArp):       Started ha2.iohost.com
>>        fs_webfs   (ocf::heartbeat:Filesystem):    Stopped
>>        fs_mysql   (ocf::heartbeat:Filesystem):    Stopped
>>        apache2    (lsb:httpd):    Stopped
>>        mysql      (ocf::heartbeat:mysql): Stopped
>>    Master/Slave Set: ms_drbd_mysql
>>        Slaves: [ ha2.iohost.com ]
>>        Stopped: [ drbd_mysql:0 ]
>>    Master/Slave Set: ms_drbd_webfs
>>        Slaves: [ ha2.iohost.com ]
>>        Stopped: [ drbd_webfs:0 ]
>>
>> In looking in the recent log I see this: May 20 12:46:42 ha2.iohost.com
>> pengine: [3117]: info: native_color: Resource fs_webfs cannot run anywhere
>>
>> I am not sure why it cannot promote the other resources on ha2, I
>> checked drbd before putting ha1 on standby and it was up to date.
> Double check the status of drbd:
> # cat /proc/drbd
>
> Check what the cluster would do, and why:
> # ptest -LVVV -s
> [add more Vs to see more detail, but brace yourself for maximum confusion ;-)]
>
> Check for constraints that get in the way:
> # crm configure show | grep -Ee 'location|order'
>
> check the "master scores" in the cib:
> # cibadmin -Ql -o status | grep master
>
> Look at the actions that have been performed on the resource,
> on both nodes:
>                 vvvvvvvvvv-- the ID of your primitive
> # grep "lrmd:.*drbd_mysql" /var/log/ha.log
> or wherever that ends up on your box
>
>> Here are the surrounding log entries, the only thing I changed in the
>> config is standby="off" on both nodes:
>>
>> May 20 12:47:06 ha2.iohost.com pengine: [3117]: notice: group_print:  
>> Resource Group: WebServices
>> May 20 12:47:06 ha2.iohost.com pengine: [3117]: notice: native_print:      
>> ip1  (ocf::heartbeat:IPaddr2):       Started ha2.iohost.com
>> May 20 12:47:06 ha2.iohost.com pengine: [3117]: notice: native_print:      
>> ip1arp       (ocf::heartbeat:SendArp):       Started ha2.iohost.com
>> May 20 12:47:06 ha2.iohost.com pengine: [3117]: notice: native_print:      
>> fs_webfs     (ocf::heartbeat:Filesystem):    Stopped
>> May 20 12:47:06 ha2.iohost.com pengine: [3117]: notice: native_print:      
>> fs_mysql     (ocf::heartbeat:Filesystem):    Stopped
>> May 20 12:47:06 ha2.iohost.com pengine: [3117]: notice: native_print:      
>> apache2      (lsb:httpd):    Stopped
>> May 20 12:47:06 ha2.iohost.com pengine: [3117]: notice: native_print:      
>> mysql        (ocf::heartbeat:mysql): Stopped
>> May 20 12:47:06 ha2.iohost.com pengine: [3117]: notice: clone_print:  
>> Master/Slave Set: ms_drbd_mysql
>> May 20 12:47:06 ha2.iohost.com pengine: [3117]: notice: short_print:      
>> Slaves: [ ha2.iohost.com ]
>> May 20 12:47:06 ha2.iohost.com pengine: [3117]: notice: short_print:      
>> Stopped: [ drbd_mysql:0 ]
>> May 20 12:47:06 ha2.iohost.com pengine: [3117]: notice: clone_print:  
>> Master/Slave Set: ms_drbd_webfs
>> May 20 12:47:06 ha2.iohost.com pengine: [3117]: notice: short_print:      
>> Slaves: [ ha1.iohost.com ha2.iohost.com ]
>> May 20 12:47:06 ha2.iohost.com pengine: [3117]: info: rsc_merge_weights: 
>> ip1arp: Breaking dependency loop at ip1
>> May 20 12:47:06 ha2.iohost.com pengine: [3117]: info: rsc_merge_weights: 
>> ip1: Breaking dependency loop at ip1arp
> You got a dependency loop?
> Maybe you should fix that?
>
> You put some things in a group in a specific order, then you specify the
> reverse order in an explicit order and colocation constraint. That is not
> particularly useful. Either use a group, or use explicit order/colocation
> constraints, don't try to use both for the same resources.
>
> But that's nothing that would affect DRBD at this point.
> And as long as your DRBD is not (or cannot be?) promoted,
> nothing that depends on it will run, obviously.
>
>> May 20 12:47:06 ha2.iohost.com pengine: [3117]: info: native_color: Resource 
>> drbd_webfs:0 cannot run anywhere
>> May 20 12:47:06 ha2.iohost.com pengine: [3117]: info: master_color: 
>> ms_drbd_webfs: Promoted 0 instances of a possible 1 to master
>> May 20 12:47:06 ha2.iohost.com pengine: [3117]: info: rsc_merge_weights: 
>> fs_webfs: Rolling back scores from fs_mysql
>> May 20 12:47:06 ha2.iohost.com pengine: [3117]: info: native_color: Resource 
>> fs_webfs cannot run anywhere
>> May 20 12:47:06 ha2.iohost.com pengine: [3117]: info: native_color: Resource 
>> drbd_mysql:0 cannot run anywhere
>> May 20 12:47:06 ha2.iohost.com pengine: [3117]: info: master_color: 
>> ms_drbd_mysql: Promoted 0 instances of a possible 1 to master
>> May 20 12:47:06 ha2.iohost.com pengine: [3117]: info: master_color: 
>> ms_drbd_mysql: Promoted 0 instances of a possible 1 to master
>> May 20 12:47:06 ha2.iohost.com pengine: [3117]: info: rsc_merge_weights: 
>> fs_mysql: Rolling back scores from apache2
>> May 20 12:47:06 ha2.iohost.com pengine: [3117]: info: native_color: Resource 
>> fs_mysql cannot run anywhere
>> May 20 12:47:06 ha2.iohost.com pengine: [3117]: info: master_color: 
>> ms_drbd_mysql: Promoted 0 instances of a possible 1 to master
>> May 20 12:47:06 ha2.iohost.com pengine: [3117]: info: master_color: 
>> ms_drbd_webfs: Promoted 0 instances of a possible 1 to master
>> May 20 12:47:06 ha2.iohost.com pengine: [3117]: info: rsc_merge_weights: 
>> apache2: Rolling back scores from mysql
>> May 20 12:47:06 ha2.iohost.com pengine: [3117]: info: native_color: Resource 
>> apache2 cannot run anywhere
>> May 20 12:47:06 ha2.iohost.com pengine: [3117]: info: native_color: Resource 
>> mysql cannot run anywhere
>> May 20 12:47:06 ha2.iohost.com pengine: [3117]: info: master_color: 
>> ms_drbd_mysql: Promoted 0 instances of a possible 1 to master
>> May 20 12:47:06 ha2.iohost.com pengine: [3117]: info: master_color: 
>> ms_drbd_webfs: Promoted 0 instances of a possible 1 to master
>>
>> Regards,
>> Randy

