Re: [Linux-HA] failed to get the value of field lrm_opstatus from a ha_msg

Max Hofer Fri, 04 May 2007 07:57:04 -0700

On Friday 04 May 2007 16:14, Andrew Beekhof wrote:
> On 5/4/07, Max Hofer <[EMAIL PROTECTED]> wrote:
> > I tried some power-off tests, and after running them on both cluster
> > nodes at the same time the nodes sometimes go haywire.
> >
> > I ran into 2 problems:
> > 1.) on one cluster node heartbeat shut down with
> > ERROR: Cannot write to media pipe 0: Resource temporarily unavailable
> >
> > 2.) a return code from a resource agent got dropped with the error message
> > crmd[1299]: 2007/05/04_10:58:22 WARN: msg_to_op(1173): failed to get the value of field lrm_opstatus from a ha_msg
> 
> I think opening bugs for these is the best idea.
> 1) should probably be against the "other" component
Filed as #1567.
> 2) should probably be against the "lrmd" component (since it's an lrm
> library message)
Filed as #1568.
I also created another bug entry, #1569, for the action enumeration problem
during DC failover.


> 
> >
> > Pre-Condition:
> > * cluster nodes are interconnected via:
> > - RS232
> > - bond0 via ucast (normal LAN)
> > - bond1 (bcast; internal LAN between the cluster nodes, over which the
> > DRBD device is synchronized)
> > * it seems the communication using ttyS01 does not work (I have to check
> > the cabling)
> >
> > Attached are two log files, and here is a short summary of what happened:
> > * 10:57:02
> > - both servers are powered up and start heartbeat at the same time
> > - management2 is the DC
> > - management1 goes into "primary state", i.e. it starts the cluster resource
> > that defines a node as primary
> > * 10:58:04
> > - heartbeat on management2 crashes
> > ERROR: Cannot write to media pipe 0: Resource temporarily unavailable
> > * 10:58:21
> > - management1 was elected as new DC
> > (strange timeouts: the action numbers which time out on management2 do
> > match the numbers after the DC switch --- is that normal?)
> > ---> fail-count change
> > * 10:58:22
> > - failed to get lrm_opstatus from ha_msg ---> rc for db_mgmt resource start
> > is lost
> >
> > crmd[1299]: 2007/05/04_10:58:22 WARN: msg_to_op(1173): failed to get the value of field lrm_opstatus from a ha_msg
> > crmd[1299]: 2007/05/04_10:58:22 info: msg_to_op: Message follows:
> > crmd[1299]: 2007/05/04_10:58:22 info: MSG: Dumping message with 13 fields
> > crmd[1299]: 2007/05/04_10:58:22 info: MSG[0] : [lrm_t=op]
> > crmd[1299]: 2007/05/04_10:58:22 info: MSG[1] : [lrm_rid=db_mgmt]
> > crmd[1299]: 2007/05/04_10:58:22 info: MSG[2] : [lrm_op=monitor]
> > crmd[1299]: 2007/05/04_10:58:22 info: MSG[3] : [lrm_timeout=120000]
> > crmd[1299]: 2007/05/04_10:58:22 info: MSG[4] : [lrm_interval=120000]
> > crmd[1299]: 2007/05/04_10:58:22 info: MSG[5] : [lrm_delay=60000]
> > crmd[1299]: 2007/05/04_10:58:22 info: MSG[6] : [lrm_targetrc=-2]
> > crmd[1299]: 2007/05/04_10:58:22 info: MSG[7] : [lrm_app=crmd]
> > crmd[1299]: 2007/05/04_10:58:22 info: MSG[8] : [lrm_userdata=81:4:3e4ad4f1-ae5f-4d79-8f5b-db752a9d1121]
> > crmd[1299]: 2007/05/04_10:58:22 info: MSG[9] : [(2)lrm_param=0x82180e0(199 245)]
> > crmd[1299]: 2007/05/04_10:58:22 info: MSG: Dumping message with 8 fields
> > crmd[1299]: 2007/05/04_10:58:22 info: MSG[0] : [CRM_meta_interval=120000]
> > crmd[1299]: 2007/05/04_10:58:22 info: MSG[1] : [CRM_meta_start_delay=60000]
> > crmd[1299]: 2007/05/04_10:58:22 info: MSG[2] : [startup_timeout=60]
> > crmd[1299]: 2007/05/04_10:58:22 info: MSG[3] : [CRM_meta_id=db-mgmt-monitor]
> > crmd[1299]: 2007/05/04_10:58:22 info: MSG[4] : [CRM_meta_timeout=120000]
> > crmd[1299]: 2007/05/04_10:58:22 info: MSG[5] : [crm_feature_set=1.0.7]
> > crmd[1299]: 2007/05/04_10:58:22 info: MSG[6] : [pgdb=dmc]
> > crmd[1299]: 2007/05/04_10:58:22 info: MSG[7] : [CRM_meta_name=monitor]
> > crmd[1299]: 2007/05/04_10:58:22 info: MSG[10] : [lrm_callid=60]
> > crmd[1299]: 2007/05/04_10:58:22 info: MSG[11] : [lrm_app=crmd]
> > crmd[1299]: 2007/05/04_10:58:22 info: MSG[12] : [lrm_callid=60]
> >
> > * 10:58:23
> > - fail-count for db_mgmt is increased ---> which shuts down the
> > resource group (that's how I configured it)
> >
> > kind regards Max
> >
> >
> >
> > _______________________________________________
> > Linux-HA mailing list
> > [email protected]
> > http://lists.linux-ha.org/mailman/listinfo/linux-ha
> > See also: http://linux-ha.org/ReportingProblems

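For anyone who wants to check a similar dump without counting fields by
hand, here is a minimal Python sketch. It assumes the crmd log lines have been
rejoined where the mail client wrapped them. The helper names parse_dump and
missing_fields are made up for illustration and are not part of heartbeat or
the lrm library. The only required field name it relies on is lrm_opstatus,
taken from the warning itself; any other names passed in are the caller's
assumption.

```python
import re

# One dumped field, e.g. "... info: MSG[2] : [lrm_op=monitor]".
# An optional "(N)" type prefix, as in "[(2)lrm_param=0x...]", is skipped.
FIELD_RE = re.compile(r"MSG\[(\d+)\]\s*:\s*\[(?:\(\d+\))?([^=\]]+)=(.*)\]\s*$")
# Header that opens a message dump and announces its field count.
DUMP_RE = re.compile(r"Dumping message with (\d+) fields")


def parse_dump(lines):
    """Return a list of messages, each a list of (name, value) pairs.

    A nested dump (a message printed inside another one) becomes its own
    entry, in the order the "Dumping message" headers appear.
    """
    messages = []  # every message seen, outermost first
    stack = []     # (expected_field_count, fields) for messages still open
    for line in lines:
        header = DUMP_RE.search(line)
        if header:
            fields = []
            messages.append(fields)
            stack.append((int(header.group(1)), fields))
            continue
        field = FIELD_RE.search(line)
        if field and stack:
            expected, fields = stack[-1]
            fields.append((field.group(2), field.group(3)))
            if len(fields) >= expected:
                stack.pop()  # message complete, resume the enclosing one
    return messages


def missing_fields(message, required):
    """Return the names from `required` that do not occur in `message`."""
    present = {name for name, _ in message}
    return [name for name in required if name not in present]
```

Run over the dump quoted above, the first message should come back with its
13 fields and missing_fields(messages[0], ["lrm_opstatus"]) should return
["lrm_opstatus"], which matches the warning from msg_to_op.
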
-- 
Max Hofer
APUS Software G.m.b.H.
A-8074 Raaba, Bahnhofstraße 1/1
T| +43 316 401629 11
F| +43 316 401629 9
W| www.apus.co.at
E| [EMAIL PROTECTED]