Max Hofer wrote:
> I tried some power-off tests, and after powering on both cluster
> nodes at the same time they sometimes go haywire.
> 
> I ran into 2 problems:
> 1.) on one cluster node, heartbeat shut down with
> ERROR: Cannot write to media pipe 0: Resource temporarily unavailable

This would be an OS-level problem.  Since you didn't supply a
configuration, I can't tell you which device this happened on.  If you
listed serial first, that could easily be the problem.  Serial doesn't
work very usefully in R2 configurations.  I'd take it out.
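To check which medium is listed first, something like the following could
print the communication directives from ha.cf in the order they appear. It
is only a sketch: it assumes that "media pipe 0" refers to the first medium
listed, which the log message alone does not confirm, and the sample ha.cf
contents (device names, address) are made up for illustration.

```python
# Directives in ha.cf that declare a communication medium.
MEDIA_DIRECTIVES = {"serial", "bcast", "ucast", "mcast"}


def media_order(ha_cf_text):
    """Return (directive, args) pairs for media lines, in file order."""
    media = []
    for raw in ha_cf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        parts = line.split()
        if parts[0] in MEDIA_DIRECTIVES:
            media.append((parts[0], parts[1:]))
    return media


# Hypothetical ha.cf contents, for illustration only.
SAMPLE = """\
# communication paths
serial /dev/ttyS0
ucast bond0 192.0.2.2
bcast bond1
"""

if __name__ == "__main__":
    for index, (kind, args) in enumerate(media_order(SAMPLE)):
        print(index, kind, " ".join(args))
```

If serial shows up at index 0, moving or removing that line is the first
thing to try.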

> 2.) a return code from a resource agent got dropped with error message
> crmd[1299]: 2007/05/04_10:58:22 WARN: msg_to_op(1173): failed to get the 
> value of field lrm_opstatus from a ha_msg
> 
> Pre-Condition:
> * cluster nodes are interconnected via:
> - RS232
> - bond0 via ucast (normal LAN)
> - bond1 (bcast - intra LAN between cluster nodes where the DRBD device
> is synchronized)
> * it seems the communication using ttyS01 does not work (I have to check
> the cabling)
> 
> attached are two log files, and a short summary of what happened:
> * 10:57:02
> - both servers are powered up and start heartbeat at the same time
> - management2 is DC
> - management1 went into "primary state", i.e. starts the cluster resource
> defining a node as primary
> * 10:58:04
> - heartbeat on management2 crashes
> ERROR: Cannot write to media pipe 0: Resource temporarily unavailable
> * 10:58:21
> - management1 was elected as new DC
> (strange timeouts: the action numbers which time out on management2 do
> match the numbers after the DC switch --- is that normal?)
> ---> fail-count change
> * 10:58:22
> - failed to get lrm_opstatus from ha_msg ---> rc for db_mgmt resource start 
> is lost
> 
> crmd[1299]: 2007/05/04_10:58:22 WARN: msg_to_op(1173): failed to get the 
> value of field lrm_opstatus from a ha_msg
> crmd[1299]: 2007/05/04_10:58:22 info: msg_to_op: Message follows:
> crmd[1299]: 2007/05/04_10:58:22 info: MSG: Dumping message with 13 fields
> crmd[1299]: 2007/05/04_10:58:22 info: MSG[0] : [lrm_t=op]
> crmd[1299]: 2007/05/04_10:58:22 info: MSG[1] : [lrm_rid=db_mgmt]
> crmd[1299]: 2007/05/04_10:58:22 info: MSG[2] : [lrm_op=monitor]
> crmd[1299]: 2007/05/04_10:58:22 info: MSG[3] : [lrm_timeout=120000]
> crmd[1299]: 2007/05/04_10:58:22 info: MSG[4] : [lrm_interval=120000]
> crmd[1299]: 2007/05/04_10:58:22 info: MSG[5] : [lrm_delay=60000]
> crmd[1299]: 2007/05/04_10:58:22 info: MSG[6] : [lrm_targetrc=-2]
> crmd[1299]: 2007/05/04_10:58:22 info: MSG[7] : [lrm_app=crmd]
> crmd[1299]: 2007/05/04_10:58:22 info: MSG[8] : 
> [lrm_userdata=81:4:3e4ad4f1-ae5f-4d79-8f5b-db752a9d1121]
> crmd[1299]: 2007/05/04_10:58:22 info: MSG[9] : [(2)lrm_param=0x82180e0(199 
> 245)]
> crmd[1299]: 2007/05/04_10:58:22 info: MSG: Dumping message with 8 fields
> crmd[1299]: 2007/05/04_10:58:22 info: MSG[0] : [CRM_meta_interval=120000]
> crmd[1299]: 2007/05/04_10:58:22 info: MSG[1] : [CRM_meta_start_delay=60000]
> crmd[1299]: 2007/05/04_10:58:22 info: MSG[2] : [startup_timeout=60]
> crmd[1299]: 2007/05/04_10:58:22 info: MSG[3] : [CRM_meta_id=db-mgmt-monitor]
> crmd[1299]: 2007/05/04_10:58:22 info: MSG[4] : [CRM_meta_timeout=120000]
> crmd[1299]: 2007/05/04_10:58:22 info: MSG[5] : [crm_feature_set=1.0.7]
> crmd[1299]: 2007/05/04_10:58:22 info: MSG[6] : [pgdb=dmc]
> crmd[1299]: 2007/05/04_10:58:22 info: MSG[7] : [CRM_meta_name=monitor]
> crmd[1299]: 2007/05/04_10:58:22 info: MSG[10] : [lrm_callid=60]
> crmd[1299]: 2007/05/04_10:58:22 info: MSG[11] : [lrm_app=crmd]
> crmd[1299]: 2007/05/04_10:58:22 info: MSG[12] : [lrm_callid=60]
> 
> * 10:58:23 fail-count for db_mgmt is increased ---> which shuts down the
> resource group
> (that's how I configured it)

When you file a bugzilla for this, please attach the configuration files
and logs from both machines to it.
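When pulling excerpts out of the logs, it may help to confirm which fields
a dumped message actually carries. The sketch below collects the field
names from "MSG[n] : [name=value]" dump lines and checks for lrm_opstatus.
The regular expression is an assumption based only on the dump format
quoted above, and the function name is made up for illustration.

```python
import re

# Matches the start of a dumped field such as "[lrm_rid=" or "[(2)lrm_param=".
# Bracketed tokens without "=" (e.g. "crmd[1299]", "MSG[0]") do not match.
FIELD = re.compile(r"\[(?:\(\d+\))?(\w+)=")


def dumped_fields(log_text):
    """Return the field names found in a ha_msg dump, in order of appearance."""
    return [match.group(1) for match in FIELD.finditer(log_text)]


# A few lines copied from the dump quoted above.
SAMPLE = """\
crmd[1299]: 2007/05/04_10:58:22 info: MSG[0] : [lrm_t=op]
crmd[1299]: 2007/05/04_10:58:22 info: MSG[1] : [lrm_rid=db_mgmt]
crmd[1299]: 2007/05/04_10:58:22 info: MSG[2] : [lrm_op=monitor]
"""

if __name__ == "__main__":
    names = dumped_fields(SAMPLE)
    print("fields:", ", ".join(names))
    print("lrm_opstatus present:", "lrm_opstatus" in names)
```

Running it over the full dump from each node would show whether the field
is missing on both sides or only on one.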

-- 
    Alan Robertson <[EMAIL PROTECTED]>

"Openness is the foundation and preservative of friendship...  Let me
claim from you at all times your undisguised opinions." - William
Wilberforce
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems