Thanks for your help, Dejan. Comments below.
>> get the value of field lrm_opstatus from a ha_msg
>> 2011-03-24 18:53:12| info |crmd: [27913]: info: msg_to_op: Message follows:
>> 2011-03-24 18:53:12| info |crmd: [27913]: info: MSG: Dumping message with 16
>> fields
>> 2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[0] : [lrm_t=op]
>> 2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[1] :
>> [lrm_rid=SSJ0000E02A2:0]
>> 2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[2] : [lrm_op=start]
>> 2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[3] :
>> [lrm_timeout=300000]
>> 2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[4] : [lrm_interval=0]
>> 2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[5] : [lrm_delay=0]
>> 2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[6] : [lrm_copyparams=1]
>> 2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[7] : [lrm_t_run=0]
>> 2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[8] : [lrm_t_rcchange=0]
>> 2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[9] : [lrm_exec_time=0]
>> 2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[10] : [lrm_queue_time=0]
>> 2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[11] : [lrm_targetrc=-1]
>> 2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[12] : [lrm_app=crmd]
>> 2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[13] :
>> [lrm_userdata=91:3:0:dc9ad1c7-1d74-4418-a002-34426b34b576]
>> 2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[14] :
>> [(2)lrm_param=0x64c230(938 1098)]
>> 2011-03-24 18:53:12| info |crmd: [27913]: info: MSG: Dumping message with 27
>> fields
>> 2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[0] : [CRM_meta_clone=0]
>> 2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[1] :
>> [CRM_meta_notify_slave_resource= ]
>> 2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[2] :
>> [CRM_meta_notify_active_resource= ]
>> 2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[3] :
>> [CRM_meta_notify_demote_uname= ]
>> 2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[4] :
>> [CRM_meta_notify_inactive_resource=SSJ0000E02A2:0 SSJ0000E02A2:1 ]
>> 2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[5] :
>> [ssconf=/var/omneon/config/config.J0000E02A2]
>> 2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[6] :
>> [CRM_meta_master_node_max=1]
>> 2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[7] :
>> [CRM_meta_notify_stop_resource= ]
>> 2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[8] :
>> [CRM_meta_notify_master_resource= ]
>> 2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[9] :
>> [CRM_meta_clone_node_max=1]
>> 2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[10] :
>> [CRM_meta_clone_max=2]
>> 2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[11] :
>> [CRM_meta_notify=true]
>> 2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[12] :
>> [CRM_meta_notify_start_resource=SSJ0000E02A2:0 SSJ0000E02A2:1 ]
>> 2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[13] :
>> [CRM_meta_notify_stop_uname= ]
>> 2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[14] :
>> [crm_feature_set=3.0.1]
>> 2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[15] :
>> [CRM_meta_notify_master_uname= ]
>> 2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[16] :
>> [CRM_meta_master_max=1]
>> 2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[17] :
>> [CRM_meta_globally_unique=false]
>> 2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[18] :
>> [CRM_meta_notify_promote_resource=SSJ0000E02A2:0 ]
>> 2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[19] :
>> [CRM_meta_notify_promote_uname=mgraid-s0000e02a1-0 ]
>> 2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[20] :
>> [CRM_meta_notify_active_uname= ]
>> 2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[21] :
>> [CRM_meta_notify_start_uname=mgraid-s0000e02a1-0 mgraid-s0000e02a1-1 ]
>> 2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[22] :
>> [CRM_meta_notify_slave_uname= ]
>> 2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[23] :
>> [CRM_meta_name=start]
>> 2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[24] :
>> [ss_resource=SSJ0000E02A2]
>> 2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[25] :
>> [CRM_meta_notify_demote_resource= ]
>> 2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[26] :
>> [CRM_meta_timeout=300000]
>> 2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[15] : [lrm_callid=15]
>>
>> This results in the resources being stopped even though I can see from the
>> logging that the agent START function returned $OCF_SUCCESS. (The agent
>> start function prints "ss_start() START" and "ss_start() END" in the logging).
>>
>> The START function can take anywhere from 30 - 60 seconds to complete due to
>> our application.
>>
>> I am running with 1.0.9 Pacemaker and heartbeat 3.0.3.
>>
>> I have attached the configuration as a file to this email since I thought it
>> would make the email unreadable. (Summary is 6 master/slave resources).
>>
>> I have also attached logs. The above messages are from the file n0-short.txt
>> but also occur in n1-short.txt.
>>
>> I thought that maybe I was running into a problem with the number of threads
>> that lrmd had configured. I increased it to 40 and confirmed that it was in
>> effect with:
>>
>> # /sbin/lrmadmin -g max-children
>> max-children: 40
>>
>> This problem is reproducible every time.
>
>The missing lrm_opstatus field is due to the operation never
>being run hence no status to report. Perhaps this particular
>case should have severity reduced to info.
>
>Did you observe any adverse effects otherwise?
>
>Thanks,
>
>Dejan
The agent START function was called, as shown by the "ss_start() START" and
"ss_start() END" messages in the log. The return value $OCF_SUCCESS appears to
have been lost somehow.

Yes, there were adverse effects: the resource was stopped, and I could not
restart it even after clearing the failure counts with "crm_resource -C".
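
For anyone who wants to check their own logs for the same symptom, here is a
minimal sketch that lists the fields present in a crmd message dump and reports
which expected ones are absent. It only assumes the "[name=value]" layout seen
in the dump above. The function names (dumped_fields, missing_fields) and the
sample text are made up for illustration and are not part of Pacemaker or lrmd.

```python
import re

# Matches a "[name=value]" payload from the dump, including entries with a
# "(n)" prefix such as "[(2)lrm_param=...]". Bracketed PIDs like "[27913]"
# and indexes like "MSG[3]" are skipped because they contain no "name=".
FIELD_RE = re.compile(r"\[(?:\(\d+\))?([A-Za-z_][A-Za-z0-9_]*)=([^\]]*)\]")


def dumped_fields(log_text):
    """Return {name: value} for every [name=value] pair in the log text."""
    return {name: value.strip() for name, value in FIELD_RE.findall(log_text)}


def missing_fields(log_text, required=("lrm_opstatus",)):
    """Return the required field names that do not appear in the dump."""
    present = dumped_fields(log_text)
    return [name for name in required if name not in present]


# Illustrative excerpt in the same layout as the dump above, including a
# field that the mail client wrapped onto its own quoted line.
sample = """\
>> 2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[1] :
>> [lrm_rid=SSJ0000E02A2:0]
>> 2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[2] : [lrm_op=start]
>> 2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[15] : [lrm_callid=15]
"""

print(missing_fields(sample))
```

Because the pattern is applied to the whole text rather than line by line, it
does not matter where the log lines were wrapped.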
Thanks,
Bob