Thanks, Dejan, for your help. Comments below.

>> get the value of field lrm_opstatus from a ha_msg 
>> 2011-03-24 18:53:12| info |crmd: [27913]: info: msg_to_op: Message follows: 
>> 2011-03-24 18:53:12| info |crmd: [27913]: info: MSG: Dumping message with 16 
>> fields 
>> 2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[0] : [lrm_t=op] 
>> 2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[1] : 
>> [lrm_rid=SSJ0000E02A2:0] 
>> 2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[2] : [lrm_op=start] 
>> 2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[3] : 
>> [lrm_timeout=300000] 
>> 2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[4] : [lrm_interval=0] 
>> 2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[5] : [lrm_delay=0] 
>> 2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[6] : [lrm_copyparams=1] 
>> 2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[7] : [lrm_t_run=0] 
>> 2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[8] : [lrm_t_rcchange=0] 
>> 2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[9] : [lrm_exec_time=0] 
>> 2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[10] : [lrm_queue_time=0] 
>> 2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[11] : [lrm_targetrc=-1] 
>> 2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[12] : [lrm_app=crmd] 
>> 2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[13] : 
>> [lrm_userdata=91:3:0:dc9ad1c7-1d74-4418-a002-34426b34b576] 
>> 2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[14] : 
>> [(2)lrm_param=0x64c230(938 1098)] 
>> 2011-03-24 18:53:12| info |crmd: [27913]: info: MSG: Dumping message with 27 
>> fields 
>> 2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[0] : [CRM_meta_clone=0] 
>> 2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[1] : 
>> [CRM_meta_notify_slave_resource= ] 
>> 2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[2] : 
>> [CRM_meta_notify_active_resource= ] 
>> 2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[3] : 
>> [CRM_meta_notify_demote_uname= ] 
>> 2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[4] : 
>> [CRM_meta_notify_inactive_resource=SSJ0000E02A2:0 SSJ0000E02A2:1 ] 
>> 2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[5] : 
>> [ssconf=/var/omneon/config/config.J0000E02A2] 
>> 2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[6] : 
>> [CRM_meta_master_node_max=1] 
>> 2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[7] : 
>> [CRM_meta_notify_stop_resource= ] 
>> 2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[8] : 
>> [CRM_meta_notify_master_resource= ] 
>> 2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[9] : 
>> [CRM_meta_clone_node_max=1] 
>> 2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[10] : 
>> [CRM_meta_clone_max=2] 
>> 2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[11] : 
>> [CRM_meta_notify=true] 
>> 2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[12] : 
>> [CRM_meta_notify_start_resource=SSJ0000E02A2:0 SSJ0000E02A2:1 ] 
>> 2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[13] : 
>> [CRM_meta_notify_stop_uname= ] 
>> 2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[14] : 
>> [crm_feature_set=3.0.1] 
>> 2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[15] : 
>> [CRM_meta_notify_master_uname= ] 
>> 2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[16] : 
>> [CRM_meta_master_max=1] 
>> 2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[17] : 
>> [CRM_meta_globally_unique=false] 
>> 2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[18] : 
>> [CRM_meta_notify_promote_resource=SSJ0000E02A2:0 ] 
>> 2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[19] : 
>> [CRM_meta_notify_promote_uname=mgraid-s0000e02a1-0 ] 
>> 2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[20] : 
>> [CRM_meta_notify_active_uname= ] 
>> 2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[21] : 
>> [CRM_meta_notify_start_uname=mgraid-s0000e02a1-0 mgraid-s0000e02a1-1 ] 
>> 2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[22] : 
>> [CRM_meta_notify_slave_uname= ] 
>> 2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[23] : 
>> [CRM_meta_name=start] 
>> 2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[24] : 
>> [ss_resource=SSJ0000E02A2] 
>> 2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[25] : 
>> [CRM_meta_notify_demote_resource= ] 
>> 2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[26] : 
>> [CRM_meta_timeout=300000] 
>> 2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[15] : [lrm_callid=15] 
>> 
>> This results in the resources being stopped even though I can see from the 
>> logging that the agent START function returned $OCF_SUCCESS. (The agent 
>> start function prints "ss_start() START" and "ss_start() END" in the logging). 
>> 
>> The START function can take anywhere from 30 to 60 seconds to complete due to 
>> our application. 
>> 
>> I am running with 1.0.9 Pacemaker and heartbeat 3.0.3. 
>> 
>> I have attached the configuration as a file to this email since I thought it 
>> would make the email unreadable. (Summary is 6 master/slave resources). 
>> 
>> I have also attached logs. The above messages are from the file 
>> n0-short.txt but also occur in n1-short.txt. 
>> 
>> I thought that maybe I was running into a problem with the number of threads 
>> that lrmd had configured. I increased it to 40 and confirmed that it was in 
>> effect with: 
>> 
>> # /sbin/lrmadmin -g max-children 
>> max-children: 40 
>> 
>> This problem is reproducible every time. 
>
>The missing lrm_opstatus field is due to the operation never 
>being run, hence no status to report. Perhaps this particular 
>case should have severity reduced to info. 
>
>Did you observe any adverse effects otherwise? 
>
>Thanks, 
>
>Dejan 

The agent START function was called, as shown by the "ss_start() START" and 
"ss_start() END" messages. The return value $OCF_SUCCESS appears to have been 
lost somehow.

The result is that the resource was stopped and I could not restart it even 
after clearing the failure counts with "crm_resource -C".
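
For anyone who wants to check a dump like the one quoted above mechanically,
here is a minimal sketch that pulls the "[name=value]" fields out of the crmd
log text and reports whether lrm_opstatus is among them. The function name
dumped_fields and the shortened sample are illustrative only (they are not part
of Pacemaker or heartbeat), and the pattern assumes the field layout seen in
these logs.

```python
import re

# Matches one dumped field such as "[lrm_op=start]" or
# "[(2)lrm_param=0x64c230(938 1098)]". Names must start with a letter or
# underscore, so the "[27913]" pid and the "MSG[0]" index are not matched.
FIELD_RE = re.compile(r"\[(?:\(\d+\))?([A-Za-z_][A-Za-z0-9_]*)=([^\]]*)\]")


def dumped_fields(log_text):
    """Return {field name: value} for every field found in a message dump.

    Hypothetical helper: it works on the whole text rather than line by
    line, so a field that the mailer wrapped onto the next line is still
    picked up.
    """
    fields = {}
    for name, value in FIELD_RE.findall(log_text):
        fields[name] = value.strip()
    return fields


# A shortened copy of the dump from this thread, including one wrapped field.
sample = """\
2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[0] : [lrm_t=op]
2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[2] : [lrm_op=start]
2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[3] :
[lrm_timeout=300000]
2011-03-24 18:53:12| info |crmd: [27913]: info: MSG[15] : [lrm_callid=15]
"""

fields = dumped_fields(sample)
print("lrm_opstatus present:", "lrm_opstatus" in fields)
print("lrm_op:", fields.get("lrm_op"))
```

Running it against the full n0-short.txt or n1-short.txt would show the same
thing as reading the dump by eye: the start operation is there, but no
lrm_opstatus field comes with it.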


Thanks,

Bob


