Hi,

See below.

> Message: 2
> Date: Tue, 8 Jun 2010 18:35:29 +0200
> From: Dejan Muhamedagic <[email protected]>
> Subject: Re: [Linux-HA] nfsserver error
> To: General Linux-HA mailing list <[email protected]>
> Message-ID: <[email protected]>
> Content-Type: text/plain; charset=us-ascii

> Hi,

> On Mon, Jun 07, 2010 at 05:15:13PM -0600, [email protected] wrote:
> > Dejan,
> >
> > Thanks for the reply. See below :
> >
> > http://www.linux-ha.org/HaNFS
> >
> > > Message: 6
> > > Date: Mon, 7 Jun 2010 15:16:19 +0200
> > > From: Dejan Muhamedagic <[email protected]>
> > > Subject: Re: [Linux-HA] nfsserver error
> > > To: General Linux-HA mailing list <[email protected]>
> > > Message-ID: <[email protected]>
> > > Content-Type: text/plain; charset=us-ascii
> >
> > > Hi,
> >
> > > On Fri, Jun 04, 2010 at 02:49:20PM -0600, [email protected] wrote:
> > > > I have a 2-node drbd/heartbeat cluster running on RHEL 5.4 (ProLiant
> > > > DL380 G6) to which I am attempting to add HA NFS (nfsserver), and I
> > > > have run into an error. Specifically, after adding the nfsserver
> > > > resource and then running "crm resource cleanup nfsserver_nlsmtl" I
> > > > have the resource running, however I also have the below failed action :
> > > >
> > > > [r...@nlsmtl6 ~]# crm status
> > > > ============
> > > > Last updated: Fri Jun  4 14:16:14 2010
> > > > Stack: Heartbeat
> > > > Current DC: nlsmtl6 (16fd6af0-429e-402d-a5d8-a00a818f139a) - partition with quorum
> > > > Version: 1.0.8-3225fc0d98c8fcd0f7b24f0134e89967136a9b00
> > > > 2 Nodes configured, unknown expected votes
> > > > 3 Resources configured.
> > > > ============
> > > >
> > > > Online: [ nlsmtl5 nlsmtl6 ]
> > > >
> > > > Resource Group: grp_1
> > > > drbddisk_2 (heartbeat:drbddisk):   Started nlsmtl5
> > > > Filesystem_3       (ocf::heartbeat:Filesystem):    Started nlsmtl5
> > > > rc.primary_5       (lsb:rc.primary):       Started nlsmtl5
> > > > IPaddr_nlsmtl      (ocf::heartbeat:IPaddr):        Started nlsmtl5
> > > > nfsserver_nlsmtl   (ocf::heartbeat:nfsserver):     Started nlsmtl5
> > > > CL_stonithset_node01   (stonith:external/riloe-iders): Started nlsmtl6
> > > > CL_stonithset_node02   (stonith:external/riloe-iders): Started nlsmtl5
> > > >
> > > > Failed actions:
> > > > nfsserver_nlsmtl_monitor_0 (node=nlsmtl6, call=12, rc=2, status=complete): invalid parameter
> >
> > > The only places where this error can occur is if nfs_ip or
> > > nfs_shared_infodir are not set. Your configuration below looks
> > > fine. Did you check the logs? There should be more log messages
> > > from nfsserver.
> >
> > > Thanks,
> >
> > > Dejan
> >
> > To verify I zeroed my config and started from scratch. After loading my
> > resources I get :
> >
> > [r...@nlsmtl5 config_nlsmtl]# crm_mon -1
> > ============
> > Last updated: Mon Jun  7 14:44:30 2010
> > Stack: Heartbeat
> > Current DC: nlsmtl6 (6fa3ad00-4761-4e52-842c-36f002971200) - partition
> > with quorum
> > Version: 1.0.8-3225fc0d98c8fcd0f7b24f0134e89967136a9b00
> > 2 Nodes configured, unknown expected votes
> > 3 Resources configured.
> > ============
> >
> > Online: [ nlsmtl5 nlsmtl6 ]
> >
> > Resource Group: grp_1
> > drbddisk_2 (heartbeat:drbddisk):   Started nlsmtl5
> > Filesystem_3       (ocf::heartbeat:Filesystem):    Started nlsmtl5
> > IPaddr_nlsmtl      (ocf::heartbeat:IPaddr):        Started nlsmtl5
> > rc.primary_5       (lsb:rc.primary):       Started nlsmtl5
> > nfsserver_nlsmtl   (ocf::heartbeat:nfsserver):     Stopped
> > CL_stonithset_node01   (stonith:external/riloe-iders): Started nlsmtl6
> > CL_stonithset_node02   (stonith:external/riloe-iders): Started nlsmtl5
> >
> > Failed actions:
> > nfsserver_nlsmtl_monitor_0 (node=nlsmtl5, call=6, rc=2, status=complete): invalid parameter
> > nfsserver_nlsmtl_monitor_0 (node=nlsmtl6, call=6, rc=2, status=complete): invalid parameter
> > [r...@nlsmtl5 config_nlsmtl]#
> >
> > Why do I have monitoring fail on both nodes - why is it running on nlsmtl6 at all ?

> Don't see it running. But you still didn't get the right logs.
> The problem is with a resource, so you need to lookup messages
> either by lrmd or nfsserver. If you can't, then make hb_report
> and attach that or post it somewhere.

> Thanks,

> Dejan

OK, I finally figured this out after adding a bunch of debug code ...

On startup the nfsserver monitor is invoked on BOTH nodes while all
resources are STOPPED. If it simply invoked "/etc/init.d/nfs status"
(which would return rc=3), all would be fine; however, the nfsserver
resource agent runs the following code first:

nfsserver_validate ()
{
    check_binary ${OCF_RESKEY_nfs_init_script}
    check_binary ${OCF_RESKEY_nfs_notify_cmd}

    if [ -z ${OCF_RESKEY_nfs_ip} ]; then
        exit $OCF_ERR_ARGS
    fi

    if [ -d $OCF_RESKEY_nfs_shared_infodir ]; then
        return $OCF_SUCCESS
    else
        exit $OCF_ERR_ARGS
    fi
}

if [ -n "$OCF_RESKEY_CRM_meta_clone" ]; then
    ocf_log err "THIS RA DO NOT SUPPORT CLONE MODE!"
    exit $OCF_ERR_CONFIGURED
fi

nfsserver_validate

case $__OCF_ACTION in
    start)          nfsserver_start
                    ;;
    stop)           nfsserver_stop
                    ;;
    monitor)        nfsserver_monitor
                    ;;
    validate-all)   nfsserver_validate
                    ;;
    *)              nfsserver_usage
                    exit $OCF_ERR_UNIMPLEMENTED
                    ;;
esac

The nfsserver_validate call checks for the existence of
$OCF_RESKEY_nfs_shared_infodir, which fails because the drbd resource has
not yet been started. This results in the crm_mon output above. Commenting
out the nfsserver_validate call allows the resource to start.
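Rather than deleting the call outright, a gentler patch might tolerate the
missing directory during a probe. The sketch below is my own idea, not the
stock RA code: it uses `return` instead of `exit` so the caller can map the
code, and the OCF_* values are just the standard OCF return codes.

```shell
# Standard OCF return codes (normally provided by ocf-shellfuncs).
OCF_SUCCESS=0
OCF_ERR_ARGS=2
OCF_NOT_RUNNING=7

nfsserver_validate ()
{
    # nfs_ip must always be configured.
    if [ -z "$OCF_RESKEY_nfs_ip" ]; then
        return $OCF_ERR_ARGS
    fi

    # The shared infodir lives on the drbd-backed filesystem, so it is
    # legitimately absent on a node where the stack is stopped. For a
    # probe/monitor, report "not running" rather than a hard error.
    if [ ! -d "$OCF_RESKEY_nfs_shared_infodir" ]; then
        if [ "$__OCF_ACTION" = "monitor" ]; then
            return $OCF_NOT_RUNNING
        fi
        return $OCF_ERR_ARGS
    fi

    return $OCF_SUCCESS
}
```

With this guard, the initial probe on the node without the drbd mount would
come back rc=7 (not running), which is exactly what the PE expects, instead
of the rc=2 hard error seen above.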

I can send further logs if needed.

I still have a question about which services should be turned off
(RHEL 5.4) before nfsserver is invoked. I presume nfs and nfslock; what
about rpcgssd and rpcidmapd ?
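For what it's worth, my working assumption (not confirmed on the list) is
that anything the RA will manage must not also be started by init. On
RHEL 5 that would look roughly like this, run as root on both nodes:

```shell
# Assumption: let heartbeat/pacemaker, not init, control NFS on both nodes.
# Stop the services now and remove them from the boot sequence.
service nfs stop
service nfslock stop
chkconfig nfs off
chkconfig nfslock off
# rpcgssd (Kerberos) and rpcidmapd (NFSv4 id mapping) are only needed for
# krb5/NFSv4 exports; for plain NFSv3 they can likewise be chkconfig'd off.
```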

Thanks

> > Here is some of the log file from the DC :
> >
> > Jun  7 13:08:10 nlsmtl6 crmd: [17967]: info: te_rsc_command: Initiating action 10: monitor CL_stonithset_node02_monitor_0 on nlsmtl5
> > Jun  7 13:08:10 nlsmtl6 crmd: [17967]: info: te_rsc_command: Initiating action 18: monitor CL_stonithset_node02_monitor_0 on nlsmtl6 (local)
> > Jun  7 13:08:10 nlsmtl6 logger: We are not PRIMARY ...
> > Jun  7 13:08:10 nlsmtl6 lrmd: [17964]: notice: lrmd_rsc_new(): No
> > lrm_rprovider field in message
> > Jun  7 13:08:10 nlsmtl6 crmd: [17967]: info: do_lrm_rsc_op: Performing
> > key=18:4:7:49711e59-c5cf-45a5-abb2-b0fe09acfce9
> > op=CL_stonithset_node02_monitor_0 )
> > Jun  7 13:08:10 nlsmtl6 lrmd: [17964]: info: rsc:CL_stonithset_node02:8: monitor
> > Jun  7 13:08:10 nlsmtl6 crmd: [17967]: info: process_lrm_event: LRM
> > operation drbddisk_2_monitor_0 (call=2, rc=7, cib-update=58,
> > confirmed=true) not running
> > Jun  7 13:08:10 nlsmtl6 crmd: [17967]: info: process_lrm_event: Result: stopped (Secondary)
> > Jun  7 13:08:10 nlsmtl6 crmd: [17967]: info: process_lrm_event: LRM
> > operation rc.primary_5_monitor_0 (call=5, rc=7, cib-update=59,
> > confirmed=true) not running
> > Jun  7 13:08:10 nlsmtl6 crmd: [17967]: info: process_lrm_event: LRM
> > operation CL_stonithset_node02_monitor_0 (call=8, rc=7, cib-update=60,
> > confirmed=true) not running
> > Jun  7 13:08:10 nlsmtl6 crmd: [17967]: info: match_graph_event: Action
> > drbddisk_2_monitor_0 (12) confirmed on nlsmtl6 (rc=0)
> > Jun  7 13:08:10 nlsmtl6 crmd: [17967]: info: match_graph_event: Action
> > rc.primary_5_monitor_0 (15) confirmed on nlsmtl6 (rc=0)
> > Jun  7 13:08:10 nlsmtl6 crmd: [17967]: info: match_graph_event: Action
> > CL_stonithset_node02_monitor_0 (18) confirmed on nlsmtl6 (rc=0)
> > Jun  7 13:08:10 nlsmtl6 crmd: [17967]: info: process_lrm_event: LRM
> > operation nfsserver_nlsmtl_monitor_0 (call=6, rc=2, cib-update=61,
> > confirmed=true) invalid parameter
> > Jun  7 13:08:10 nlsmtl6 crmd: [17967]: WARN: status_from_rc: Action 16
> > (nfsserver_nlsmtl_monitor_0) on nlsmtl6 failed (target: 7 vs. rc: 2):
> > Error
> > Jun  7 13:08:10 nlsmtl6 attrd: [17966]: info: attrd_ha_callback: flush
> > message from nlsmtl6
> > Jun  7 13:08:10 nlsmtl6 attrd: [17966]: info: attrd_ha_callback: flush
> > message from nlsmtl6
> > Jun  7 13:08:10 nlsmtl6 crmd: [17967]: info: abort_transition_graph:
> > match_graph_event:272 - Triggered transition abort (complete=0,
> > tag=lrm_rsc_op, id=nfsserver_nlsmtl_monitor_0,
> > magic=0:2;16:4:7:49711e59-c5cf-45a5-abb2-b0fe09acfce9, cib=0.8.5) : Event failed
> > Jun  7 13:08:10 nlsmtl6 crmd: [17967]: info: update_abort_priority: Abort priority upgraded from 0 to 1
> > Jun  7 13:08:10 nlsmtl6 crmd: [17967]: info: update_abort_priority: Abort action done superceeded by restart
> > Jun  7 13:08:10 nlsmtl6 crmd: [17967]: info: match_graph_event: Action
> > nfsserver_nlsmtl_monitor_0 (16) confirmed on nlsmtl6 (rc=4)
> > Jun  7 13:08:10 nlsmtl6 crmd: [17967]: info: process_lrm_event: LRM
> > operation IPaddr_nlsmtl_monitor_0 (call=4, rc=7, cib-update=62,
> > confirmed=true) not running
> > Jun  7 13:08:10 nlsmtl6 crmd: [17967]: info: process_lrm_event: LRM
> > operation Filesystem_3_monitor_0 (call=3, rc=7, cib-update=63,
> > confirmed=true) not running
> > Jun  7 13:08:10 nlsmtl6 crmd: [17967]: info: match_graph_event: Action
> > IPaddr_nlsmtl_monitor_0 (14) confirmed on nlsmtl6 (rc=0)
> > Jun  7 13:08:10 nlsmtl6 crmd: [17967]: info: match_graph_event: Action
> > Filesystem_3_monitor_0 (13) confirmed on nlsmtl6 (rc=0)
> > Jun  7 13:08:11 nlsmtl6 lrmd: [17964]: info: rsc:CL_stonithset_node01:7: monitor
> > Jun  7 13:08:11 nlsmtl6 crmd: [17967]: info: process_lrm_event: LRM
> > operation CL_stonithset_node01_monitor_0 (call=7, rc=7, cib-update=64,
> > confirmed=true) not running
> > Jun  7 13:08:11 nlsmtl6 crmd: [17967]: info: match_graph_event: Action
> > CL_stonithset_node01_monitor_0 (17) confirmed on nlsmtl6 (rc=0)
> > Jun  7 13:08:11 nlsmtl6 crmd: [17967]: info: te_rsc_command: Initiating action 11: probe_complete probe_complete on nlsmtl6 (local) - no waiting
> > Jun  7 13:08:11 nlsmtl6 crmd: [17967]: info: match_graph_event: Action
> > rc.primary_5_monitor_0 (7) confirmed on nlsmtl5 (rc=0)
> > Jun  7 13:08:11 nlsmtl6 crmd: [17967]: info: match_graph_event: Action
> > drbddisk_2_monitor_0 (4) confirmed on nlsmtl5 (rc=0)
> > Jun  7 13:08:11 nlsmtl6 crmd: [17967]: info: match_graph_event: Action
> > IPaddr_nlsmtl_monitor_0 (6) confirmed on nlsmtl5 (rc=0)
> > Jun  7 13:08:11 nlsmtl6 crmd: [17967]: info: match_graph_event: Action
> > Filesystem_3_monitor_0 (5) confirmed on nlsmtl5 (rc=0)
> > Jun  7 13:08:12 nlsmtl6 crmd: [17967]: info: match_graph_event: Action
> > CL_stonithset_node02_monitor_0 (10) confirmed on nlsmtl5 (rc=0)
> > Jun  7 13:08:12 nlsmtl6 crmd: [17967]: info: match_graph_event: Action
> > CL_stonithset_node01_monitor_0 (9) confirmed on nlsmtl5 (rc=0)
> > Jun  7 13:08:12 nlsmtl6 crmd: [17967]: WARN: status_from_rc: Action 8 (nfsserver_nlsmtl_monitor_0) on nlsmtl5 failed (target: 7 vs. rc: 2): Error
> > Jun  7 13:08:13 nlsmtl6 crmd: [17967]: info: abort_transition_graph:
> > match_graph_event:272 - Triggered transition abort (complete=0,
> > tag=lrm_rsc_op, id=nfsserver_nlsmtl_monitor_0,
> > magic=0:2;8:4:7:49711e59-c5cf-45a5-abb2-b0fe09acfce9, cib=0.8.15) : Event failed
> > Jun  7 13:08:13 nlsmtl6 crmd: [17967]: info: match_graph_event: Action
> > nfsserver_nlsmtl_monitor_0 (8) confirmed on nlsmtl5 (rc=4)
> > Jun  7 13:08:13 nlsmtl6 crmd: [17967]: info: te_rsc_command: Initiating action 3: probe_complete probe_complete on nlsmtl5 - no waiting
> > Jun  7 13:08:13 nlsmtl6 crmd: [17967]: info: run_graph:
> > ====================================================
> > Jun  7 13:08:13 nlsmtl6 crmd: [17967]: notice: run_graph: Transition 4
> > (Complete=16, Pending=0, Fired=0, Skipped=25, Incomplete=0,
> > Source=/var/lib/pengine/pe-input-4724.bz2): Stopped
> > Jun  7 13:08:13 nlsmtl6 crmd: [17967]: info: te_graph_trigger: Transition 4 is now complete
> > Jun  7 13:08:13 nlsmtl6 crmd: [17967]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=notify_crmd ]
> > Jun  7 13:08:13 nlsmtl6 crmd: [17967]: info: do_state_transition: All 2 cluster nodes are eligible to run resources.
> > Jun  7 13:08:13 nlsmtl6 crmd: [17967]: info: do_pe_invoke: Query 65:
> > Requesting the current CIB: S_POLICY_ENGINE
> > Jun  7 13:08:13 nlsmtl6 crmd: [17967]: info: do_pe_invoke_callback: Invoking the PE: query=65, ref=pe_calc-dc-1275937693-40, seq=2, quorate=1
> > Jun  7 13:08:13 nlsmtl6 pengine: [19553]: notice: unpack_config: On loss of CCM Quorum: Ignore
> > Jun  7 13:08:13 nlsmtl6 pengine: [19553]: info: unpack_config: Node
> > scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
> > Jun  7 13:08:13 nlsmtl6 pengine: [19553]: info: determine_online_status: Node nlsmtl5 is online
> > Jun  7 13:08:13 nlsmtl6 pengine: [19553]: ERROR: unpack_rsc_op: Hard error - nfsserver_nlsmtl_monitor_0 failed with rc=2: Preventing nfsserver_nlsmtl from re-starting on nlsmtl5
> > Jun  7 13:08:13 nlsmtl6 pengine: [19553]: info: determine_online_status: Node nlsmtl6 is online
> > Jun  7 13:08:13 nlsmtl6 pengine: [19553]: ERROR: unpack_rsc_op: Hard error - nfsserver_nlsmtl_monitor_0 failed with rc=2: Preventing nfsserver_nlsmtl from re-starting on nlsmtl6
> > Jun  7 13:08:13 nlsmtl6 pengine: [19553]: notice: group_print: Resource Group: grp_1
> > Jun  7 13:08:13 nlsmtl6 pengine: [19553]: notice: native_print: drbddisk_2 (heartbeat:drbddisk):   Stopped
> > Jun  7 13:08:13 nlsmtl6 pengine: [19553]: notice: native_print:
> > Filesystem_3    (ocf::heartbeat:Filesystem):    Stopped
> > Jun  7 13:08:13 nlsmtl6 pengine: [19553]: notice: native_print:
> > IPaddr_nlsmtl   (ocf::heartbeat:IPaddr):        Stopped
> > Jun  7 13:08:13 nlsmtl6 pengine: [19553]: notice: native_print:
> > rc.primary_5    (lsb:rc.primary):       Stopped
> > Jun  7 13:08:13 nlsmtl6 pengine: [19553]: notice: native_print:
> > nfsserver_nlsmtl        (ocf::heartbeat:nfsserver):     Stopped
> > Jun  7 13:08:13 nlsmtl6 pengine: [19553]: notice: native_print:
> > CL_stonithset_node01    (stonith:external/riloe-iders): Stopped
> > Jun  7 13:08:13 nlsmtl6 pengine: [19553]: notice: native_print:
> > CL_stonithset_node02    (stonith:external/riloe-iders): Stopped
> > Jun  7 13:08:13 nlsmtl6 pengine: [19553]: info: native_merge_weights:
> > drbddisk_2: Rolling back scores from Filesystem_3
> > Jun  7 13:08:13 nlsmtl6 pengine: [19553]: info: native_merge_weights:
> > drbddisk_2: Rolling back scores from IPaddr_nlsmtl
> > Jun  7 13:08:13 nlsmtl6 pengine: [19553]: info: native_merge_weights:
> > drbddisk_2: Rolling back scores from rc.primary_5
> > Jun  7 13:08:13 nlsmtl6 pengine: [19553]: info: native_merge_weights:
> > drbddisk_2: Rolling back scores from nfsserver_nlsmtl
> > Jun  7 13:08:13 nlsmtl6 pengine: [19553]: info: native_merge_weights:
> > Filesystem_3: Rolling back scores from IPaddr_nlsmtl
> > Jun  7 13:08:13 nlsmtl6 pengine: [19553]: info: native_merge_weights:
> > Filesystem_3: Rolling back scores from rc.primary_5
> > Jun  7 13:08:13 nlsmtl6 pengine: [19553]: info: native_merge_weights:
> > Filesystem_3: Rolling back scores from nfsserver_nlsmtl
> > Jun  7 13:08:13 nlsmtl6 pengine: [19553]: info: native_merge_weights:
> > IPaddr_nlsmtl: Rolling back scores from rc.primary_5
> > Jun  7 13:08:13 nlsmtl6 pengine: [19553]: info: native_merge_weights:
> > IPaddr_nlsmtl: Rolling back scores from nfsserver_nlsmtl
> > Jun  7 13:08:13 nlsmtl6 pengine: [19553]: info: native_merge_weights:
> > rc.primary_5: Rolling back scores from nfsserver_nlsmtl
> > Jun  7 13:08:13 nlsmtl6 pengine: [19553]: info: native_color: Resource
> > nfsserver_nlsmtl cannot run anywhere
> > Jun  7 13:08:13 nlsmtl6 pengine: [19553]: notice: RecurringOp:  Start
> > recurring monitor (120s) for drbddisk_2 on nlsmtl5
> > Jun  7 13:08:13 nlsmtl6 pengine: [19553]: notice: RecurringOp:  Start
> > recurring monitor (120s) for Filesystem_3 on nlsmtl5
> > Jun  7 13:08:13 nlsmtl6 pengine: [19553]: notice: RecurringOp:  Start
> > recurring monitor (5s) for IPaddr_nlsmtl on nlsmtl5
> > Jun  7 13:08:13 nlsmtl6 pengine: [19553]: notice: RecurringOp:  Start
> > recurring monitor (120s) for rc.primary_5 on nlsmtl5
> > Jun  7 13:08:13 nlsmtl6 pengine: [19553]: notice: RecurringOp:  Start
> > recurring monitor (30s) for CL_stonithset_node01 on nlsmtl6
> > Jun  7 13:08:13 nlsmtl6 pengine: [19553]: notice: RecurringOp:  Start
> > recurring monitor (30s) for CL_stonithset_node02 on nlsmtl5
> > Jun  7 13:08:13 nlsmtl6 pengine: [19553]: notice: LogActions: Start
> > drbddisk_2      (nlsmtl5)
> > Jun  7 13:08:13 nlsmtl6 pengine: [19553]: notice: LogActions: Start
> > Filesystem_3    (nlsmtl5)
> > Jun  7 13:08:13 nlsmtl6 pengine: [19553]: notice: LogActions: Start
> > IPaddr_nlsmtl   (nlsmtl5)
> > Jun  7 13:08:13 nlsmtl6 pengine: [19553]: notice: LogActions: Start
> > rc.primary_5    (nlsmtl5)
> > Jun  7 13:08:13 nlsmtl6 pengine: [19553]: notice: LogActions: Leave
> > resource nfsserver_nlsmtl       (Stopped)
> > Jun  7 13:08:13 nlsmtl6 pengine: [19553]: notice: LogActions: Start
> > CL_stonithset_node01    (nlsmtl6)
> > Jun  7 13:08:13 nlsmtl6 pengine: [19553]: notice: LogActions: Start
> > CL_stonithset_node02    (nlsmtl5)
> > Jun  7 13:08:13 nlsmtl6 crmd: [17967]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response
> >
> > At this point if I try :
> > crm(live)resource# cleanup nfsserver_nlsmtl
> > Cleaning up nfsserver_nlsmtl on nlsmtl5
> >   Cleaning up nfsserver_nlsmtl on nlsmtl6
> >   crm(live)resource#
> >
> > - we get :
> >
> > [r...@nlsmtl5 config_nlsmtl]# crm_mon -1
> > ============
> > Last updated: Mon Jun  7 15:20:40 2010
> > Stack: Heartbeat
> > Current DC: nlsmtl6 (6fa3ad00-4761-4e52-842c-36f002971200) - partition
> > with quorum
> > Version: 1.0.8-3225fc0d98c8fcd0f7b24f0134e89967136a9b00
> > 2 Nodes configured, unknown expected votes
> > 3 Resources configured.
> > ============
> >
> > Online: [ nlsmtl5 nlsmtl6 ]
> >
> > Resource Group: grp_1
> > drbddisk_2 (heartbeat:drbddisk):   Started nlsmtl5
> > Filesystem_3       (ocf::heartbeat:Filesystem):    Started nlsmtl5
> > IPaddr_nlsmtl      (ocf::heartbeat:IPaddr):        Started nlsmtl5
> > rc.primary_5       (lsb:rc.primary):       Started nlsmtl5
> > nfsserver_nlsmtl   (ocf::heartbeat:nfsserver):     Started nlsmtl5
> > CL_stonithset_node01   (stonith:external/riloe-iders): Started nlsmtl6
> > CL_stonithset_node02   (stonith:external/riloe-iders): Started nlsmtl5
> >
> > Failed actions:
> > nfsserver_nlsmtl_monitor_0 (node=nlsmtl6, call=11, rc=2, status=complete): invalid parameter
> > [r...@nlsmtl5 config_nlsmtl]#
> >
> > The resource is now running, however because it has failed on nlsmtl6 I
> > cannot fail over to nlsmtl6. How do I clear the failed action ?
> >
> > Regardless, I'm still not sure what my correct setup is before I attempt
> > to add nfsserver. Which services should be running on nlsmtl5/6 ?
> > nfslock ? rpcidmapd ? I found the below url, however as shown this is
> > not supported.
> >
> > http://www.linux-ha.org/HaNFS
> >
> > Thanks
> > > > [r...@nlsmtl6 ~]#
> > > >
> > > > At this point I cannot migrate to nlsmtl6. I have not been able to
> > > > find much documentation on nfsserver. I've found
> > > > http://linux-ha.org/doc/re-ra-nfsserver.html
> > > > but nothing else on setup.
> > > >
> > > > My config is :
> > > >
> > > > [r...@nlsmtl5 init.d]# crm configure show
> > > > node $id="16fd6af0-429e-402d-a5d8-a00a818f139a" nlsmtl6 \
> > > > attributes standby="off"
> > > > node $id="2f6b429e-74c3-482e-bf20-5a6b0c94cd46" nlsmtl5 \
> > > > attributes standby="off"
> > > > primitive CL_stonithset_node01 stonith:external/riloe-iders \
> > > > op monitor interval="30s" timeout="20s" on-fail="ignore" \
> > > > op start interval="0" timeout="60s" on-fail="restart" \
> > > > params hostlist="nlsmtl5" ilo_hostname="nlsmtl5-ilo" ilo_user="Heartbeat"
> > > > ilo_password="xxx" ilo_can_reset="0" ilo_protocol="2.0"
> > > > ilo_powerdown_method="button"
> > > > primitive CL_stonithset_node02 stonith:external/riloe-iders \
> > > > op monitor interval="30s" timeout="20s" on-fail="ignore" \
> > > > op start interval="0" timeout="60s" on-fail="restart" \
> > > > params hostlist="nlsmtl6" ilo_hostname="nlsmtl6-ilo" ilo_user="Heartbeat"
> > > > ilo_password="xxx" ilo_can_reset="0" ilo_protocol="2.0"
> > > > ilo_powerdown_method="button"
> > > > primitive Filesystem_3 ocf:heartbeat:Filesystem \
> > > > op monitor interval="120s" timeout="60s" \
> > > > params device="/dev/drbd0" directory="/drbd" fstype="ext3" options="defaults"
> > > > primitive IPaddr_nlsmtl ocf:heartbeat:IPaddr \
> > > > op monitor interval="5s" timeout="5s" \
> > > > params ip="165.115.204.222"
> > > > primitive drbddisk_2 heartbeat:drbddisk \
> > > > op monitor interval="120s" timeout="60s" \
> > > > params 1="r0"
> > > > primitive nfsserver_nlsmtl ocf:heartbeat:nfsserver \
> > > > op monitor interval="30s" timeout="60s" \
> > > > params nfs_init_script="/etc/init.d/nfs" nfs_notify_cmd="/sbin/rpc.statd"
> > > > nfs_shared_infodir="/drbd/nfs" nfs_ip="165.115.204.222"
> > > > primitive rc.primary_5 lsb:rc.primary \
> > > > op monitor interval="120s" timeout="60s"
> > > > group grp_1 drbddisk_2 Filesystem_3 rc.primary_5 IPaddr_nlsmtl nfsserver_nlsmtl
> > > > location node-1-dont-run CL_stonithset_node01 -inf: nlsmtl5
> > > > location node-2-dont-run CL_stonithset_node02 -inf: nlsmtl6
> > > > location rsc_location_group_1 grp_1 100: nlsmtl5
> > > > property $id="cib-bootstrap-options" \
> > > > dc-version="1.0.8-3225fc0d98c8fcd0f7b24f0134e89967136a9b00" \
> > > > cluster-infrastructure="Heartbeat" \
> > > > no-quorum-policy="ignore" \
> > > > last-lrm-refresh="1275682003"
> > > >
> > > > My packages are :
> > > > drbd-pacemaker-8.3.7-1
> > > > heartbeat-3.0.2-2.el5
> > > > pacemaker-1.0.8-2.el5
> > > > pacemaker-libs-1.0.8-2.el5
> > > > cluster-glue-1.0.3-1.el5
> > > > cluster-glue-libs-1.0.3-1.el5
> > > > corosynclib-1.2.0-1.el5
> > > > corosync-1.2.0-1.el5
> > > >
> > > > I looked at /usr/lib/ocf/resource.d/heartbeat/nfsserver and its
> > > > nfsserver_monitor, which is pretty simple :
> > > > nfsserver_monitor ()
> > > > {
> > > >     fn=`/bin/mktemp`
> > > >     ${OCF_RESKEY_nfs_init_script} status > $fn 2>&1
> > > >     rc=$?
> > > >     ocf_log debug `cat $fn`
> > > >     rm -f $fn
> > > >
> > > >     #Adapte LSB status code to OCF return code
> > > >     if [ $rc -eq 0 ]; then
> > > >         return $OCF_SUCCESS
> > > >     elif [ $rc -eq 3 ]; then
> > > >         return $OCF_NOT_RUNNING
> > > >     else
> > > >         return $OCF_ERR_GENERIC
> > > >     fi
> > > > }
> > > >
> > > > In my case I presume it would call "/etc/init.d/nfs status" which on :
> > > > nlsmtl5 returns :
> > > > [r...@nlsmtl5 init.d]# /etc/init.d/nfs status
> > > > rpc.mountd (pid 24994) is running...
> > > > nfsd (pid 24991 24990 24989 24988 24987 24980 24979 24969) is running...
> > > > rpc.rquotad (pid 24963) is running...
> > > > [r...@nlsmtl5 init.d]#
> > > > - return code is 0
> > > >
> > > > nlsmtl6 returns :
> > > > [r...@nlsmtl6 ~]# /etc/init.d/nfs status
> > > > rpc.mountd is stopped
> > > > nfsd is stopped
> > > > rpc.rquotad is stopped
> > > > [r...@nlsmtl6 ~]#
> > > > - return code is 3
> > > >
> > > > Why am I getting a rc=2 and how can I debug ? Am I missing something
> > > > on setup ? Is this the best way to run nfs ?
> > > >
> > > > Thanks
> > > >
> > > >
> > _______________________________________________
> > Linux-HA mailing list
> > [email protected]
> > http://lists.linux-ha.org/mailman/listinfo/linux-ha
> > See also: http://linux-ha.org/ReportingProblems