Hi,

On Thu, Apr 07, 2011 at 04:23:17PM +0100, Matthew Richardson wrote:
> I'm currently trying to use the stonith device external/ipmi through
> pacemaker and I'm having some problems.
>
> Running the stonith command directly reports success:
>
> stonith -St external/ipmi -p "pcw3572.see.ed.ac.uk 129.215.215.144 root
> password lanplus"
> stonith: external/ipmi device OK
>
> (and I get errors if I put in a wrong password etc, so the connection is
> definitely working).
>
> I use the following cib config:
>
> primitive stonithipmidisk1 stonith:external/ipmi \
> params hostname="pcw3572.see.ed.ac.uk" ipaddr="129.215.215.144"
> userid="root" passwd="password" interface="lanplus"
>
> and the stonith device starts correctly.
>
> When I 'kill' the node (killall -9 corosync) the node state switches to
> UNCLEAN (offline). However, the stonith device never causes the node to
> be powered off
>
> I see the following in the logs:
>
> Apr 7 16:00:47 pcw3571 crmd: [8318]: info: ais_dispatch_message:
> Membership 2568: quorum retained
> Apr 7 16:00:47 pcw3571 cib: [8314]: info: ais_dispatch_message:
> Membership 2568: quorum retained
> Apr 7 16:00:47 pcw3571 cib: [8314]: info: crm_update_peer: Node
> pcw3572.see.ed.ac.uk: id=2379732865 state=lost (new) addr=r(0)
> ip(129.215.215.141) r(1) ip(192.168.140.2) votes=1 born=2564 seen=2564
> proc=00000000000000000000000000111312
> Apr 7 16:00:47 pcw3571 crmd: [8318]: info: ais_status_callback: status:
> pcw3572.see.ed.ac.uk is now lost (was member)
> Apr 7 16:00:47 pcw3571 crmd: [8318]: info: crm_update_peer: Node
> pcw3572.see.ed.ac.uk: id=2379732865 state=lost (new) addr=r(0)
> ip(129.215.215.141) r(1) ip(192.168.140.2) votes=1 born=2564 seen=2564
> proc=00000000000000000000000000111312
> Apr 7 16:00:47 pcw3571 stonith-ng: [8313]: info: stonith_queryQuery
> <stonith_command t="stonith-ng"
> st_async_id="b8ed0eda-b271-4e1f-84c6-cd2d99a1e7c9" st_op="st_query"
> st_callid="0" st_callopt="0"
> st_remote_op="b8ed0eda-b271-4e1f-84c6-cd2d99a1e7c9"
> st_target="pcw3572.see.ed.ac.uk" st_device_action="reboot"
> st_clientid="d98c3433-910b-48c7-b542-6abab4deb0b1" st_timeout="6000"
> src="pcw3574.see.ed.ac.uk" seq="51" />
> Apr 7 16:00:48 pcw3571 stonith-ng: [8313]: info:
> can_fence_host_with_device: Refreshing port list for stonithipmidisk1
> Apr 7 16:00:48 pcw3571 stonith-ng: [8313]: WARN: parse_host_line: Could
> not parse (0 0):
> Apr 7 16:00:48 pcw3571 stonith-ng: [8313]: info:
> can_fence_host_with_device: stonithipmidisk1 can not fence
> pcw3572.see.ed.ac.uk: dynamic-list
> Apr 7 16:00:48 pcw3571 stonith-ng: [8313]: info: stonith_query: Found 0
> matching devices for 'pcw3572.see.ed.ac.uk'
> Apr 7 16:00:48 pcw3571 stonith-ng: [8313]: info: stonith_command:
> Processed st_query from pcw3574.see.ed.ac.uk: rc=0
> Apr 7 16:00:53 pcw3571 stonith-ng: [8313]: info: stonith_queryQuery
> <stonith_command t="stonith-ng"
> st_async_id="5432b828-847f-4f7b-9a91-420f90a86e61" st_op="st_query"
> st_callid="0" st_callopt="0"
> st_remote_op="5432b828-847f-4f7b-9a91-420f90a86e61"
> st_target="pcw3572.see.ed.ac.uk" st_device_action="reboot"
> st_clientid="d98c3433-910b-48c7-b542-6abab4deb0b1" st_timeout="6000"
> src="pcw3574.see.ed.ac.uk" seq="53" />
> Apr 7 16:00:53 pcw3571 stonith-ng: [8313]: info:
> can_fence_host_with_device: stonithipmidisk1 can not fence
> pcw3572.see.ed.ac.uk: dynamic-list
> Apr 7 16:00:53 pcw3571 stonith-ng: [8313]: info: stonith_query: Found 0
> matching devices for 'pcw3572.see.ed.ac.uk'
> Apr 7 16:00:53 pcw3571 stonith-ng: [8313]: info: stonith_command:
> Processed st_query from pcw3574.see.ed.ac.uk: rc=0
>
>
> Can anyone give me some pointers on what is going wrong here?

For whatever reason stonith-ng doesn't think that stonithipmidisk1
can manage this node. Which version of Pacemaker do you run?
Perhaps this has been fixed in the meantime. I cannot recall
right now if there has been such a problem, but it's possible.

You can also try to turn debug on and see if there are more
clues.

Thanks,

Dejan

> Thanks,
>
> Matthew
>
> --
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
>
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
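
A possible workaround, not taken from the thread itself: the
"dynamic-list" in the log indicates that stonith-ng asked the device
which hosts it can fence, and "Could not parse (0 0)" suggests it got
nothing usable back. Assuming the installed Pacemaker supports the
pcmk_host_list and pcmk_host_check fencing parameters (worth checking
against the version in use), the target node can be named statically
instead. The sketch below repeats the primitive from the original
message and adds only those two parameters:

```
# Sketch only: same primitive as in the original message, plus two
# parameters that are an assumption about the installed Pacemaker.
# pcmk_host_list names the node this device may fence;
# pcmk_host_check="static-list" asks stonith-ng to use that list
# instead of querying the device.
primitive stonithipmidisk1 stonith:external/ipmi \
    params hostname="pcw3572.see.ed.ac.uk" ipaddr="129.215.215.144" \
        userid="root" passwd="password" interface="lanplus" \
        pcmk_host_list="pcw3572.see.ed.ac.uk" \
        pcmk_host_check="static-list"
```

The value of pcmk_host_list has to match the node name as the cluster
knows it (pcw3572.see.ed.ac.uk in the log). Whether this resolves the
problem described here is untested.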
