On 10/07/2013, at 3:42 PM, Vladislav Bogdanov <[email protected]> wrote:
> 10.07.2013 08:38, Andrew Beekhof wrote:
>>
>> On 10/07/2013, at 3:37 PM, Vladislav Bogdanov <[email protected]> wrote:
>>
>>> 10.07.2013 08:13, Andrew Beekhof wrote:
>>>>
>>>> On 10/07/2013, at 2:15 PM, Vladislav Bogdanov <[email protected]> wrote:
>>>>
>>>>> 10.07.2013 07:05, Andrew Beekhof wrote:
>>>>>>
>>>>>> On 10/07/2013, at 2:04 PM, Vladislav Bogdanov <[email protected]> wrote:
>>>>>>
>>>>>>> 10.07.2013 03:39, Andrew Beekhof wrote:
>>>>>>>>
>>>>>>>> On 10/07/2013, at 1:51 AM, Vladislav Bogdanov <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> 03.07.2013 19:31, Dejan Muhamedagic wrote:
>>>>>>>>>> On Tue, Jul 02, 2013 at 07:53:52AM +0300, Vladislav Bogdanov wrote:
>>>>>>>>>>> 01.07.2013 18:29, Dejan Muhamedagic wrote:
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Jul 01, 2013 at 05:29:31PM +0300, Vladislav Bogdanov wrote:
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I'm trying to look if it is now safe to delete non-running nodes
>>>>>>>>>>>>> (corosync 2.3, pacemaker HEAD, crmsh tip).
>>>>>>>>>>>>>
>>>>>>>>>>>>> # crm node delete v02-d
>>>>>>>>>>>>> WARNING: 2: crm_node bad format: 7 v02-c
>>>>>>>>>>>>> WARNING: 2: crm_node bad format: 8 v02-d
>>>>>>>>>>>>> WARNING: 2: crm_node bad format: 5 v02-a
>>>>>>>>>>>>> WARNING: 2: crm_node bad format: 6 v02-b
>>>>>>>>>>>>> INFO: 2: node v02-d not found by crm_node
>>>>>>>>>>>>> INFO: 2: node v02-d deleted
>>>>>>>>>>>>> #
>>>>>>>>>>>>>
>>>>>>>>>>>>> So, I expect that crmsh still doesn't follow latest changes to
>>>>>>>>>>>>> 'crm_node -l'. Although node seems to be deleted correctly.
>>>>>>>>>>>>>
>>>>>>>>>>>>> For reference, output of crm_node -l is:
>>>>>>>>>>>>> 7 v02-c
>>>>>>>>>>>>> 8 v02-d
>>>>>>>>>>>>> 5 v02-a
>>>>>>>>>>>>> 6 v02-b
>>>>>>>>>>>>
>>>>>>>>>>>> This time the node state was empty. Or it's missing altogether.
>>>>>>>>>>>> I'm not sure how's that supposed to be interpreted.
>>>>>>>>>>>> We test the output of crm_node -l just to make sure that the node is not
>>>>>>>>>>>> online. Perhaps we need to use some other command.
>>>>>>>>>>>
>>>>>>>>>>> Likely it shows everything from a corosync nodelist.
>>>>>>>>>>> After I deleted the node from everywhere except corosync, list is
>>>>>>>>>>> still the same.
>>>>>>>>>>
>>>>>>>>>> OK. This patch changes the interface to crm_node to use the
>>>>>>>>>> "list partition" option (-p). Could you please test it?
>>>>>>>>>
>>>>>>>>> Nope. Not enough. Even worse than before. I tested todays tip as it
>>>>>>>>> includes that patch with merge of Andrew's public and private master
>>>>>>>>> heads.
>>>>>>>>> =========
>>>>>>>>> [root@v02-b ~]# crm node show
>>>>>>>>> v02-a(5): normal
>>>>>>>>>     standby: off
>>>>>>>>>     virtualization: true
>>>>>>>>>     $id: nodes-5
>>>>>>>>> v02-b(6): normal
>>>>>>>>>     standby: off
>>>>>>>>>     virtualization: true
>>>>>>>>> v02-c(7): normal
>>>>>>>>>     standby: off
>>>>>>>>>     virtualization: true
>>>>>>>>> v02-d(8): normal(offline)
>>>>>>>>>     standby: off
>>>>>>>>>     virtualization: true
>>>>>>>>> [root@v02-b ~]# crm node delete v02-d
>>>>>>>>> ERROR: according to crm_node, node v02-d is still active
>>>>>>>>> [root@v02-b ~]# crm_node -p
>>>>>>>>> v02-c v02-d v02-a v02-b
>>>>>>>>> [root@v02-b ~]# crm_node -l
>>>>>>>>> 7 v02-c
>>>>>>>>> 8 v02-d
>>>>>>>>> 5 v02-a
>>>>>>>>> 6 v02-b
>>>>>>>>> [root@v02-b ~]#
>>>>>>>>> =========
>>>>>>>>>
>>>>>>>>> That is after I stopped node, lowered votequorum expected_votes (with
>>>>>>>>> corosync-quorumtool) and deleted v02-d from a cmap nodelist.
>>>>>>>>>
>>>>>>>>> corosync-cmapctl still shows runtime info about deleted node as well:
>>>>>>>>> runtime.totem.pg.mrp.srp.members.8.config_version (u64) = 0
>>>>>>>>> runtime.totem.pg.mrp.srp.members.8.ip (str) = r(0) ip(10.5.4.55)
>>>>>>>>> runtime.totem.pg.mrp.srp.members.8.join_count (u32) = 1
>>>>>>>>> runtime.totem.pg.mrp.srp.members.8.status (str) = left
>>>>>>>>> And it is not allowed to delete that keys.
>>>>>>>>>
>>>>>>>>> crm_node -R did the job (nothing left in the CIB), but, v02-d still
>>>>>>>>> appears in its output for both -p and -l.
>>>>>>>>>
>>>>>>>>> Andrew, I copy you directly because above is probably to you.
>>>>>>>>> Shouldn't crm_node some-how show that stopped node is deleted from a
>>>>>>>>> corosync nodelist?
>>>>>>>>
>>>>>>>> Which stack is this?
>>>>>>>
>>>>>>> corosync 2.3 with nodelist and udpu.
>>>>>>
>>>>>> I assume its possible, but crm_node isn't smart enough to do that yet.
>>>>>> Feel like writing a patch? :)
>>>>>
>>>>> Shouldn't it just skip offline nodes for -p?
>>>>>
>>>>
>>>> Worse. It appears to be asking pacemakerd instead of corosync or crmd.
>>>>
>>>
>>> Hm. I do not believe I'm able to refactor it then...
>>>
>>
>> Yeah, I'm looking at it.
>> The hard part is that going to corosync directly only gives you a nodeid :-(
>>
>
> Don't you need to get info from both sources anyway ("offline in crmd
> and joined in corosync" case - node has corosync started, but pacemaker
> is not)?
>

Not for -p

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
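
For illustration only, here is a rough sketch (Python, since crmsh is written in Python) of how a caller might cope with the two outputs quoted in this thread. It is not what crmsh or crm_node actually do. It assumes that `crm_node -l` prints "id name" and, on some versions, a third whitespace-separated state field, and that the `runtime.totem.pg.mrp.srp.members.<nodeid>.status` keys look exactly as pasted above. The names `parse_crm_node_list`, `parse_member_status` and `may_delete`, and the deletion policy itself, are hypothetical.

```python
import re
from typing import Dict, NamedTuple, Optional


class NodeEntry(NamedTuple):
    node_id: int
    name: str
    state: Optional[str]  # None when no state column is printed


def parse_crm_node_list(output: str) -> Dict[str, NodeEntry]:
    """Parse `crm_node -l` style text into a mapping keyed by node name.

    Accepts both "id name" and "id name state" lines (the latter is an
    assumption about other versions), so a missing state is not an error.
    """
    nodes = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) < 2 or not fields[0].isdigit():
            continue  # skip blank or unrecognised lines
        state = fields[2] if len(fields) > 2 else None
        nodes[fields[1]] = NodeEntry(int(fields[0]), fields[1], state)
    return nodes


# Matches lines such as:
#   runtime.totem.pg.mrp.srp.members.8.status (str) = left
_MEMBER_STATUS = re.compile(
    r"^runtime\.totem\.pg\.mrp\.srp\.members\.(\d+)\.status \(str\) = (\S+)$"
)


def parse_member_status(output: str) -> Dict[int, str]:
    """Map nodeid -> status from corosync-cmapctl runtime member keys."""
    status = {}
    for line in output.splitlines():
        match = _MEMBER_STATUS.match(line.strip())
        if match:
            status[int(match.group(1))] = match.group(2)
    return status


def may_delete(name: str, crm_node_output: str, cmapctl_output: str) -> bool:
    """Hypothetical policy: allow deletion if the node is unknown to
    crm_node, or if corosync reports its membership status as "left"."""
    node = parse_crm_node_list(crm_node_output).get(name)
    if node is None:
        return True
    return parse_member_status(cmapctl_output).get(node.node_id) == "left"
```

Since corosync only hands back a nodeid, the sketch uses the `crm_node -l` listing purely to translate a name into that id, and then takes the membership status from the cmap runtime keys.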
