On 10/07/2013, at 3:42 PM, Vladislav Bogdanov <[email protected]> wrote:

> 10.07.2013 08:38, Andrew Beekhof wrote:
>> 
>> On 10/07/2013, at 3:37 PM, Vladislav Bogdanov <[email protected]> wrote:
>> 
>>> 10.07.2013 08:13, Andrew Beekhof wrote:
>>>> 
>>>> On 10/07/2013, at 2:15 PM, Vladislav Bogdanov <[email protected]> wrote:
>>>> 
>>>>> 10.07.2013 07:05, Andrew Beekhof wrote:
>>>>>> 
>>>>>> On 10/07/2013, at 2:04 PM, Vladislav Bogdanov <[email protected]> wrote:
>>>>>> 
>>>>>>> 10.07.2013 03:39, Andrew Beekhof wrote:
>>>>>>>> 
>>>>>>>> On 10/07/2013, at 1:51 AM, Vladislav Bogdanov <[email protected]> wrote:
>>>>>>>> 
>>>>>>>>> 03.07.2013 19:31, Dejan Muhamedagic wrote:
>>>>>>>>>> On Tue, Jul 02, 2013 at 07:53:52AM +0300, Vladislav Bogdanov wrote:
>>>>>>>>>>> 01.07.2013 18:29, Dejan Muhamedagic wrote:
>>>>>>>>>>>> Hi,
>>>>>>>>>>>> 
>>>>>>>>>>>> On Mon, Jul 01, 2013 at 05:29:31PM +0300, Vladislav Bogdanov wrote:
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I'm trying to check whether it is now safe to delete non-running nodes
>>>>>>>>>>>>> (corosync 2.3, pacemaker HEAD, crmsh tip).
>>>>>>>>>>>>> 
>>>>>>>>>>>>> # crm node delete v02-d
>>>>>>>>>>>>> WARNING: 2: crm_node bad format: 7 v02-c
>>>>>>>>>>>>> WARNING: 2: crm_node bad format: 8 v02-d
>>>>>>>>>>>>> WARNING: 2: crm_node bad format: 5 v02-a
>>>>>>>>>>>>> WARNING: 2: crm_node bad format: 6 v02-b
>>>>>>>>>>>>> INFO: 2: node v02-d not found by crm_node
>>>>>>>>>>>>> INFO: 2: node v02-d deleted
>>>>>>>>>>>>> #
>>>>>>>>>>>>> 
>>>>>>>>>>>>> So I suspect that crmsh still doesn't follow the latest changes to
>>>>>>>>>>>>> 'crm_node -l', although the node seems to be deleted correctly.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> For reference, output of crm_node -l is:
>>>>>>>>>>>>> 7 v02-c
>>>>>>>>>>>>> 8 v02-d
>>>>>>>>>>>>> 5 v02-a
>>>>>>>>>>>>> 6 v02-b
>>>>>>>>>>>> 
>>>>>>>>>>>> This time the node state was empty. Or it's missing altogether.
>>>>>>>>>>>> I'm not sure how that's supposed to be interpreted. We test the
>>>>>>>>>>>> output of crm_node -l just to make sure that the node is not
>>>>>>>>>>>> online. Perhaps we need to use some other command.
>>>>>>>>>>> 
>>>>>>>>>>> Likely it shows everything from the corosync nodelist.
>>>>>>>>>>> After I deleted the node from everywhere except corosync, the list
>>>>>>>>>>> is still the same.
>>>>>>>>>> 
>>>>>>>>>> OK. This patch changes the interface to crm_node to use the
>>>>>>>>>> "list partition" option (-p). Could you please test it?
>>>>>>>>> 
>>>>>>>>> Nope. Not enough. Even worse than before. I tested today's tip, as it
>>>>>>>>> includes that patch, with a merge of Andrew's public and private
>>>>>>>>> master heads.
>>>>>>>>> =========
>>>>>>>>> [root@v02-b ~]# crm node show
>>>>>>>>> v02-a(5): normal
>>>>>>>>>    standby: off
>>>>>>>>>    virtualization: true
>>>>>>>>>    $id: nodes-5
>>>>>>>>> v02-b(6): normal
>>>>>>>>>    standby: off
>>>>>>>>>    virtualization: true
>>>>>>>>> v02-c(7): normal
>>>>>>>>>    standby: off
>>>>>>>>>    virtualization: true
>>>>>>>>> v02-d(8): normal(offline)
>>>>>>>>>    standby: off
>>>>>>>>>    virtualization: true
>>>>>>>>> [root@v02-b ~]# crm node delete v02-d
>>>>>>>>> ERROR: according to crm_node, node v02-d is still active
>>>>>>>>> [root@v02-b ~]# crm_node -p
>>>>>>>>> v02-c v02-d v02-a v02-b
>>>>>>>>> [root@v02-b ~]# crm_node -l
>>>>>>>>> 7 v02-c
>>>>>>>>> 8 v02-d
>>>>>>>>> 5 v02-a
>>>>>>>>> 6 v02-b
>>>>>>>>> [root@v02-b ~]#
>>>>>>>>> =========
>>>>>>>>> 
>>>>>>>>> That is after I stopped the node, lowered votequorum expected_votes (with
>>>>>>>>> corosync-quorumtool) and deleted v02-d from the cmap nodelist.
>>>>>>>>> 
>>>>>>>>> corosync-cmapctl still shows runtime info about the deleted node as well:
>>>>>>>>> runtime.totem.pg.mrp.srp.members.8.config_version (u64) = 0
>>>>>>>>> runtime.totem.pg.mrp.srp.members.8.ip (str) = r(0) ip(10.5.4.55)
>>>>>>>>> runtime.totem.pg.mrp.srp.members.8.join_count (u32) = 1
>>>>>>>>> runtime.totem.pg.mrp.srp.members.8.status (str) = left
>>>>>>>>> And deleting those keys is not allowed.
>>>>>>>>> 
>>>>>>>>> crm_node -R did the job (nothing left in the CIB), but v02-d still
>>>>>>>>> appears in the crm_node output for both -p and -l.
>>>>>>>>> 
>>>>>>>>> Andrew, I'm copying you directly because the above is probably for you.
>>>>>>>>> Shouldn't crm_node somehow show that a stopped node has been deleted
>>>>>>>>> from the corosync nodelist?
>>>>>>>> 
>>>>>>>> Which stack is this?
>>>>>>> 
>>>>>>> corosync 2.3 with nodelist and udpu.
>>>>>> 
>>>>>> I assume it's possible, but crm_node isn't smart enough to do that yet.
>>>>>> Feel like writing a patch? :)
>>>>> 
>>>>> Shouldn't it just skip offline nodes for -p?
>>>>> 
>>>> 
>>>> Worse. It appears to be asking pacemakerd instead of corosync or crmd.
>>>> 
>>> 
>>> Hm. I do not believe I'm able to refactor it then...
>>> 
>> 
>> Yeah, I'm looking at it.
>> The hard part is that going to corosync directly only gives you a nodeid :-(
>> 
> 
> Don't you need to get info from both sources anyway (the "offline in crmd
> and joined in corosync" case, where the node has corosync started but not
> pacemaker)?
> 

Not for -p
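
For reference, here is a minimal sketch of how a caller such as crmsh could read
the 'crm_node -l' output quoted earlier in this thread without tripping over the
missing state column. It is not the crmsh implementation: the function names
parse_crm_node_list and node_state are hypothetical, and the assumption that a
third whitespace-separated field, when present, carries the node state is mine.

```python
def parse_crm_node_list(output):
    """Parse `crm_node -l` style output into {name: (nodeid, state)}.

    Assumption: each line is "<nodeid> <name> [<state>]". When the state
    column is absent (as in this thread), state is recorded as None
    instead of the line being rejected as badly formatted.
    """
    nodes = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) < 2:
            # Blank or unusable line: skip it rather than guess.
            continue
        nodeid, name = fields[0], fields[1]
        state = fields[2] if len(fields) > 2 else None
        nodes[name] = (nodeid, state)
    return nodes


def node_state(nodes, name):
    """Return the recorded state, or None if the node is unlisted or stateless."""
    entry = nodes.get(name)
    return entry[1] if entry else None


if __name__ == "__main__":
    sample = "7 v02-c\n8 v02-d\n5 v02-a\n6 v02-b\n"
    parsed = parse_crm_node_list(sample)
    for name in sorted(parsed):
        print(name, parsed[name])
```

With output like the above, a None state only says that the tool reported
nothing about the node; it does not say the node is offline, so a delete
check would still need another source for that.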
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
