Yes, it is 68 disks. And will this mon_osd_reporter_subtree_level = host
have any impact on mon_osd_min_down_reporters?
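
For reference, a minimal ceph.conf sketch of the two options in question (the
values below are purely illustrative, not a recommendation):

[mon]
# CRUSH subtree level at which down reports are grouped
mon_osd_reporter_subtree_level = host
# minimum number of distinct reporters (counted at the subtree level above)
# required before the mons mark an OSD down
mon_osd_min_down_reporters = 2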

And related to min_size: yes, there were many suggestions for us to move to
2. Due to storage efficiency concerns we still stay with 1, and we are trying
to convince customers to go with 2 for better data integrity.

thanks,
Muthu

On Wed, May 23, 2018 at 3:31 PM, David Turner <[email protected]> wrote:

> How many disks are in each node? 68? If yes, then change it to 69. Also,
> running with EC 4+1 is bad for the same reason as running with size=2
> min_size=1, which has been mentioned and discussed multiple times on the ML.
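>
> A rough sketch of applying that change (the injectargs step is just one
> possible restart-free workflow, and 69 assumes 68 OSDs per host):
>
> # ceph.conf on the mons
> [mon]
> mon_osd_min_down_reporters = 69
>
> # push to the running mons without a restart
> ceph tell mon.* injectargs '--mon_osd_min_down_reporters 69'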
>
>
> On Wed, May 23, 2018, 3:39 AM nokia ceph <[email protected]> wrote:
>
>> Hi David Turner,
>>
>> This is our Ceph config under the [mon] section. We have EC 4+1, the
>> failure domain set to host, and mon_osd_min_down_reporters set to 4 (OSDs
>> from 4 different hosts).
>>
>> [mon]
>> mon_compact_on_start = True
>> mon_osd_down_out_interval = 86400
>> mon_osd_down_out_subtree_limit = host
>> mon_osd_min_down_reporters = 4
>> mon_osd_reporter_subtree_level = host
>>
>> We have 68 disks; can we increase mon_osd_min_down_reporters to 68?
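>>
>> For context, the EC 4+1 / host failure domain setup above roughly
>> corresponds to a profile like the following (profile name, pool name and PG
>> count are placeholders):
>>
>> ceph osd erasure-code-profile set ec-4-1 k=4 m=1 crush-failure-domain=host
>> ceph osd pool create ecpool 1024 1024 erasure ec-4-1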
>>
>> Thanks,
>> Muthu
>>
>> On Tue, May 22, 2018 at 5:46 PM, David Turner <[email protected]>
>> wrote:
>>
>>> What happens when a storage node loses its cluster network but not its
>>> public network is that all other OSDs in the cluster see that it is down and
>>> report that to the mons, but the node can still talk to the mons, telling
>>> them that it is up and that, in fact, everything else is down.
>>>
>>> The setting mon_osd_min_down_reporters (I think that's the name of it off
>>> the top of my head) is designed to help with this scenario. Its default is 1,
>>> which means any OSD on either side of the network problem will be trusted
>>> by the mons to mark OSDs down. What you want to do with this setting is to
>>> set it to at least 1 more than the number of OSDs in your failure domain.
>>> If the failure domain is host and each node has 32 OSDs, then setting it to
>>> 33 will prevent a single problematic node from being able to cause havoc.
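>>>
>>> As a sketch of that rule of thumb (assuming 32 OSDs per host; adjust to the
>>> actual per-host OSD count):
>>>
>>> [mon]
>>> # one more than the number of OSDs in a single failure domain (host)
>>> mon_osd_min_down_reporters = 33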
>>>
>>> The OSDs will still try to mark themselves up, and this will still
>>> cause problems for reads until the OSD process stops or the network comes
>>> back up. There might be a setting for how long an OSD will keep telling the
>>> mons it is up, but this isn't really a situation I've come across after
>>> initial testing and installation of nodes.
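>>>
>>> The knobs I would check for that, treating this as an assumption rather
>>> than a confirmed answer, are osd_max_markdown_count and
>>> osd_max_markdown_period, which limit how many times within a period an OSD
>>> will mark itself back up before it gives up. Illustrative values only,
>>> verify the names and defaults against your Ceph version:
>>>
>>> [osd]
>>> osd_max_markdown_count = 5
>>> osd_max_markdown_period = 600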
>>>
>>> On Tue, May 22, 2018, 1:47 AM nokia ceph <[email protected]>
>>> wrote:
>>>
>>>> Hi Ceph users,
>>>>
>>>> We have a cluster with 5 nodes (67 disks), an EC 4+1 configuration, and
>>>> min_size set to 4.
>>>> Ceph version: 12.2.5
>>>> While executing one of our resilience use cases, taking the private
>>>> interface down on one of the nodes, up to Kraken we saw less outage in
>>>> RADOS (60s).
>>>>
>>>> Now with Luminous, we see a RADOS read/write outage of more than 200s.
>>>> In the logs we can see that peer OSDs report that one of the node's OSDs
>>>> are down; however, the OSDs defend themselves as wrongly marked down and
>>>> do not move to the down state for a long time.
>>>>
>>>> 2018-05-22 05:37:17.871049 7f6ac71e6700  0 log_channel(cluster) log
>>>> [WRN] : Monitor daemon marked osd.1 down, but it is still running
>>>> 2018-05-22 05:37:17.871072 7f6ac71e6700  0 log_channel(cluster) log
>>>> [DBG] : map e35690 wrongly marked me down at e35689
>>>> 2018-05-22 05:37:17.878347 7f6ac71e6700  0 osd.1 35690 crush map has
>>>> features 1009107927421960192, adjusting msgr requires for osds
>>>> 2018-05-22 05:37:18.296643 7f6ac71e6700  0 osd.1 35691 crush map has
>>>> features 1009107927421960192, adjusting msgr requires for osds
>>>>
>>>>
>>>> Only when all 67 OSDs have moved to the down state does the read/write
>>>> traffic resume.
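>>>>
>>>> A rough way to watch the outage window during this test (the cluster log
>>>> path is the typical default and may differ):
>>>>
>>>> ceph -s          # overall health and count of down OSDs
>>>> ceph osd stat    # quick up/in summary while the interface is down
>>>> grep "wrongly marked" /var/log/ceph/ceph.log   # cluster log on a mon node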
>>>>
>>>> Could you please help us resolve this issue? If it is a bug, we will
>>>> create a corresponding ticket.
>>>>
>>>> Thanks,
>>>> Muthu
>>>>
>>>
>>
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
