Hi, Christian.
When I re-add these OSDs (0, 3, 9, 12, 15), the high latency occurs again. The
default reweight of these OSDs is 0.0.
root@node-65:~# ceph osd tree
# id  weight  type name          up/down  reweight
-1    103.7   root default
-2    8.19      host node-65
18    2.73        osd.18         up       1
21    0           osd.21         up       1
24    2.73        osd.24         up       1
27    2.73        osd.27         up       1
30    0           osd.30         up       1
33    0           osd.33         up       1
0     0           osd.0          up       1
3     0           osd.3          up       1
6     0           osd.6          down     0
9     0           osd.9          up       1
12    0           osd.12         up       1
15    0           osd.15         up       1
ceph osd perf:
0     9825    10211
3     9398    9775
9     35852   36904
12    24716   25626
15    18893   19633
However, iostat shows no activity for these devices, and
smartctl finds no errors on these OSD devices.
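
For reference, a minimal sketch of how the `ceph osd perf` rows above could be checked automatically. It assumes the three columns are the OSD id, commit latency (ms) and apply latency (ms); the 100 ms cutoff and the function names are made up for illustration and are not part of Ceph.

```python
# Sketch only: flag OSDs whose latency in pasted `ceph osd perf` rows
# exceeds a cutoff. Assumed column order: osd id, commit latency (ms),
# apply latency (ms).

PERF_SAMPLE = """\
0 9825 10211
3 9398 9775
9 35852 36904
12 24716 25626
15 18893 19633
"""

THRESHOLD_MS = 100  # hypothetical cutoff, tune for your hardware


def parse_osd_perf(text):
    """Return (osd_id, commit_ms, apply_ms) tuples, skipping non-numeric lines."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # A header line or blank line will not be three integers; skip it.
        if len(fields) != 3 or not all(f.isdigit() for f in fields):
            continue
        osd_id, commit_ms, apply_ms = (int(f) for f in fields)
        rows.append((osd_id, commit_ms, apply_ms))
    return rows


def slow_osds(rows, threshold_ms=THRESHOLD_MS):
    """Return ids of OSDs whose worse latency figure exceeds threshold_ms."""
    return [osd_id for osd_id, commit_ms, apply_ms in rows
            if max(commit_ms, apply_ms) > threshold_ms]


if __name__ == "__main__":
    for osd_id in slow_osds(parse_osd_perf(PERF_SAMPLE)):
        print("osd.%d exceeds %d ms" % (osd_id, THRESHOLD_MS))
```

With the numbers above, all five re-added OSDs exceed the cutoff. Comparing that list against iostat for the same devices is what separates a slow disk from a problem higher up in the OSD.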
2016-03-29 13:22 GMT+08:00 lin zhou <[email protected]>:
> Thanks. I tried this method as the Ceph documentation says.
> But I just tested osd.6 this way, and the leveldb of osd.6 is
> broken, so it cannot start.
>
> When I try this for the other OSDs, it works.
>
> 2016-03-29 8:22 GMT+08:00 Christian Balzer <[email protected]>:
>> On Mon, 28 Mar 2016 18:36:14 +0800 lin zhou wrote:
>>
>>> > Hello,
>>> >
>>> > On Sun, 27 Mar 2016 13:41:57 +0800 lin zhou wrote:
>>> >
>>> > > Hi, guys.
>>> > > Some days ago, one OSD showed a large latency in ceph osd
>>> > > perf, and this device caused a high CPU await on this node.
>>> > The thing to do at that point would have been look at things with atop
>>> > or iostat to verify that it was the device itself that was slow and not
>>> > because it was genuinely busy due to uneven activity maybe.
>>> > As well as a quick glance at SMART of course.
>>>
>>> Thanks.I will follow this when I face this problem next time.
>>>
>>> > > So I deleted this OSD and then checked this device.
>>> > If that device (HDD, SSD, which model?) slowed down your cluster, you
>>> > should not have deleted it.
>>> > The best method would have been to set your cluster to noout and stop
>>> > that specific OSD.
>>> >
>>> > When you say "delete", what exact steps did you take?
>>> > Did this include removing it from the crush map?
>>>
>>> Yes. I deleted it from the crush map, deleted its auth, and removed the OSD (osd rm).
>>>
>>
>> Google is your friend; if you deleted it like in the link below, you should
>> be able to re-add it the same way:
>> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-June/002345.html
>>
>> Christian
>>
>>> > > But no errors were found.
>>> > >
>>> > > And now I want to re-add this device into the cluster with its data.
>>> > >
>>> > All the data was already replicated elsewhere if you deleted/removed
>>> > the OSD, you're likely not going to save much if any data movement by
>>> > re-adding it.
>>>
>>> Yes, the cluster finished rebalancing, but I face a problem with one unfound
>>> object. The recovery_state in the output of pg query says this OSD is
>>> down, but the other OSDs are OK.
>>> So I want to recover this OSD in order to recover this unfound object.
>>>
>>> And mark_unfound_lost revert/delete does not work:
>>> Error EINVAL: pg has 1 unfound objects but we haven't probed all sources,
>>>
>>> For details, see:
>>> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-March/008452.html
>>>
>>> Thanks again.
>>>
>>> > >
>>> > > I tried using ceph-osd to add it, but it cannot start. The logs are pasted
>>> > > at: https://gist.github.com/hnuzhoulin/836f9e633b90041e89ad
>>> > >
>>> > > So what are the recommended steps?
>>> > That depends on how you deleted it, but at this point your data is
>>> > likely to be mostly stale anyway, so I'd start from scratch.
>>>
>>> > Christian
>>> > --
>>> > Christian Balzer Network/Systems Engineer
>>> > [email protected] Global OnLine Japan/Rakuten Communications
>>> > http://www.gol.com/
>>> >
>>>
>>
>>
>> --
>> Christian Balzer Network/Systems Engineer
>> [email protected] Global OnLine Japan/Rakuten Communications
>> http://www.gol.com/
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com