Ah, sorry... since they were marked out manually, they'll need to be marked
in manually:

for i in $(ceph osd tree | grep osd | awk '{print $3}'); do ceph osd in $i; done
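As a sanity check (my addition, not part of the original advice): the loop above assumes column 3 of `ceph osd tree` output holds the OSD name, which is true for this ceph version but worth verifying offline against a captured sample before running it on a live cluster:

```shell
# Two sample lines in the 'id weight name up/down reweight' layout shown
# later in this thread; the grep/awk pipeline should extract the names.
sample='16 0.18 osd.16 up 1
13 0.91 osd.13 down 0'
printf '%s\n' "$sample" | grep osd | awk '{print $3}'
# prints osd.16 then osd.13
```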
Michael J. Kidd
Sr. Storage Consultant
Inktank Professional Services
- by Red Hat
On Wed, Oct 29, 2014 at 12:33 PM, Lukáš Kubín <[email protected]> wrote:
> I've ended up at step "ceph osd unset noin". My OSDs are up, but not in,
> even after an hour:
>
> [root@q04 ceph-recovery]# ceph osd stat
> osdmap e2602: 34 osds: 34 up, 0 in
> flags nobackfill,norecover,noscrub,nodeep-scrub
>
>
> There seems to be no activity generated by the OSD processes; occasionally
> they show 0.3% CPU, which I believe is just basic communication processing.
> No load in network interfaces.
>
> Is there some other step needed to bring the OSDs in?
>
> Thank you.
>
> Lukas
>
> On Wed, Oct 29, 2014 at 3:58 PM, Michael J. Kidd <[email protected]> wrote:
>
>> Hello Lukas,
>> Please try the following process for getting all your OSDs up and
>> operational...
>>
>> * Set the following flags: noup, noin, noscrub, nodeep-scrub, norecover, nobackfill
>> for i in noup noin noscrub nodeep-scrub norecover nobackfill; do ceph osd set $i; done
>>
>> * Stop all OSDs (I know, this seems counter productive)
>> * Set all OSDs down / out
>> for i in $(ceph osd tree | grep osd | awk '{print $3}'); do ceph osd down $i; ceph osd out $i; done
>> * Set recovery / backfill throttles as well as heartbeat and OSD map
>> processing tweaks in the /etc/ceph/ceph.conf file under the [osd] section:
>> [osd]
>> osd_max_backfills = 1
>> osd_recovery_max_active = 1
>> osd_recovery_max_single_start = 1
>> osd_backfill_scan_min = 8
>> osd_heartbeat_interval = 36
>> osd_heartbeat_grace = 240
>> osd_map_message_max = 1000
>> osd_map_cache_size = 3136
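A quick way to confirm those throttles actually landed under the `[osd]` section (and not, say, under `[global]`) is to parse the conf with awk. This is my own addition, shown against a temp file; point it at /etc/ceph/ceph.conf on a real node:

```shell
# Build a small sample conf to demonstrate the parse on.
conf=$(mktemp)
cat > "$conf" <<'EOF'
[global]
osd_max_backfills = 10
[osd]
osd_max_backfills = 1
osd_recovery_max_active = 1
EOF
# Print 'key value' pairs only from the [osd] section: turn printing on at
# the [osd] header, off again at any other section header.
awk '/^\[osd\]/{s=1; next} /^\[/{s=0} s && NF {print $1, $3}' "$conf"
rm -f "$conf"
```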
>>
>> * Start all OSDs
>> * Monitor 'top' for 0% CPU on all OSD processes.. it may take a while..
>> I usually run 'top' and then press the keys M and c:
>> - M = sort by memory usage
>> - c = show command arguments
>> - This makes it easy to monitor the OSD processes and see which OSDs have
>> settled, etc..
>> * Once all OSDs have hit 0% CPU utilization, remove the 'noup' flag
>> - ceph osd unset noup
>> * Again, wait for 0% CPU utilization (may be immediate, may take a
>> while.. just gotta wait)
>> * Once all OSDs have hit 0% CPU again, remove the 'noin' flag
>> - ceph osd unset noin
>> - All OSDs should now appear up/in, and will go through peering..
>> * Once ceph -s shows no further activity, and OSDs are back at 0% CPU
>> again, unset 'nobackfill'
>> - ceph osd unset nobackfill
>> * Once ceph -s shows no further activity, and OSDs are back at 0% CPU
>> again, unset 'norecover'
>> - ceph osd unset norecover
>> * Monitor OSD memory usage... some OSDs may get killed off again, but
>> their subsequent restart should consume less memory and allow more recovery
>> to occur between each step above.. and ultimately, hopefully... your entire
>> cluster will come back online and be usable.
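The staged unset sequence above could be sketched as a small script. This is my own sketch, guarded with `echo` so it can be dry-run safely (drop the echo to execute for real); the 1% CPU threshold, 30s poll interval, and retry cap are my assumptions — Michael's instructions just say to wait for 0% CPU in top:

```shell
#!/bin/sh
busy_osds() {
  # Count ceph-osd processes above 1% CPU. Note: ps reports a lifetime
  # average %cpu, so top remains the better interactive check.
  ps -C ceph-osd -o %cpu= 2>/dev/null | awk '$1 > 1 {n++} END {print n+0}'
}
for flag in noup noin nobackfill norecover; do
  tries=0
  while [ "$(busy_osds)" -gt 0 ] && [ "$tries" -lt 120 ]; do
    sleep 30; tries=$((tries + 1))
  done
  echo ceph osd unset "$flag"
done
# prints the four 'ceph osd unset ...' commands in order
```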
>>
>> ## Clean-up:
>> * Remove all of the above set options from ceph.conf
>> * Reset the running OSDs to their defaults:
>> ceph tell osd.\* injectargs '--osd_max_backfills 10
>> --osd_recovery_max_active 15 --osd_recovery_max_single_start 5
>> --osd_backfill_scan_min 64 --osd_heartbeat_interval 6 --osd_heartbeat_grace
>> 36 --osd_map_message_max 100 --osd_map_cache_size 500'
>> * Unset the noscrub and nodeep-scrub flags:
>> - ceph osd unset noscrub
>> - ceph osd unset nodeep-scrub
>>
>>
>> ## For help identifying why memory usage was so high, please provide:
>> * ceph osd dump | grep pool
>> * ceph osd crush rule dump
>>
>> Let us know if this helps... I know it looks extreme, but it's worked for
>> me in the past..
>>
>>
>> Michael J. Kidd
>> Sr. Storage Consultant
>> Inktank Professional Services
>> - by Red Hat
>>
>> On Wed, Oct 29, 2014 at 8:51 AM, Lukáš Kubín <[email protected]> wrote:
>>
>>> Hello,
>>> I've found my Ceph v0.80.3 cluster in a state with 5 of its 34 OSDs down
>>> overnight, after months of running without change. From the Linux logs I
>>> found that the OSD processes had been killed because they consumed all
>>> available memory.
>>>
>>> Those 5 failed OSDs were from different hosts of my 4-node cluster (see
>>> below). Two hosts act as SSD cache tier in some of my pools. The other two
>>> hosts are the default rotational drives storage.
>>>
>>> After verifying that Linux itself was not out of memory, I attempted to
>>> restart the failed OSDs. Most of the OSD daemons exhausted all available
>>> memory within seconds and were killed by Linux again:
>>>
>>> Oct 28 22:16:34 q07 kernel: Out of memory: Kill process 24207 (ceph-osd)
>>> score 867 or sacrifice child
>>> Oct 28 22:16:34 q07 kernel: Killed process 24207, UID 0, (ceph-osd)
>>> total-vm:59974412kB, anon-rss:59076880kB, file-rss:512kB
>>>
>>>
>>> On the host I've found lots of similar "slow request" messages preceding
>>> the crash:
>>>
>>> 2014-10-28 22:11:20.885527 7f25f84d1700 0 log [WRN] : slow request
>>> 31.117125 seconds old, received at 2014-10-28 22:10:49.768291:
>>> osd_sub_op(client.168752.0:2197931 14.2c7
>>> 888596c7/rbd_data.293272f8695e4.000000000000006f/head//14 [] v 1551'377417
>>> snapset=0=[]:[] snapc=0=[]) v10 currently no flag points reached
>>> 2014-10-28 22:11:21.885668 7f25f84d1700 0 log [WRN] : 67 slow requests,
>>> 1 included below; oldest blocked for > 9879.304770 secs
>>>
>>>
>>> Apparently I can't fix the cluster by restarting the OSDs over and over
>>> again. Is there any other option, then?
>>>
>>> Thank you.
>>>
>>> Lukas Kubin
>>>
>>>
>>>
>>> [root@q04 ~]# ceph -s
>>> cluster ec433b4a-9dc0-4d08-bde4-f1657b1fdb99
>>> health HEALTH_ERR 9 pgs backfill; 1 pgs backfilling; 521 pgs
>>> degraded; 425 pgs incomplete; 13 pgs inconsistent; 20 pgs recovering; 50
>>> pgs recovery_wait; 151 pgs stale; 425 pgs stuck inactive; 151 pgs stuck
>>> stale; 1164 pgs stuck unclean; 12070270 requests are blocked > 32 sec;
>>> recovery 887322/35206223 objects degraded (2.520%); 119/17131232 unfound
>>> (0.001%); 13 scrub errors
>>> monmap e2: 3 mons at {q03=
>>> 10.255.253.33:6789/0,q04=10.255.253.34:6789/0,q05=10.255.253.35:6789/0},
>>> election epoch 90, quorum 0,1,2 q03,q04,q05
>>> osdmap e2194: 34 osds: 31 up, 31 in
>>> pgmap v7429812: 5632 pgs, 7 pools, 1446 GB data, 16729 kobjects
>>> 2915 GB used, 12449 GB / 15365 GB avail
>>> 887322/35206223 objects degraded (2.520%); 119/17131232
>>> unfound (0.001%)
>>> 38 active+recovery_wait+remapped
>>> 4455 active+clean
>>> 65 stale+incomplete
>>> 3 active+recovering+remapped
>>> 359 incomplete
>>> 12 active+recovery_wait
>>> 139 active+remapped
>>> 86 stale+active+degraded
>>> 16 active+recovering
>>> 1 active+remapped+backfilling
>>> 13 active+clean+inconsistent
>>> 9 active+remapped+wait_backfill
>>> 434 active+degraded
>>> 1 remapped+incomplete
>>> 1 active+recovering+degraded+remapped
>>> client io 0 B/s rd, 469 kB/s wr, 48 op/s
>>>
>>> [root@q04 ~]# ceph osd tree
>>> # id weight type name up/down reweight
>>> -5 3.24 root ssd
>>> -6 1.62 host q06
>>> 16 0.18 osd.16 up 1
>>> 17 0.18 osd.17 up 1
>>> 18 0.18 osd.18 up 1
>>> 19 0.18 osd.19 up 1
>>> 20 0.18 osd.20 up 1
>>> 21 0.18 osd.21 up 1
>>> 22 0.18 osd.22 up 1
>>> 23 0.18 osd.23 up 1
>>> 24 0.18 osd.24 up 1
>>> -7 1.62 host q07
>>> 25 0.18 osd.25 up 1
>>> 26 0.18 osd.26 up 1
>>> 27 0.18 osd.27 up 1
>>> 28 0.18 osd.28 up 1
>>> 29 0.18 osd.29 up 1
>>> 30 0.18 osd.30 up 1
>>> 31 0.18 osd.31 up 1
>>> 32 0.18 osd.32 up 1
>>> 33 0.18 osd.33 up 1
>>> -1 14.56 root default
>>> -4 14.56 root sata
>>> -2 7.28 host q08
>>> 0 0.91 osd.0 up 1
>>> 1 0.91 osd.1 up 1
>>> 2 0.91 osd.2 up 1
>>> 3 0.91 osd.3 up 1
>>> 11 0.91 osd.11 up 1
>>> 12 0.91 osd.12 up 1
>>> 13 0.91 osd.13 down 0
>>> 14 0.91 osd.14 up 1
>>> -3 7.28 host q09
>>> 4 0.91 osd.4 up 1
>>> 5 0.91 osd.5 up 1
>>> 6 0.91 osd.6 up 1
>>> 7 0.91 osd.7 up 1
>>> 8 0.91 osd.8 down 0
>>> 9 0.91 osd.9 up 1
>>> 10 0.91 osd.10 down 0
>>> 15 0.91 osd.15 up 1
>>>
>>>
>>> _______________________________________________
>>> ceph-users mailing list
>>> [email protected]
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>>
>>
>