Hello,

We fixed the issue manually; our analysis is below.

Due to high CPU utilisation we had stopped ceph-mgr on all our clusters.
On one of the clusters we then saw high memory usage by the OSDs, some greater
than 5 GB each, which triggered the OOM killer and got processes killed.

The memory was released immediately when ceph-mgr was started again, so this
issue is clearly a side effect of stopping the ceph-mgr process. What we
don't understand is why all OSDs report an issue with a single OSD and hold
on to so much memory until ceph-mgr is started.
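
If it helps with the diagnosis, the next time this happens we plan to capture the
per-daemon allocator state before restarting ceph-mgr. A rough sketch of the
commands we would run (assuming the OSD admin sockets are reachable and the OSDs
are built with tcmalloc; <id> is a placeholder):

# per-pool allocation counters tracked inside the OSD
ceph daemon osd.<id> dump_mempools

# allocator view: bytes in use vs. freed but not yet returned to the OS
ceph tell osd.<id> heap stats
ceph tell osd.<id> heap release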

1. Ceph status highlighting an issue ("wrong node") with one of the OSDs on the
5th node:

cn1.vn1ldv1c1.cdn ~# ceph status
2017-03-28 05:54:52.210450 7f8108a84700 1 WARNING: the following dangerous
and experimental features are enabled: bluestore,rocksdb
2017-03-28 05:54:52.231551 7f8108a84700 -1 WARNING: the following dangerous
and experimental features are enabled: bluestore,rocksdb
2017-03-28 05:54:52.400565 7f8101ac6700 0 - 10.139.4.81:0/2856869581 >>
10.139.4.85:6800/44958 conn(0x7f80e8002bc0 :-1
s=STATE_CONNECTING_WAIT_BANNER_AND_IDENTIFY
pgs=0 cs=0 l=1)._process_connection connect claims to be
10.139.4.85:6800/273761 not
10.139.4.85:6800/44958 - wrong node!
cluster 71a32568-96f3-4998-89dd-e2e7d77a6824
health HEALTH_OK
monmap e3: 5 mons at {cn1=
10.139.4.81:6789/0,cn2=10.139.4.82:6789/0,cn3=10.139.4.83:6789/0,cn4=10.139.4.84:6789/0,cn5=10.139.4.85:6789/0
}
election epoch 24, quorum 0,1,2,3,4 cn1,cn2,cn3,cn4,cn5
mgr active: cn5
osdmap e2010: 335 osds: 335 up, 335 in
flags sortbitwise,require_jewel_osds,require_kraken_osds
pgmap v561117: 8192 pgs, 1 pools, 28323 GB data, 12400 kobjects
37667 GB used, 1182 TB / 1218 TB avail
8192 active+clean
client io 31732 kB/s rd, 57763 kB/s wr, 59 op/s rd, 479 op/s wr

2. numastat for ceph shows a total of roughly 275 GB consumed by ceph
processes, with most OSDs using more than 5 GB each.

cn1.vn1ldv1c1.cdn /var/log/cassandra# numastat -s ceph

Per-node process memory usage (in MBs)
PID Node 0 Node 1 Total
----------------- --------------- --------------- ---------------
372602 (ceph-osd) 5418.34 2.84 5421.18
491602 (ceph-osd) 5351.95 2.83 5354.78
417717 (ceph-osd) 5175.98 2.83 5178.81
273980 (ceph-osd) 5167.83 2.82 5170.65
311956 (ceph-osd) 5167.04 2.84 5169.88
440537 (ceph-osd) 5161.57 2.84 5164.41
368422 (ceph-osd) 5157.87 2.83 5160.70
292227 (ceph-osd) 5156.42 2.83 5159.25
360749 (ceph-osd) 5129.43 2.83 5132.26
516040 (ceph-osd) 5112.53 2.84 5115.37
526274 (ceph-osd) 5108.76 2.83 5111.59
300197 (ceph-osd) 5096.64 2.82 5099.46
487087 (ceph-osd) 5081.70 2.82 5084.52
396562 (ceph-osd) 5060.55 2.84 5063.38
409201 (ceph-osd) 5058.06 2.83 5060.89
284767 (ceph-osd) 5027.94 2.82 5030.76
520653 (ceph-osd) 4997.16 2.82 4999.98
302873 (ceph-osd) 4988.78 2.82 4991.60
364601 (ceph-osd) 4884.61 2.83 4887.43
426998 (ceph-osd) 4865.89 2.82 4868.72
294856 (ceph-osd) 4851.96 2.82 4854.78
306064 (ceph-osd) 4780.84 2.85 4783.68
449676 (ceph-osd) 4764.82 2.84 4767.66
376469 (ceph-osd) 4750.47 2.82 4753.29
482502 (ceph-osd) 4729.85 2.84 4732.69
357126 (ceph-osd) 4706.88 2.82 4709.71
346001 (ceph-osd) 4693.43 2.84 4696.27
511640 (ceph-osd) 4668.58 2.82 4671.41
282682 (ceph-osd) 4614.66 2.84 4617.50
287330 (ceph-osd) 4613.75 2.82 4616.57
506197 (ceph-osd) 4604.59 2.84 4607.43
332253 (ceph-osd) 4587.28 2.82 4590.11
413603 (ceph-osd) 4579.29 2.84 4582.12
297473 (ceph-osd) 4569.20 2.84 4572.04
431396 (ceph-osd) 4537.83 2.84 4540.66
501424 (ceph-osd) 4533.19 2.84 4536.03
477729 (ceph-osd) 4505.37 2.83 4508.20
392520 (ceph-osd) 4439.75 2.84 4442.59
280349 (ceph-osd) 4389.06 2.82 4391.88
321805 (ceph-osd) 4385.80 2.82 4388.62
463759 (ceph-osd) 4369.09 2.82 4371.91
328971 (ceph-osd) 4349.35 2.82 4352.18
530916 (ceph-osd) 4330.21 2.82 4333.03
468626 (ceph-osd) 4272.68 2.83 4275.51
353261 (ceph-osd) 4266.01 2.82 4268.83
339729 (ceph-osd) 4194.93 2.82 4197.75
422844 (ceph-osd) 4157.31 2.82 4160.14
400631 (ceph-osd) 4155.34 2.82 4158.16
325467 (ceph-osd) 4144.66 2.84 4147.50
380309 (ceph-osd) 4119.42 2.82 4122.24
454764 (ceph-osd) 4007.09 2.82 4009.92
336089 (ceph-osd) 4003.25 2.82 4006.07
349613 (ceph-osd) 3953.32 2.84 3956.15
473107 (ceph-osd) 3833.75 2.83 3836.59
388421 (ceph-osd) 3776.79 2.83 3779.62
308957 (ceph-osd) 3758.94 2.82 3761.76
315430 (ceph-osd) 3677.42 2.82 3680.24
445064 (ceph-osd) 3669.27 2.82 3672.09
977162 (ceph-osd) 1508.02 3.40 1511.43
166155 (ceph-osd) 1411.64 3.42 1415.06
228123 (ceph-osd) 1399.20 3.41 1402.60
39367 (ceph-osd) 1397.44 3.41 1400.85
228124 (ceph-osd) 1227.50 3.41 1230.91
284384 (ceph-osd) 1204.96 3.41 1208.37
339890 (ceph-osd) 1139.69 3.41 1143.10
467652 (ceph-osd) 1016.18 3.41 1019.59
597584 (ceph-osd) 901.18 3.41 904.58
934986 (ceph-mon) 0.02 184.65 184.67
----------------- --------------- --------------- ---------------
Total 278720.27 379.42 279099.69
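
For reference, we totalled the per-OSD figures above with a quick awk sketch over
the numastat output (assuming the last column is the per-process total in MB):

numastat -s ceph | awk '/ceph-osd/ {sum += $NF; n++} END {printf "%d ceph-osd processes, %.0f MB total\n", n, sum}'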

3. The OOM killer is invoked against a ceph-osd process:

Mar 27 01:57:05 cn1 kernel: ceph-osd invoked oom-killer: gfp_mask=0x280da,
order=0, oom_score_adj=0
Mar 27 01:57:05 cn1 kernel: ceph-osd cpuset=/ mems_allowed=0-1
Mar 27 01:57:05 cn1 kernel: CPU: 0 PID: 422861 Comm: ceph-osd Not tainted
3.10.0-327.el7.x86_64 #1
Mar 27 01:57:05 cn1 kernel: Hardware name: HP ProLiant XL450 Gen9
Server/ProLiant XL450 Gen9 Server, BIOS U21 09/12/2016
Mar 27 01:57:05 cn1 kernel: ffff884546751700 00000000275e2e50
ffff88454137b6f0 ffffffff816351f1
Mar 27 01:57:05 cn1 kernel: ffff88454137b780 ffffffff81630191
000000000487ffff ffff8846f0665590
Mar 27 01:57:05 cn1 kernel: ffff8845411b3ad8 ffff88454137b7d0
ffffffffffffffd5 0000000000000001
Mar 27 01:57:05 cn1 kernel: Call Trace:
Mar 27 01:57:05 cn1 kernel: [<ffffffff816351f1>] dump_stack+0x19/0x1b
Mar 27 01:57:05 cn1 kernel: [<ffffffff81630191>] dump_header+0x8e/0x214
Mar 27 01:57:05 cn1 kernel: [<ffffffff8116cdee>]
oom_kill_process+0x24e/0x3b0
Mar 27 01:57:05 cn1 kernel: [<ffffffff8116c956>] ?
find_lock_task_mm+0x56/0xc0
Mar 27 01:57:05 cn1 kernel: [<ffffffff8116d616>] out_of_memory+0x4b6/0x4f0
Mar 27 01:57:05 cn1 kernel: [<ffffffff811737f5>]
__alloc_pages_nodemask+0xa95/0xb90
Mar 27 01:57:05 cn1 kernel: [<ffffffff811b78ca>] alloc_pages_vma+0x9a/0x140
Mar 27 01:57:05 cn1 kernel: [<ffffffff81197655>] handle_mm_fault+0xb85/0xf50
Mar 27 01:57:05 cn1 kernel: [<ffffffffa04f5b22>] ?
xfs_perag_get_tag+0x42/0xe0 [xfs]
Mar 27 01:57:05 cn1 kernel: [<ffffffff81640e22>] __do_page_fault+0x152/0x420
Mar 27 01:57:05 cn1 kernel: [<ffffffff81641113>] do_page_fault+0x23/0x80
Mar 27 01:57:05 cn1 kernel: [<ffffffff8163d408>] page_fault+0x28/0x30
Mar 27 01:57:05 cn1 kernel: [<ffffffff813000c9>] ?
copy_user_enhanced_fast_string+0x9/0x20
Mar 27 01:57:05 cn1 kernel: [<ffffffff8130600a>] ? memcpy_toiovec+0x4a/0x90
Mar 27 01:57:05 cn1 kernel: [<ffffffff8151f91f>]
skb_copy_datagram_iovec+0x12f/0x2a0
Mar 27 01:57:05 cn1 kernel: [<ffffffff81574418>] tcp_recvmsg+0x248/0xbc0
Mar 27 01:57:05 cn1 kernel: [<ffffffff810bb685>] ? sched_clock_cpu+0x85/0xc0
Mar 27 01:57:05 cn1 kernel: [<ffffffff815a10eb>] inet_recvmsg+0x7b/0xa0
Mar 27 01:57:05 cn1 kernel: [<ffffffff8150ffb6>]
sock_aio_read.part.7+0x146/0x160
Mar 27 01:57:05 cn1 kernel: [<ffffffff8150fff1>] sock_aio_read+0x21/0x30
Mar 27 01:57:05 cn1 kernel: [<ffffffff811ddcdd>] do_sync_read+0x8d/0xd0
Mar 27 01:57:05 cn1 kernel: [<ffffffff811de4e5>] vfs_read+0x145/0x170
Mar 27 01:57:05 cn1 kernel: [<ffffffff811def8f>] SyS_read+0x7f/0xe0
Mar 27 01:57:05 cn1 kernel: [<ffffffff81645909>]
system_call_fastpath+0x16/0x1b
Mar 27 01:57:05 cn1 kernel: Mem-Info:
Mar 27 01:57:05 cn1 kernel: Node 0 DMA per-cpu:

4. All OSDs were flooded with the error below:
2017-03-28 12:51:28.889658 7f82dd053700 0 -- 10.139.4.83:6850/121122 >>
10.139.4.85:6800/44958 conn(0x7f82eeeea000 :-1
s=STATE_CONNECTING_WAIT_BANNER_AND_IDENTIFY pgs=1164657 cs=2
l=0)._process_connection connect claims to be 10.139.4.85:6800/273761 not
10.139.4.85:6800/44958 - wrong node!

On the affected OSD's log:
2017-03-28 12:51:29.191346 7f8a775b6700 0 -- 10.139.4.85:6800/273761 >> -
conn(0x7f8aad6c0000 :6800 s=STATE_ACCEPTING_WAIT_BANNER_ADDR pgs=0 cs=0
l=0).fault with nothing to send and in the half accept state just closed
2017-03-28 12:51:29.249841 7f8a775b6700 0 -- 10.139.4.85:6800/273761 >> -
conn(0x7f8aacc34800 :6800 s=STATE_ACCEPTING_WAIT_BANNER_ADDR pgs=0 cs=0
l=0).fault with nothing to send and in the half accept state just closed
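
A sketch of how to check what address/nonce the current OSDMap records for that
endpoint, versus what the daemon itself claims (the address is taken from the
errors above):

ceph osd dump | grep 10.139.4.85:6800

The part after the port (44958 vs. 273761 here) is the messenger nonce, which
changes when the daemon restarts; the "wrong node" message is the peer noticing
that the nonce it expected no longer matches the one the daemon claims.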

5. Once ceph-mgr was started on the affected OSD node, all OSDs moved to the
reconnect state:

2017-03-28 12:52:37.468455 7f8a77db7700 0 -- 10.139.4.85:6800/273761 >>
10.139.4.85:6928/185705 conn(0x7f8aacc34800 :-1 s=STATE_OPEN pgs=32 cs=1
l=0).fault initiating reconnect

2017-03-28 12:52:37.468502 7f2bbcd5b700 0 -- 10.139.4.85:6897/154091 >>
10.139.4.85:6928/185705 conn(0x7f2c63448800 :-1 s=STATE_OPEN pgs=301 cs=1
l=0).fault initiating reconnect

2017-03-28 12:52:37.469503 7fd36f161700 0 -- 10.139.4.84:6822/95096 >>
10.139.4.85:6928/185705 conn(0x7fd412f69800 :-1 s=STATE_OPEN pgs=173 cs=1
l=0).fault initiating reconnect

2017-03-28 12:52:37.463913 7f82dd053700 0 -- 10.139.4.83:6850/121122 >>
10.139.4.85:6928/185705 conn(0x7f83da64b800 :-1 s=STATE_OPEN pgs=154 cs=1
l=0).fault initiating reconnect

2017-03-28 12:52:37.468406 7fea1fc05700 0 -- 10.139.4.82:6816/97331 >>
10.139.4.85:6928/185705 conn(0x7feab70f6000 :-1 s=STATE_OPEN pgs=108 cs=1
l=0).fault initiating reconnect
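
As a quick way to confirm the flood has actually stopped after ceph-mgr comes
back, the messages can be counted in the OSD logs on each node (a sketch,
assuming default log locations):

grep -c "wrong node" /var/log/ceph/ceph-osd.*.log

The per-log counts should stop growing once the connections re-establish.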

6. Shortly afterwards, total memory usage dropped from 275 GB to 147 GB.

So, what impact is ceph-mgr having here?


Thanks

On Tue, Mar 28, 2017 at 2:49 PM, Jay Linux <[email protected]> wrote:

> Hello,
>
> We are still facing the same memory leak issue even with
> bluestore_cache_size set to 100 MB; it caused ceph-osd processes to be killed
> by the OOM killer.
>
> Mar 27 01:57:05 cn1 kernel: ceph-osd invoked oom-killer: gfp_mask=0x280da,
> order=0, oom_score_adj=0
> Mar 27 01:57:05 cn1 kernel: ceph-osd cpuset=/ mems_allowed=0-1
> Mar 27 01:57:05 cn1 kernel: CPU: 0 PID: 422861 Comm: ceph-osd Not tainted
> 3.10.0-327.el7.x86_64 #1
> Mar 27 01:57:05 cn1 kernel: Hardware name: HP ProLiant XL450 Gen9
> Server/ProLiant XL450 Gen9 Server, BIOS U21 09/12/2016
> Mar 27 01:57:05 cn1 kernel: ffff884546751700 00000000275e2e50
> ffff88454137b6f0 ffffffff816351f1
> Mar 27 01:57:05 cn1 kernel: ffff88454137b780 ffffffff81630191
> 000000000487ffff ffff8846f0665590
> Mar 27 01:57:05 cn1 kernel: ffff8845411b3ad8 ffff88454137b7d0
> ffffffffffffffd5 0000000000000001
> Mar 27 01:57:05 cn1 kernel: Call Trace:
> Mar 27 01:57:05 cn1 kernel: [<ffffffff816351f1>] dump_stack+0x19/0x1b
> Mar 27 01:57:05 cn1 kernel: [<ffffffff81630191>] dump_header+0x8e/0x214
> Mar 27 01:57:05 cn1 kernel: [<ffffffff8116cdee>]
> oom_kill_process+0x24e/0x3b0
> Mar 27 01:57:05 cn1 kernel: [<ffffffff8116c956>] ?
> find_lock_task_mm+0x56/0xc0
> Mar 27 01:57:05 cn1 kernel: [<ffffffff8116d616>] out_of_memory+0x4b6/0x4f0
> Mar 27 01:57:05 cn1 kernel: [<ffffffff811737f5>]
> __alloc_pages_nodemask+0xa95/0xb90
> Mar 27 01:57:05 cn1 kernel: [<ffffffff811b78ca>] alloc_pages_vma+0x9a/0x140
> Mar 27 01:57:05 cn1 kernel: [<ffffffff81197655>]
> handle_mm_fault+0xb85/0xf50
> Mar 27 01:57:05 cn1 kernel: [<ffffffffa04f5b22>] ?
> xfs_perag_get_tag+0x42/0xe0 [xfs]
> Mar 27 01:57:05 cn1 kernel: [<ffffffff81640e22>]
> __do_page_fault+0x152/0x420
> Mar 27 01:57:05 cn1 kernel: [<ffffffff81641113>] do_page_fault+0x23/0x80
> Mar 27 01:57:05 cn1 kernel: [<ffffffff8163d408>] page_fault+0x28/0x30
> Mar 27 01:57:05 cn1 kernel: [<ffffffff813000c9>] ? copy_user_enhanced_fast_
> string+0x9/0x20
> Mar 27 01:57:05 cn1 kernel: [<ffffffff8130600a>] ? memcpy_toiovec+0x4a/0x90
> Mar 27 01:57:05 cn1 kernel: [<ffffffff8151f91f>]
> skb_copy_datagram_iovec+0x12f/0x2a0
> Mar 27 01:57:05 cn1 kernel: [<ffffffff81574418>] tcp_recvmsg+0x248/0xbc0
> Mar 27 01:57:05 cn1 kernel: [<ffffffff810bb685>] ?
> sched_clock_cpu+0x85/0xc0
> Mar 27 01:57:05 cn1 kernel: [<ffffffff815a10eb>] inet_recvmsg+0x7b/0xa0
> Mar 27 01:57:05 cn1 kernel: [<ffffffff8150ffb6>]
> sock_aio_read.part.7+0x146/0x160
> Mar 27 01:57:05 cn1 kernel: [<ffffffff8150fff1>] sock_aio_read+0x21/0x30
> Mar 27 01:57:05 cn1 kernel: [<ffffffff811ddcdd>] do_sync_read+0x8d/0xd0
> Mar 27 01:57:05 cn1 kernel: [<ffffffff811de4e5>] vfs_read+0x145/0x170
> Mar 27 01:57:05 cn1 kernel: [<ffffffff811def8f>] SyS_read+0x7f/0xe0
> Mar 27 01:57:05 cn1 kernel: [<ffffffff81645909>]
> system_call_fastpath+0x16/0x1b
>
> There were several occurrences of the OOM event:
>
> #dmesg -T | grep -i memory
> [Mon Mar 27 02:51:25 2017] Out of memory: Kill process 459076 (ceph-osd)
> score 18 or sacrifice child
> [Mon Mar 27 06:41:16 2017]  [<ffffffff8116d616>] out_of_memory+0x4b6/0x4f0
> [Mon Mar 27 06:41:16 2017] Out of memory: Kill process 976901 (java) score
> 31 or sacrifice child
> [Mon Mar 27 06:43:55 2017]  [<ffffffff8116d616>] out_of_memory+0x4b6/0x4f0
> [Mon Mar 27 06:43:55 2017] Out of memory: Kill process 37351 (java) score
> 31 or sacrifice child
> [Mon Mar 27 06:43:55 2017]  [<ffffffff8116d616>] out_of_memory+0x4b6/0x4f0
> [Mon Mar 27 06:43:55 2017] Out of memory: Kill process 435981 (ceph-osd)
> score 17 or sacrifice child
> [Mon Mar 27 11:06:07 2017]  [<ffffffff8116d616>] out_of_memory+0x4b6/0x4f0
>
>
> # numactl -H
> available: 2 nodes (0-1)
> node 0 cpus: 0 1 2 3 4 5 6 7 16 17 18 19 20 21 22 23
> node 0 size: 294786 MB
> node 0 free: 3447 MB  ===>>> Used almost 98%
>
>
> While analysing the numastat results, you can see that each OSD consumes more
> than 5 GB:
>
> ====
> # numastat -s ceph
>
> Per-node process memory usage (in MBs)
> PID                         Node 0          Node 1           Total
> -----------------  --------------- --------------- ---------------
> 372602 (ceph-osd)          5418.34            2.84         5421.18
> 491602 (ceph-osd)          5351.95            2.83         5354.78
> 417717 (ceph-osd)          5175.98            2.83         5178.81
> 273980 (ceph-osd)          5167.83            2.82         5170.65
> 311956 (ceph-osd)          5167.04            2.84         5169.88
> 440537 (ceph-osd)          5161.57            2.84         5164.41
> 368422 (ceph-osd)          5157.87            2.83         5160.70
> 292227 (ceph-osd)          5156.42            2.83         5159.25
> ====
>
> Is there any way to fix the memory leak? Awaiting your comments.
>
> ---
> bluestore_cache_size = 107374182
> bluefs_buffered_io = true
> ---
>
> Env:- RHEL7.2
>          v11.2.0 kraken , EC 4+1
>
> FYI - we have already raised a tracker for this issue:
> http://tracker.ceph.com/issues/18924
>
> Thanks
>
>
> On Mon, Feb 20, 2017 at 11:18 AM, Jay Linux <[email protected]>
> wrote:
>
>> Hello Shinobu,
>>
>> We already raised ticket for this issue. FYI -
>> http://tracker.ceph.com/issues/18924
>>
>> Thanks
>> Jayaram
>>
>>
>> On Mon, Feb 20, 2017 at 12:36 AM, Shinobu Kinjo <[email protected]>
>> wrote:
>>
>>> Please open ticket at http://tracker.ceph.com, if you haven't yet.
>>>
>>> On Thu, Feb 16, 2017 at 6:07 PM, Muthusamy Muthiah
>>> <[email protected]> wrote:
>>> > Hi Wido,
>>> >
>>> > Thanks for the information and let us know if this is a bug.
>>> > As workaround we will go with small bluestore_cache_size to 100MB.
>>> >
>>> > Thanks,
>>> > Muthu
>>> >
>>> > On 16 February 2017 at 14:04, Wido den Hollander <[email protected]>
>>> wrote:
>>> >>
>>> >>
>>> >> > Op 16 februari 2017 om 7:19 schreef Muthusamy Muthiah
>>> >> > <[email protected]>:
>>> >> >
>>> >> >
>>> >> > Thanks IIya Letkowski for the information we will change this value
>>> >> > accordingly.
>>> >> >
>>> >>
>>> >> What I understand from yesterday's performance meeting is that this
>>> seems
>>> >> like a bug. Lowering this buffer reduces memory, but the root-cause
>>> seems to
>>> >> be memory not being freed. A few bytes of a larger allocation still
>>> >> allocated causing this buffer not to be freed.
>>> >>
>>> >> Tried:
>>> >>
>>> >> debug_mempools = true
>>> >>
>>> >> $ ceph daemon osd.X dump_mempools
>>> >>
>>> >> Might want to view the YouTube video of yesterday when it's online:
>>> >> https://www.youtube.com/channel/UCno-Fry25FJ7B4RycCxOtfw/videos
>>> >>
>>> >> Wido
>>> >>
>>> >> > Thanks,
>>> >> > Muthu
>>> >> >
>>> >> > On 15 February 2017 at 17:03, Ilya Letkowski <
>>> [email protected]>
>>> >> > wrote:
>>> >> >
>>> >> > > Hi, Muthusamy Muthiah
>>> >> > >
>>> >> > > I'm not totally sure that this is a memory leak.
>>> >> > > We had same problems with bluestore on ceph v11.2.0.
>>> >> > > Reduce bluestore cache helped us to solve it and stabilize OSD
>>> memory
>>> >> > > consumption on the 3GB level.
>>> >> > >
>>> >> > > Perhaps this will help you:
>>> >> > >
>>> >> > > bluestore_cache_size = 104857600
>>> >> > >
>>> >> > >
>>> >> > >
>>> >> > > On Tue, Feb 14, 2017 at 11:52 AM, Muthusamy Muthiah <
>>> >> > > [email protected]> wrote:
>>> >> > >
>>> >> > >> Hi All,
>>> >> > >>
>>> >> > >> On all our 5 node cluster with ceph 11.2.0 we encounter memory
>>> leak
>>> >> > >> issues.
>>> >> > >>
>>> >> > >> Cluster details : 5 node with 24/68 disk per node , EC : 4+1 ,
>>> RHEL
>>> >> > >> 7.2
>>> >> > >>
>>> >> > >> Some traces using sar are below and attached the memory
>>> utilisation
>>> >> > >> graph
>>> >> > >> .
>>> >> > >>
>>> >> > >> (16:54:42)[cn2.c1 sa] # sar -r
>>> >> > >> 07:50:01 kbmemfree kbmemused %memused kbbuffers kbcached kbcommit
>>> >> > >> %commit
>>> >> > >> kbactive kbinact kbdirty
>>> >> > >> 10:20:01 32077264 132754368 80.54 16176 3040244 77767024 47.18
>>> >> > >> 51991692
>>> >> > >> 2676468 260
>>> >> > >>
>>> >> > >>
>>> >> > >>
>>> >> > >>
>>> >> > >>
>>> >> > >>
>>> >> > >>
>>> >> > >>
>>> >> > >> 10:30:01 32208384 132623248 80.46 16176 3048536 77832312 47.22 51851512 2684552 12
>>> >> > >> 10:40:01 32067244 132764388 80.55 16176 3059076 77832316 47.22 51983332 2694708 264
>>> >> > >> 10:50:01 30626144 134205488 81.42 16176 3064340 78177232 47.43 53414144 2693712 4
>>> >> > >> 11:00:01 28927656 135903976 82.45 16176 3074064 78958568 47.90 55114284 2702892 12
>>> >> > >> 11:10:01 27158548 137673084 83.52 16176 3080600 80553936 48.87 56873664 2708904 12
>>> >> > >> 11:20:01 26455556 138376076 83.95 16176 3080436 81991036 49.74 57570280 2708500 8
>>> >> > >> 11:30:01 26002252 138829380 84.22 16176 3090556 82223840 49.88 58015048 2718036 16
>>> >> > >> 11:40:01 25965924 138865708 84.25 16176 3089708 83734584 50.80 58049980 2716740 12
>>> >> > >> 11:50:01 26142888 138688744 84.14 16176 3089544 83800100 50.84 57869628 2715400 16
>>> >> > >>
>>> >> > >> ...
>>> >> > >> ...
>>> >> > >>
>>> >> > >> In the attached graph, there is increase in memory utilisation by
>>> >> > >> ceph-osd during soak test. And when it reaches the system limit
>>> of
>>> >> > >> 128GB
>>> >> > >> RAM , we could able to see the below dmesg logs related to
>>> memory out
>>> >> > >> when
>>> >> > >> the system reaches close to 128GB RAM. OSD.3 killed due to Out of
>>> >> > >> memory
>>> >> > >> and started again.
>>> >> > >>
>>> >> > >> [Tue Feb 14 03:51:02 2017] *tp_osd_tp invoked oom-killer:
>>> >> > >> gfp_mask=0x280da, order=0, oom_score_adj=0*
>>> >> > >> [Tue Feb 14 03:51:02 2017] tp_osd_tp cpuset=/ mems_allowed=0-1
>>> >> > >> [Tue Feb 14 03:51:02 2017] CPU: 20 PID: 11864 Comm: tp_osd_tp Not
>>> >> > >> tainted
>>> >> > >> 3.10.0-327.13.1.el7.x86_64 #1
>>> >> > >> [Tue Feb 14 03:51:02 2017] Hardware name: HP ProLiant XL420
>>> >> > >> Gen9/ProLiant
>>> >> > >> XL420 Gen9, BIOS U19 09/12/2016
>>> >> > >> [Tue Feb 14 03:51:02 2017]  ffff8819ccd7a280 0000000030e84036
>>> >> > >> ffff881fa58f7528 ffffffff816356f4
>>> >> > >> [Tue Feb 14 03:51:02 2017]  ffff881fa58f75b8 ffffffff8163068f
>>> >> > >> ffff881fa3478360 ffff881fa3478378
>>> >> > >> [Tue Feb 14 03:51:02 2017]  ffff881fa58f75e8 ffff8819ccd7a280
>>> >> > >> 0000000000000001 000000000001f65f
>>> >> > >> [Tue Feb 14 03:51:02 2017] Call Trace:
>>> >> > >> [Tue Feb 14 03:51:02 2017]  [<ffffffff816356f4>]
>>> dump_stack+0x19/0x1b
>>> >> > >> [Tue Feb 14 03:51:02 2017]  [<ffffffff8163068f>]
>>> >> > >> dump_header+0x8e/0x214
>>> >> > >> [Tue Feb 14 03:51:02 2017]  [<ffffffff8116ce7e>]
>>> >> > >> oom_kill_process+0x24e/0x3b0
>>> >> > >> [Tue Feb 14 03:51:02 2017]  [<ffffffff8116c9e6>] ?
>>> >> > >> find_lock_task_mm+0x56/0xc0
>>> >> > >> [Tue Feb 14 03:51:02 2017]  [<ffffffff8116d6a6>]
>>> >> > >> *out_of_memory+0x4b6/0x4f0*
>>> >> > >> [Tue Feb 14 03:51:02 2017]  [<ffffffff81173885>]
>>> >> > >> __alloc_pages_nodemask+0xa95/0xb90
>>> >> > >> [Tue Feb 14 03:51:02 2017]  [<ffffffff811b792a>]
>>> >> > >> alloc_pages_vma+0x9a/0x140
>>> >> > >> [Tue Feb 14 03:51:02 2017]  [<ffffffff811976c5>]
>>> >> > >> handle_mm_fault+0xb85/0xf50
>>> >> > >> [Tue Feb 14 03:51:02 2017]  [<ffffffff811957fb>] ?
>>> >> > >> follow_page_mask+0xbb/0x5c0
>>> >> > >> [Tue Feb 14 03:51:02 2017]  [<ffffffff81197c2b>]
>>> >> > >> __get_user_pages+0x19b/0x640
>>> >> > >> [Tue Feb 14 03:51:02 2017]  [<ffffffff8119843d>]
>>> >> > >> get_user_pages_unlocked+0x15d/0x1f0
>>> >> > >> [Tue Feb 14 03:51:02 2017]  [<ffffffff8106544f>]
>>> >> > >> get_user_pages_fast+0x9f/0x1a0
>>> >> > >> [Tue Feb 14 03:51:02 2017]  [<ffffffff8121de78>]
>>> >> > >> do_blockdev_direct_IO+0x1a78/0x2610
>>> >> > >> [Tue Feb 14 03:51:02 2017]  [<ffffffff81218c40>] ?
>>> I_BDEV+0x10/0x10
>>> >> > >> [Tue Feb 14 03:51:02 2017]  [<ffffffff8121ea65>]
>>> >> > >> __blockdev_direct_IO+0x55/0x60
>>> >> > >> [Tue Feb 14 03:51:02 2017]  [<ffffffff81218c40>] ?
>>> I_BDEV+0x10/0x10
>>> >> > >> [Tue Feb 14 03:51:02 2017]  [<ffffffff81219297>]
>>> >> > >> blkdev_direct_IO+0x57/0x60
>>> >> > >> [Tue Feb 14 03:51:02 2017]  [<ffffffff81218c40>] ?
>>> I_BDEV+0x10/0x10
>>> >> > >> [Tue Feb 14 03:51:02 2017]  [<ffffffff8116af63>]
>>> >> > >> generic_file_aio_read+0x6d3/0x750
>>> >> > >> [Tue Feb 14 03:51:02 2017]  [<ffffffffa038ad5c>] ?
>>> >> > >> xfs_iunlock+0x11c/0x130 [xfs]
>>> >> > >> [Tue Feb 14 03:51:02 2017]  [<ffffffff811690db>] ?
>>> >> > >> unlock_page+0x2b/0x30
>>> >> > >> [Tue Feb 14 03:51:02 2017]  [<ffffffff81192f21>] ?
>>> >> > >> __do_fault+0x401/0x510
>>> >> > >> [Tue Feb 14 03:51:02 2017]  [<ffffffff8121970c>]
>>> >> > >> blkdev_aio_read+0x4c/0x70
>>> >> > >> [Tue Feb 14 03:51:02 2017]  [<ffffffff811ddcfd>]
>>> >> > >> do_sync_read+0x8d/0xd0
>>> >> > >> [Tue Feb 14 03:51:02 2017]  [<ffffffff811de45c>]
>>> vfs_read+0x9c/0x170
>>> >> > >> [Tue Feb 14 03:51:02 2017]  [<ffffffff811df182>]
>>> >> > >> SyS_pread64+0x92/0xc0
>>> >> > >> [Tue Feb 14 03:51:02 2017]  [<ffffffff81645e89>]
>>> >> > >> system_call_fastpath+0x16/0x1b
>>> >> > >>
>>> >> > >>
>>> >> > >> Feb 14 03:51:40 fr-paris kernel: *Out of memory: Kill process
>>> 7657
>>> >> > >> (ceph-osd) score 45 or sacrifice child*
>>> >> > >> Feb 14 03:51:40 fr-paris kernel: Killed process 7657 (ceph-osd)
>>> >> > >> total-vm:8650208kB, anon-rss:6124660kB, file-rss:1560kB
>>> >> > >> Feb 14 03:51:41 fr-paris systemd:* [email protected]: main
>>> process
>>> >> > >> exited, code=killed, status=9/KILL*
>>> >> > >> Feb 14 03:51:41 fr-paris systemd: Unit [email protected]
>>> entered
>>> >> > >> failed
>>> >> > >> state.
>>> >> > >> Feb 14 03:51:41 fr-paris systemd: *[email protected] failed.*
>>> >> > >> Feb 14 03:51:41 fr-paris systemd: cassandra.service: main process
>>> >> > >> exited,
>>> >> > >> code=killed, status=9/KILL
>>> >> > >> Feb 14 03:51:41 fr-paris systemd: Unit cassandra.service entered
>>> >> > >> failed
>>> >> > >> state.
>>> >> > >> Feb 14 03:51:41 fr-paris systemd: cassandra.service failed.
>>> >> > >> Feb 14 03:51:41 fr-paris ceph-mgr: 2017-02-14 03:51:41.978878
>>> >> > >> 7f51a3154700 -1 mgr ms_dispatch osd_map(7517..7517 src has
>>> >> > >> 6951..7517) v3
>>> >> > >> Feb 14 03:51:42 fr-paris systemd: Device
>>> >> > >> dev-disk-by\x2dpartlabel-ceph\x5cx20block.device
>>> >> > >> appeared twice with different sysfs paths
>>> >> > >> /sys/devices/pci0000:00/0000:0
>>> >> > >> 0:03.2/0000:03:00.0/host0/target0:0:0/0:0:0:9/block/sdj/sdj2 and
>>> >> > >> /sys/devices/pci0000:00/0000:00:03.2/0000:03:00.0/host0/targ
>>> >> > >> et0:0:0/0:0:0:4/block/sde/sde2
>>> >> > >> Feb 14 03:51:42 fr-paris ceph-mgr: 2017-02-14 03:51:42.992477
>>> >> > >> 7f51a3154700 -1 mgr ms_dispatch osd_map(7518..7518 src has
>>> >> > >> 6951..7518) v3
>>> >> > >> Feb 14 03:51:43 fr-paris ceph-mgr: 2017-02-14 03:51:43.508990
>>> >> > >> 7f51a3154700 -1 mgr ms_dispatch mgrdigest v1
>>> >> > >> Feb 14 03:51:48 fr-paris ceph-mgr: 2017-02-14 03:51:48.508970
>>> >> > >> 7f51a3154700 -1 mgr ms_dispatch mgrdigest v1
>>> >> > >> Feb 14 03:51:53 fr-paris ceph-mgr: 2017-02-14 03:51:53.509592
>>> >> > >> 7f51a3154700 -1 mgr ms_dispatch mgrdigest v1
>>> >> > >> Feb 14 03:51:58 fr-paris ceph-mgr: 2017-02-14 03:51:58.509936
>>> >> > >> 7f51a3154700 -1 mgr ms_dispatch mgrdigest v1
>>> >> > >> Feb 14 03:52:01 fr-paris systemd: [email protected] holdoff
>>> time
>>> >> > >> over,
>>> >> > >> scheduling restart.
>>> >> > >> Feb 14 03:52:02 fr-paris systemd: *Starting Ceph object storage
>>> >> > >> daemon
>>> >> > >> osd.3.*..
>>> >> > >> Feb 14 03:52:02 fr-paris systemd: Started Ceph object storage
>>> daemon
>>> >> > >> osd.3.
>>> >> > >> Feb 14 03:52:02 fr-paris numactl: 2017-02-14 03:52:02.307106
>>> >> > >> 7f1e499bb940
>>> >> > >> -1 WARNING: the following dangerous and experimental features are
>>> >> > >> enabled:
>>> >> > >> bluestore,rocksdb
>>> >> > >> Feb 14 03:52:02 fr-paris numactl: 2017-02-14 03:52:02.317687
>>> >> > >> 7f1e499bb940
>>> >> > >> -1 WARNING: the following dangerous and experimental features are
>>> >> > >> enabled:
>>> >> > >> bluestore,rocksdb
>>> >> > >> Feb 14 03:52:02 fr-paris numactl: starting osd.3 at - osd_data
>>> >> > >> /var/lib/ceph/osd/ceph-3 /var/lib/ceph/osd/ceph-3/journal
>>> >> > >> Feb 14 03:52:02 fr-paris numactl: 2017-02-14 03:52:02.333522
>>> >> > >> 7f1e499bb940
>>> >> > >> -1 WARNING: experimental feature 'bluestore' is enabled
>>> >> > >> Feb 14 03:52:02 fr-paris numactl: Please be aware that this
>>> feature
>>> >> > >> is
>>> >> > >> experimental, untested,
>>> >> > >> Feb 14 03:52:02 fr-paris numactl: unsupported, and may result in
>>> data
>>> >> > >> corruption, data loss,
>>> >> > >> Feb 14 03:52:02 fr-paris numactl: and/or irreparable damage to
>>> your
>>> >> > >> cluster.  Do not use
>>> >> > >> Feb 14 03:52:02 fr-paris numactl: feature with important data.
>>> >> > >>
>>> >> > >> This seems to happen only in 11.2.0 and not in 11.1.x . Could you
>>> >> > >> please
>>> >> > >> help us in resolving this issue by means of any config change to
>>> >> > >> limit the
>>> >> > >> memory use on ceph-osd or a bug in the current kraken release.
>>> >> > >>
>>> >> > >> Thanks,
>>> >> > >> Muthu
>>> >> > >>
>>> >> > >> _______________________________________________
>>> >> > >> ceph-users mailing list
>>> >> > >> [email protected]
>>> >> > >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>> >> > >>
>>> >> > >>
>>> >> > >
>>> >> > >
>>> >> > > --
>>> >> > > С уважением / Best regards
>>> >> > >
>>> >> > > Илья Летковский / Ilya Letkouski
>>> >> > >
>>> >> > > Phone, Viber: +375 29 3237335
>>> >> > >
>>> >> > > Minsk, Belarus (GMT+3)
>>> >> > >
>>> >> > _______________________________________________
>>> >> > ceph-users mailing list
>>> >> > [email protected]
>>> >> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>> >
>>> >
>>> >
>>> > _______________________________________________
>>> > ceph-users mailing list
>>> > [email protected]
>>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>> >
>>> _______________________________________________
>>> ceph-users mailing list
>>> [email protected]
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>
>>
>>
>
> _______________________________________________
> ceph-users mailing list
> [email protected]
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
