I'm not aware of any memory settings that control rebuild memory usage.

You are running very low on RAM. Have you tried adding more swap or adjusting 
/proc/sys/vm/swappiness?
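
Before changing either, it may help to check how much swap the node has and what swappiness is currently set to. Below is a minimal Python sketch, assuming a Linux host that exposes /proc/meminfo and /proc/sys/vm/swappiness. The helper names parse_meminfo and swap_summary are made up for illustration.

```python
from pathlib import Path


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: number} (kB for most fields)."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def swap_summary(meminfo_text, swappiness_text):
    """Collect the memory and swap figures relevant to an OOM investigation."""
    info = parse_meminfo(meminfo_text)
    return {
        "mem_total_kb": info.get("MemTotal", 0),
        "mem_available_kb": info.get("MemAvailable", 0),
        "swap_total_kb": info.get("SwapTotal", 0),
        "swap_free_kb": info.get("SwapFree", 0),
        "swappiness": int(swappiness_text.strip()),
    }


if __name__ == "__main__":
    meminfo_path = Path("/proc/meminfo")
    swappiness_path = Path("/proc/sys/vm/swappiness")
    # Only read the live values when running on a host that exposes them.
    if meminfo_path.exists() and swappiness_path.exists():
        summary = swap_summary(meminfo_path.read_text(),
                               swappiness_path.read_text())
        for key, value in summary.items():
            print(f"{key}: {value}")
```

If swap_total_kb comes back as 0, the node has no swap at all and swappiness will have no effect until some is added.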




---- On Fri, 20 Sep 2019 20:41:09 +0800 Amudhan P <mailto:[email protected]> 
wrote ----


Hi,

I am using Ceph Mimic in a small test setup with the configuration below.

OS: Ubuntu 18.04

1 node running mon, mds and mgr: 4-core CPU, 4 GB RAM, 1 Gb LAN
3 nodes with 2 OSDs each (2 TB disks): 2-core CPU, 4 GB RAM, 1 Gb LAN
1 node acting as CephFS client: 2-core CPU, 4 GB RAM, 1 Gb LAN

I configured cephfs_metadata_pool with 3 replicas and cephfs_data_pool with 
erasure coding 2+1.



When I ran a script that creates many folders, Ceph started reporting late IO 
errors due to the high metadata workload.

Once folder creation completed, the PGs were degraded. I am waiting for the PGs 
to finish recovery, but my OSDs have started crashing due to OOM and restart 
after some time.



My question: I can wait for recovery to complete, but how do I stop the OOM 
kills and OSD crashes? Basically, I want to know how to control memory usage 
during recovery and keep it stable.



I have also set very low PG counts: 8 for the metadata pool and 16 for the 
data pool.



I have already set "mon osd memory target" to 1 GB, and I have set max-backfill 
from 1 to 8.
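
To put that target in context, here is a rough per-node memory budget as a Python sketch. It uses the figures from this mail (4 GB RAM, 2 OSDs per node, a 1 GB target each) and assumes the target is a best-effort goal rather than a hard cap, which as far as I know is how the OSD memory target behaves. The function names and the 512 MiB OS reserve are illustrative assumptions, not measured values.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def node_memory_budget(ram_bytes, osd_count, target_bytes, os_reserve_bytes):
    """Bytes left over if every OSD stays exactly at its memory target."""
    return ram_bytes - os_reserve_bytes - osd_count * target_bytes


def max_overshoot_factor(ram_bytes, osd_count, target_bytes, os_reserve_bytes):
    """How far above target each OSD can grow before the node runs out of RAM."""
    usable = ram_bytes - os_reserve_bytes
    return usable / (osd_count * target_bytes)


if __name__ == "__main__":
    ram = 4 * GIB
    reserve = 512 * MIB  # assumed OS reserve, not taken from the mail
    headroom = node_memory_budget(ram, 2, 1 * GIB, reserve)
    factor = max_overshoot_factor(ram, 2, 1 * GIB, reserve)
    print(f"headroom at target: {headroom / GIB:.2f} GiB")
    print(f"node is full once each OSD reaches {factor:.2f}x its target")
```

For comparison, the kernel log snippet in this mail shows the killed ceph-osd at anon-rss:1992708kB, roughly 1.9 GiB, so that OSD was well above the 1 GB target when the OOM killer fired.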



Attached is the "kern.log" output from one of the nodes; a snippet of the 
error messages is included in this mail.



---------error msg snippet ----------

-bash: fork: Cannot allocate memory



Sep 18 19:01:57 test-node1 kernel: [341246.765644] msgr-worker-0 invoked oom-killer: gfp_mask=0x14200ca(GFP_HIGHUSER_MOVABLE), nodemask=(null), order=0, oom_score_adj=0
Sep 18 19:02:00 test-node1 kernel: [341246.765645] msgr-worker-0 cpuset=/ mems_allowed=0
Sep 18 19:02:00 test-node1 kernel: [341246.765650] CPU: 1 PID: 1737 Comm: msgr-worker-0 Not tainted 4.15.0-45-generic #48-Ubuntu

Sep 18 19:02:02 test-node1 kernel: [341246.765833] Out of memory: Kill process 1727 (ceph-osd) score 489 or sacrifice child
Sep 18 19:02:03 test-node1 kernel: [341246.765919] Killed process 1727 (ceph-osd) total-vm:3483844kB, anon-rss:1992708kB, file-rss:0kB, shmem-rss:0kB
Sep 18 19:02:03 test-node1 kernel: [341246.899395] oom_reaper: reaped process 1727 (ceph-osd), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
Sep 18 22:09:57 test-node1 kernel: [352529.433155] perf: interrupt took too long (4965 > 4938), lowering kernel.perf_event_max_sample_rate to 40250
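
To see how much memory each OSD held when it was killed, the "Killed process" records in a log like the one above can be pulled out with a short script. This is a minimal Python sketch, assuming the record layout shown in this snippet (other kernel versions may format the line differently). The name find_oom_kills is hypothetical.

```python
import re

# Matches the "Killed process <pid> (<name>) total-vm:...kB, anon-rss:...kB"
# record; \s+ also tolerates lines that a mail client has hard-wrapped.
KILLED_RE = re.compile(
    r"Killed process (\d+)\s+\(([^)]+)\)\s+"
    r"total-vm:(\d+)kB,\s+anon-rss:(\d+)kB"
)


def find_oom_kills(log_text):
    """Return one dict per 'Killed process' record found in kernel log text."""
    kills = []
    for match in KILLED_RE.finditer(log_text):
        pid, name, total_vm_kb, anon_rss_kb = match.groups()
        kills.append({
            "pid": int(pid),
            "name": name,
            "total_vm_kb": int(total_vm_kb),
            "anon_rss_kb": int(anon_rss_kb),
        })
    return kills


if __name__ == "__main__":
    sample = (
        "Sep 18 19:02:03 test-node1 kernel: [341246.765919] Killed process 1727 "
        "(ceph-osd) total-vm:3483844kB, anon-rss:1992708kB, file-rss:0kB, "
        "shmem-rss:0kB"
    )
    for kill in find_oom_kills(sample):
        rss_gib = kill["anon_rss_kb"] / 1024 ** 2
        print(f"{kill['name']} pid {kill['pid']}: anon-rss {rss_gib:.2f} GiB")
```

Running it over the full kern.log would show whether every kill happens at a similar resident size, which helps distinguish steady growth during recovery from a one-off spike.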



regards

Amudhan


_______________________________________________
ceph-users mailing list -- mailto:[email protected] 
To unsubscribe send an email to mailto:[email protected]