Hi,

We have 3 Ceph clusters (Hammer 0.94.5) running on the same physical nodes, 
using LXC on Debian Wheezy. Each physical node has 12 x 4 TB 7200 RPM hard 
drives, 2 x 200 GB MLC SSDs, and 2 x 10 Gb Ethernet interfaces. Each physical 
drive hosts one LXC container running one OSD, with its journal on an SSD 
partition.

One of our Ceph clusters has 96 OSDs with 1024 PGs.
Last week we raised pgp_num from 1024 to 2048 in one pass. Bad idea :(. You 
really need to read the manual before changing this kind of parameter.
Ceph got quite stressed and couldn't return to normal: a few OSDs (~10%) were 
flapping.
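In hindsight, the safer approach is to raise pg_num/pgp_num in small increments and let the cluster settle between steps. A rough sketch of what we should have done (the pool name "rbd" and the step size of 128 are assumptions for illustration; the script only prints the commands so you can review them before running):

```shell
# Raise pg_num/pgp_num gradually instead of doubling in one pass.
# Pool name and step size are examples, not our actual values.
POOL=rbd
CURRENT=1024
TARGET=2048
STEP=128

while [ "$CURRENT" -lt "$TARGET" ]; do
    CURRENT=$((CURRENT + STEP))
    # Print the commands instead of running them; drop "echo" on a real cluster.
    echo ceph osd pool set "$POOL" pg_num "$CURRENT"
    echo ceph osd pool set "$POOL" pgp_num "$CURRENT"
    # On a real cluster, wait here until backfill/peering finishes before
    # the next step, e.g. by polling "ceph health" until it is HEALTH_OK.
done
```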


On our physical nodes, we noticed some network problems, even when pinging 
127.0.0.1:
64 bytes from 127.0.0.1: icmp_req=1258 ttl=64 time=0.146 ms
ping: sendmsg: Invalid argument
64 bytes from 127.0.0.1: icmp_req=1260 ttl=64 time=0.023 ms
ping: sendmsg: Invalid argument
64 bytes from 127.0.0.1: icmp_req=1262 ttl=64 time=0.028 ms
ping: sendmsg: Invalid argument
ping: sendmsg: Invalid argument
ping: sendmsg: Invalid argument
64 bytes from 127.0.0.1: icmp_req=1266 ttl=64 time=0.026 ms
64 bytes from 127.0.0.1: icmp_req=1267 ttl=64 time=0.142 ms
ping: sendmsg: Invalid argument
ping: sendmsg: Invalid argument
64 bytes from 127.0.0.1: icmp_req=1270 ttl=64 time=0.137 ms
ping: sendmsg: Invalid argument


With our kernel (3.16), nothing showed up in the logs. After a few days of 
research, we tried upgrading the kernel to a newer version (4.4.4). Not so 
easy to backport to Debian Wheezy, but after a few hours it worked. The 
problem had not gone away, but we noticed a new message in the logs:
arp_cache: Neighbour table overflow.

On Debian, the default ARP cache level-1 threshold (gc_thresh1) is only 128 
entries!
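To see why 128 is far too small for a cluster like ours, here is a rough sizing sketch (our own back-of-the-envelope reasoning, not official guidance): count at least one neighbour entry per OSD container and per physical host, on each network the nodes talk on, then pick a threshold with headroom above that.

```shell
# Back-of-the-envelope ARP table sizing (assumption, not an official formula).
OSDS=96          # OSD containers, one IP each
HOSTS=8          # 96 OSDs / 12 drives per node
NETWORKS=2       # e.g. public + cluster network

# Minimum number of live neighbour entries a node may need to hold.
needed=$(( (OSDS + HOSTS) * NETWORKS ))

# Round the Debian default of 128 up by doubling until it covers the need.
thresh=128
while [ "$thresh" -lt "$needed" ]; do
    thresh=$((thresh * 2))
done
echo "at least $needed entries -> gc_thresh1 should be >= $thresh"
```

Even this conservative count already blows past the 128 default, which is why we picked values with a lot of headroom below.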

We added this to sysctl.conf on every physical node:
net.ipv4.neigh.default.gc_thresh1 = 4096
net.ipv4.neigh.default.gc_thresh2 = 8192
net.ipv4.neigh.default.gc_thresh3 = 8192
net.ipv4.neigh.default.gc_stale_time = 86400
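For the record, values in /etc/sysctl.conf are only applied at boot; to make them take effect immediately and check the running kernel, standard sysctl usage is:

```shell
# Load the new values from /etc/sysctl.conf into the running kernel
sysctl -p
# Verify the values the kernel is actually using
sysctl net.ipv4.neigh.default.gc_thresh1 net.ipv4.neigh.default.gc_thresh3
```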


Immediately the network problems disappeared, and our cluster came back to a 
healthy state within a few hours: HEALTH_OK :)


To sum up:
Do not raise your pgp_num in one pass!
Look at your kernel parameters; you may need some tweaks to be fine.

Regards

Pierre DOUCET

_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
