I have 3 systems with a CephFS filesystem mounted on them, and I am seeing material 'lag'. By 'lag' I mean operations hang for short periods (1s, sometimes 5s), but it is very non-repeatable.
If I run
time find . -type f -print0 | xargs -0 stat > /dev/null
it might take ~130ms. But it might also take 10s. Once I've done it, it tends to stay at ~130ms, suggesting whatever data is involved is now in cache. In the cases where it hangs, if I remove the stat, it is hanging on the find of one file. It might hiccup 1 or 2 times in the find across 10k files.
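(For reference, the kind of per-call check I mean is roughly the following; the trace file path and the 1-second threshold are just illustrative:)
strace -T -o /tmp/find.strace find . -type f > /dev/null
# -T appends the time spent in each syscall; pick out anything over ~1s
awk '{ t=$NF; gsub(/[<>]/,"",t); if (t+0 > 1) print }' /tmp/find.strace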
This lag can affect e.g. 'cwd', writing a file, basically all operations.
Does anyone have any suggestions? It's a very irritating problem. I do not see errors in dmesg.
The 3 systems with the filesystem mounted are running Ubuntu 15.10 with the 4.3.0-040300-generic kernel. They are running CephFS via the kernel driver, mounted through this /etc/fstab entry (secret elided):
10.100.10.60,10.100.10.61,10.100.10.62:/ /cephfs ceph _netdev,noauto,noatime,x-systemd.requires=network-online.target,x-systemd.automount,x-systemd.device-timeout=10,name=admin,secret=XXXX== 0 2
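(That should be roughly equivalent to mounting by hand with something like the following; secret elided as above:)
mount -t ceph 10.100.10.60,10.100.10.61,10.100.10.62:/ /cephfs \
    -o name=admin,secret=XXXX==,noatime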
I have 3 MDS: 1 active, 2 standby. The same 3 machines are also the mons {nubo-1/-2/-3}, and they are the ones that have the CephFS mounted.
They have a 9K MTU between the systems, and I have checked with ping -s ### -M do <dest> that there are no black holes in packet size: up to 8954 works, and 8955 gives 'would fragment'.
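(I.e., roughly the following, using nubo-2 as an example destination:)
ping -c 3 -s 8954 -M do 10.100.10.61   # largest payload that gets through clean
ping -c 3 -s 8955 -M do 10.100.10.61   # this one reports it would need to fragment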
All the storage devices are 1TB Samsung SSDs, all on SATA. There is no material load on the system while this is occurring (a bit of background fs usage I guess, but it's otherwise idle, just me).
$ ceph status
    cluster b23abffc-71c4-4464-9449-3f2c9fbe1ded
     health HEALTH_OK
     monmap e1: 3 mons at {nubo-1=10.100.10.60:6789/0,nubo-2=10.100.10.61:6789/0,nubo-3=10.100.10.62:6789/0}
            election epoch 1070, quorum 0,1,2 nubo-1,nubo-2,nubo-3
     mdsmap e587: 1/1/1 up {0=nubo-2=up:active}, 2 up:standby
     osdmap e2346: 6 osds: 6 up, 6 in
      pgmap v113350: 840 pgs, 6 pools, 143 GB data, 104 kobjects
            288 GB used, 5334 GB / 5622 GB avail
                 840 active+clean
I've checked and the network between them is perfect: no loss, ~no latency (<< 1ms; they are adjacent on an L2 segment), and the same goes for all the OSD hosts [there are 6 OSDs].
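(By 'checked' on the OSD side I mean things along the lines of the per-OSD latency counters and confirming the cluster is idle, e.g.:)
ceph osd perf          # fs_commit_latency / fs_apply_latency per OSD
ceph osd pool stats    # per-pool client I/O rates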
$ ceph osd tree
ID WEIGHT  TYPE NAME        UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 5.48996 root default
-2 0.89999     host nubo-1
 0 0.89999         osd.0         up  1.00000          1.00000
-3 0.89999     host nubo-2
 1 0.89999         osd.1         up  1.00000          1.00000
-4 0.89999     host nubo-3
 2 0.89999         osd.2         up  1.00000          1.00000
-5 0.92999     host nubo-19
 3 0.92999         osd.3         up  1.00000          1.00000
-6 0.92999     host nubo-20
 4 0.92999         osd.4         up  1.00000          1.00000
-7 0.92999     host nubo-21
 5 0.92999         osd.5         up  1.00000          1.00000