Dear All,
We have a new Ceph cluster based on v12.2.1 (Luminous).
After three days of copying 300TB of data into CephFS,
we have started getting the following health warnings:
# ceph health
HEALTH_WARN 9 clients failing to advance oldest client/flush tid;
1 MDSs report slow requests; 1 MDSs behind on trimming
ceph-mds.ceph1.log shows entries like:
2017-10-09 08:42:30.935955 7feeaf263700 0 log_channel(cluster) log
[WRN] : client.5023 does not advance its oldest_client_tid (5760998),
100000 completed requests recorded in session
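In case it helps with diagnosis, the per-client detail behind the warning can
be pulled from the MDS admin socket, e.g. (assuming the daemon is mds.ceph1):
# ceph health detail
# ceph daemon mds.ceph1 session ls    # lists client sessions and their state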
Performance has been very good; the parallel rsync was running at
1.1 to 2 GB/s, allowing us to copy the 300TB of data in 72 hours.
[root@ceph1 ceph]# ceph df
GLOBAL:
    SIZE     AVAIL     RAW USED     %RAW USED
    730T     330T      400T         54.80
POOLS:
    NAME         ID     USED     %USED     MAX AVAIL     OBJECTS
    ecpool       1      316T     62.24     153T          89269703
    mds_nvme     2      188G     8.18      706G          368806
The cluster has 10 nodes, each with 10x 8TB drives.
The data pool is EC 8+2 with no upper (cache) tier, i.e. allow_ec_overwrites
is set to true; pool creation was roughly as sketched below.
Four nodes also have NVMe drives, used for the 3x replicated CephFS metadata pool.
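For completeness, the EC pool was created roughly as follows (a sketch from
memory; the profile name and PG count are illustrative):
# ceph osd erasure-code-profile set ec-8-2 k=8 m=2 crush-failure-domain=host
# ceph osd pool create ecpool 4096 4096 erasure ec-8-2
# ceph osd pool set ecpool allow_ec_overwrites true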
We have a single MDS server. We snapshot CephFS every 10 minutes, then
delete all snapshots older than 24 hours, apart from the midnight snapshots;
the rotation is roughly as sketched below.
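The cron job behind this is along these lines (simplified sketch; the mount
point and snapshot naming are illustrative):
#!/bin/bash
# runs every 10 minutes; CephFS is mounted at /cephfs (illustrative path)
SNAPROOT=/cephfs/.snap
mkdir "$SNAPROOT/$(date +%Y%m%d-%H%M)"       # a snapshot is just a mkdir in .snap
CUTOFF=$(date -d '24 hours ago' +%Y%m%d-%H%M)
for s in "$SNAPROOT"/*; do
    name=$(basename "$s")
    [ "${name:9:4}" = "0000" ] && continue   # keep the midnight snapshots
    [[ "$name" < "$CUTOFF" ]] && rmdir "$s"  # removing a .snap dir drops the snapshot
done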
We use the ceph-fuse client on all OSD nodes, and the parallel rsync jobs
run directly on them (roughly as shown below). Each node has dual Xeon
E5-2620 v4 CPUs, 64GB RAM and 10Gb Ethernet; the OS is Scientific Linux 7.4.
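For what it's worth, the copy is driven per node with something like this
(paths and the parallelism level are illustrative; directory names are
assumed to contain no spaces):
# one rsync per top-level source directory, four in flight per node
ls /local_source | xargs -P4 -I{} rsync -a /local_source/{}/ /cephfs/data/{}/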
Any ideas?
thanks,
Jake
--
Jake Grimmett
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com