I've been trying to do a fairly large rsync onto a 3x-replicated, filestore, HDD-backed
CephFS pool.
Luminous 12.2.1 for all daemons, kernel CephFS driver, Ubuntu 16.04 running a mix
of 4.8 and 4.10 kernels, and 2x10GbE networking between all daemons and clients.
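For completeness, the clients use the kernel driver; a typical mount (monitor
address, mount point, and secret file below are placeholders, not my actual
values) looks something like:

    # kernel CephFS mount; substitute a real monitor address and keyring secret
    sudo mount -t ceph 192.168.1.10:6789:/ /mnt/cephfs \
        -o name=admin,secretfile=/etc/ceph/admin.secret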
> $ ceph versions
> {
> "mon": {
> "ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e)
> luminous (stable)": 3
> },
> "mgr": {
> "ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e)
> luminous (stable)": 3
> },
> "osd": {
> "ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e)
> luminous (stable)": 74
> },
> "mds": {
> "ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e)
> luminous (stable)": 2
> },
> "overall": {
> "ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e)
> luminous (stable)": 82
> }
> }
>
> HEALTH_ERR 1 MDSs report oversized cache; 1 MDSs have many clients failing
> to respond to cache pressure; 1 MDSs behind on trimming; noout,nodeep-scrub
> flag(s) set; application not enabled on 1 pool(s); 242 slow requests are
> blocked > 32 sec; 769378 stuck requests are blocked > 4096 sec
> MDS_CACHE_OVERSIZED 1 MDSs report oversized cache
> mdsdb(mds.0): MDS cache is too large (23GB/8GB); 1018 inodes in use by
> clients, 1 stray files
> MDS_CLIENT_RECALL_MANY 1 MDSs have many clients failing to respond to cache
> pressure
> mdsdb(mds.0): Many clients (37) failing to respond to cache pressure
> (client_count: 37)
> MDS_TRIM 1 MDSs behind on trimming
> mdsdb(mds.0): Behind on trimming (36252/30); max_segments: 30,
> num_segments: 36252
> OSDMAP_FLAGS noout,nodeep-scrub flag(s) set
> REQUEST_SLOW 242 slow requests are blocked > 32 sec
> 236 ops are blocked > 2097.15 sec
> 3 ops are blocked > 1048.58 sec
> 2 ops are blocked > 524.288 sec
> 1 ops are blocked > 32.768 sec
> REQUEST_STUCK 769378 stuck requests are blocked > 4096 sec
> 91 ops are blocked > 67108.9 sec
> 121258 ops are blocked > 33554.4 sec
> 308189 ops are blocked > 16777.2 sec
> 251586 ops are blocked > 8388.61 sec
> 88254 ops are blocked > 4194.3 sec
> osds 0,1,3,6,8,12,15,16,17,21,22,23 have stuck requests > 16777.2 sec
> osds 4,7,9,10,11,14,18,20 have stuck requests > 33554.4 sec
> osd.13 has stuck requests > 67108.9 sec
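In case it's useful, the per-client caps and cache usage behind those warnings
can be pulled from the MDS admin socket; a rough sketch (assuming the active
daemon is "mdsdb" as in the output above, run on the node hosting it):

    # per-client sessions, including how many caps each client is holding
    ceph daemon mds.mdsdb session ls
    # current cache memory usage vs. the configured limit
    ceph daemon mds.mdsdb cache status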
This is across 8 nodes, each holding 3x 8TB HDDs, all backed by Intel P3600
NVMe drives for journaling.
SSD OSDs have been removed from the tree below for brevity.
> $ ceph osd tree
> ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
> -13 87.28799 root ssd
> -1 174.51500 root default
> -10 174.51500 rack default.rack2
> -55 43.62000 chassis node2425
> -2 21.81000 host node24
> 0 hdd 7.26999 osd.0 up 1.00000 1.00000
> 8 hdd 7.26999 osd.8 up 1.00000 1.00000
> 16 hdd 7.26999 osd.16 up 1.00000 1.00000
> -3 21.81000 host node25
> 1 hdd 7.26999 osd.1 up 1.00000 1.00000
> 9 hdd 7.26999 osd.9 up 1.00000 1.00000
> 17 hdd 7.26999 osd.17 up 1.00000 1.00000
> -56 43.63499 chassis node2627
> -4 21.81999 host node26
> 2 hdd 7.27499 osd.2 up 1.00000 1.00000
> 10 hdd 7.26999 osd.10 up 1.00000 1.00000
> 18 hdd 7.27499 osd.18 up 1.00000 1.00000
> -5 21.81499 host node27
> 3 hdd 7.26999 osd.3 up 1.00000 1.00000
> 11 hdd 7.26999 osd.11 up 1.00000 1.00000
> 19 hdd 7.27499 osd.19 up 1.00000 1.00000
> -57 43.62999 chassis node2829
> -6 21.81499 host node28
> 4 hdd 7.26999 osd.4 up 1.00000 1.00000
> 12 hdd 7.26999 osd.12 up 1.00000 1.00000
> 20 hdd 7.27499 osd.20 up 1.00000 1.00000
> -7 21.81499 host node29
> 5 hdd 7.26999 osd.5 up 1.00000 1.00000
> 13 hdd 7.26999 osd.13 up 1.00000 1.00000
> 21 hdd 7.27499 osd.21 up 1.00000 1.00000
> -58 43.62999 chassis node3031
> -8 21.81499 host node30
> 6 hdd 7.26999 osd.6 up 1.00000 1.00000
> 14 hdd 7.26999 osd.14 up 1.00000 1.00000
> 22 hdd 7.27499 osd.22 up 1.00000 1.00000
> -9 21.81499 host node31
> 7 hdd 7.26999 osd.7 up 1.00000 1.00000
> 15 hdd 7.26999 osd.15 up 1.00000 1.00000
> 23 hdd 7.27499 osd.23 up 1.00000 1.00000
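For what it's worth, this is roughly how I'd double-check that a given
filestore OSD really is journaling to the NVMe device (osd.0 is just an
arbitrary example, and the path assumes the default /var/lib/ceph layout):

    # the journal symlink should point at a partition on the P3600
    ls -l /var/lib/ceph/osd/ceph-0/journal
    # journal-related settings the OSD is actually running with
    ceph daemon osd.0 config show | grep journal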
Trying to figure out what in my configuration is off, because I am told that
CephFS should be able to throttle client requests to match the underlying
storage and not create such an extensive logjam.
> [mds]
> mds_cache_size = 0
> mds_cache_memory_limit = 8589934592
>
> [osd]
> osd_op_threads = 4
> filestore max sync interval = 30
> osd_max_backfills = 10
> osd_recovery_max_active = 16
> osd_op_thread_suicide_timeout = 600
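For reference, the blocked ops themselves can be dumped from a stuck OSD's
admin socket; osd.13 below only because it is the worst offender in the health
output:

    # ops currently in flight on this OSD, with their age and current state
    ceph daemon osd.13 dump_ops_in_flight
    # recent slowest ops, with per-event timings
    ceph daemon osd.13 dump_historic_ops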
I originally had mds_cache_size set to 10000000, carried over from Jewel, but
read that it is now better to set that to 0 and limit the cache via
mds_cache_memory_limit instead. So I set the memory limit to 8GB to see if
that helped at all.
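In case the new limit never actually took effect, a rough way to push and
verify it at runtime without restarting the MDS (daemon name taken from the
health output):

    # inject the 8GB cache limit into the running MDS
    ceph tell mds.mdsdb injectargs '--mds_cache_memory_limit=8589934592'
    # confirm the value the daemon is actually using
    ceph daemon mds.mdsdb config get mds_cache_memory_limit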
Because, as far as I can tell, nothing older than a 4.13 kernel supports the
Luminous feature set in the CephFS kernel driver, all of the CephFS clients
are connecting with Jewel-level capabilities.
> $ ceph features
> {
> "mon": {
> "group": {
> "features": "0x1ffddff8eea4fffb",
> "release": "luminous",
> "num": 3
> }
> },
> "mds": {
> "group": {
> "features": "0x1ffddff8eea4fffb",
> "release": "luminous",
> "num": 2
> }
> },
> "osd": {
> "group": {
> "features": "0x1ffddff8eea4fffb",
> "release": "luminous",
> "num": 74
> }
> },
> "client": {
> "group": {
> "features": "0x107b84a842aca",
> "release": "hammer",
> "num": 2
> },
> "group": {
> "features": "0x40107b86a842ada",
> "release": "jewel",
> "num": 39
> },
> "group": {
> "features": "0x7010fb86aa42ada",
> "release": "jewel",
> "num": 1
> },
> "group": {
> "features": "0x1ffddff8eea4fffb",
> "release": "luminous",
> "num": 189
> }
> }
> }
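To figure out which hosts are still presenting the older hammer/jewel feature
bits, something like this against a monitor's admin socket should list each
connected client along with its features ("mon.node24" is just a placeholder
for whichever mon is local):

    # list monitor sessions; entries include the client address and the
    # feature bits it negotiated, which map back to the groups above
    ceph daemon mon.node24 sessions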
Any help is appreciated.
Thanks,
Reed