Can you post your pool configuration? (ceph osd pool ls detail)
and the CRUSH rule, if you modified it.

Paul

2018-06-07 14:52 GMT+02:00 Фролов Григорий <[email protected]>:
> Hello. Could you please help me troubleshoot an issue? I have 3 nodes in
> a cluster:
>
> ID WEIGHT  TYPE NAME          UP/DOWN REWEIGHT PRIMARY-AFFINITY
> -1 0.02637 root default
> -2 0.00879     host testk8s3
>  0 0.00879         osd.0           up  1.00000          1.00000
> -3 0.00879     host testk8s1
>  1 0.00879         osd.1         down        0          1.00000
> -4 0.00879     host testk8s2
>  2 0.00879         osd.2           up  1.00000          1.00000
>
> Each node runs ceph-osd, ceph-mon and ceph-mds. When all nodes are up,
> everything is fine. When any of the 3 nodes goes down, no matter whether
> it shuts down gracefully or crashes, the remaining nodes cannot read from
> or write to the directory where the Ceph storage is mounted. They also
> cannot unmount the volume. Every process that touches the directory hangs
> forever in uninterruptible sleep. When I try to strace such a process,
> strace hangs too. When the failed node comes back up, every hung process
> finishes successfully. What could cause this?
> root@testk8s2:~# ps -eo pid,stat,cmd | grep ls
>  3700 D    ls --color=auto /mnt/db
>  3997 S+   grep --color=auto ls
> root@testk8s2:~# strace -p 3700 &
> [1] 4020
> root@testk8s2:~# strace: Process 3700 attached
> root@testk8s2:~# ps -eo pid,stat,cmd | grep strace
>  4020 S    strace -p 3700
> root@testk8s2:~# umount /mnt &
> [2] 4084
> root@testk8s2:~# ps -eo pid,state,cmd | grep umount
>  4084 D    umount /mnt
> root@testk8s2:~# ceph -v
> ceph version 10.2.10 (5dc1e4c05cb68dbf62ae6fce3f0700e4654fdbbe)
> root@testk8s2:~# ceph -s
>     cluster 0bcc00ec-731a-4734-8d76-599f70f06209
>      health HEALTH_ERR
>             80 pgs degraded
>             80 pgs stuck degraded
>             80 pgs stuck unclean
>             80 pgs stuck undersized
>             80 pgs undersized
>             recovery 1075/3225 objects degraded (33.333%)
>             mds rank 2 has failed
>             mds cluster is degraded
>             1 mons down, quorum 1,2 testk8s2,testk8s3
>      monmap e1: 3 mons at {testk8s1=10.105.6.116:6789/0,testk8s2=10.105.6.117:6789/0,testk8s3=10.105.6.118:6789/0}
>             election epoch 120, quorum 1,2 testk8s2,testk8s3
>       fsmap e14084: 2/3/3 up {0=testk8s2=up:active,1=testk8s3=up:active}, 1 failed
>      osdmap e9939: 3 osds: 2 up, 2 in; 80 remapped pgs
>             flags sortbitwise,require_jewel_osds
>       pgmap v17491: 80 pgs, 3 pools, 194 MB data, 1075 objects
>             1530 MB used, 16878 MB / 18408 MB avail
>             1075/3225 objects degraded (33.333%)
>                   80 active+undersized+degraded
>
> Thanks.
>
> Kind regards,
> Grigori

--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90
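For reference, here is a sketch of the commands that would show the settings Paul is asking about. These are standard Ceph Jewel-era commands; the pool name "rbd" on the last line is only a placeholder, not one of the pools from the cluster above:

```shell
# List all pools with their replication settings (size, min_size, crush rule)
ceph osd pool ls detail

# Dump the CRUSH rules to check the failure domain (host vs. osd)
ceph osd crush rule dump

# Inspect a single setting for one pool ("rbd" is a placeholder name)
ceph osd pool get rbd min_size
```

With only two of three OSDs up, a pool whose min_size equals its size would block all I/O until the third OSD returns, so these values are the first thing to check.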
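As a sanity check on the numbers in the ceph -s output above: 3225 is exactly 3 × 1075, so the degraded count is consistent with three-way replication where one host holds one copy of every object. A small sketch of that arithmetic (pure Python, no Ceph required; the replica count of 3 is inferred from the output, not stated in it):

```python
# Numbers taken from the `ceph -s` output above.
objects = 1075                       # objects in the 3 pools
replicas = 3                         # inferred pool size: one copy per host
total_copies = objects * replicas    # 3225, matching the report
degraded = objects                   # one host down -> one copy of each object lost
pct = 100.0 * degraded / total_copies
print(f"{degraded}/{total_copies} objects degraded ({pct:.3f}%)")
# prints: 1075/3225 objects degraded (33.333%)
```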
_______________________________________________ ceph-users mailing list [email protected] http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
