Can you post your pool configuration? (ceph osd pool ls detail)
and the CRUSH rule, if you modified it.

Paul

2018-06-07 14:52 GMT+02:00 Фролов Григорий <[email protected]>:
> Hello. Could you please help me troubleshoot an issue? I have 3 nodes in
> a cluster:
>
> ID WEIGHT  TYPE NAME          UP/DOWN REWEIGHT PRIMARY-AFFINITY
> -1 0.02637 root default
> -2 0.00879     host testk8s3
>  0 0.00879         osd.0           up  1.00000          1.00000
> -3 0.00879     host testk8s1
>  1 0.00879         osd.1         down        0          1.00000
> -4 0.00879     host testk8s2
>  2 0.00879         osd.2           up  1.00000          1.00000
>
> Each node runs ceph-osd, ceph-mon and ceph-mds. When all nodes are up,
> everything is fine. When any of the 3 nodes goes down, no matter whether
> it shuts down gracefully or crashes, the remaining nodes cannot read from
> or write to the directory where the Ceph storage is mounted. They also
> cannot unmount the volume. Every process that touches the directory hangs
> forever in uninterruptible sleep. When I try to strace such a process,
> strace hangs too. When the failed node comes back up, every hung process
> finishes successfully. What could cause this?
> root@testk8s2:~# ps -eo pid,stat,cmd | grep ls
>  3700 D    ls --color=auto /mnt/db
>  3997 S+   grep --color=auto ls
> root@testk8s2:~# strace -p 3700 &
> [1] 4020
> root@testk8s2:~# strace: Process 3700 attached
> root@testk8s2:~# ps -eo pid,stat,cmd | grep strace
>  4020 S    strace -p 3700
> root@testk8s2:~# umount /mnt &
> [2] 4084
> root@testk8s2:~# ps -eo pid,state,cmd | grep umount
>  4084 D    umount /mnt
> root@testk8s2:~# ceph -v
> ceph version 10.2.10 (5dc1e4c05cb68dbf62ae6fce3f0700e4654fdbbe)
> root@testk8s2:~# ceph -s
>     cluster 0bcc00ec-731a-4734-8d76-599f70f06209
>      health HEALTH_ERR
>             80 pgs degraded
>             80 pgs stuck degraded
>             80 pgs stuck unclean
>             80 pgs stuck undersized
>             80 pgs undersized
>             recovery 1075/3225 objects degraded (33.333%)
>             mds rank 2 has failed
>             mds cluster is degraded
>             1 mons down, quorum 1,2 testk8s2,testk8s3
>      monmap e1: 3 mons at {testk8s1=10.105.6.116:6789/0,testk8s2=10.105.6.117:6789/0,testk8s3=10.105.6.118:6789/0}
>             election epoch 120, quorum 1,2 testk8s2,testk8s3
>       fsmap e14084: 2/3/3 up {0=testk8s2=up:active,1=testk8s3=up:active}, 1 failed
>      osdmap e9939: 3 osds: 2 up, 2 in; 80 remapped pgs
>             flags sortbitwise,require_jewel_osds
>       pgmap v17491: 80 pgs, 3 pools, 194 MB data, 1075 objects
>             1530 MB used, 16878 MB / 18408 MB avail
>             1075/3225 objects degraded (33.333%)
>                   80 active+undersized+degraded
>
> Thanks.
>
> Kind regards,
> Grigori

--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90
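For reference, here is a sketch of the commands that would show the settings Paul is asking about. These are standard Ceph Jewel-era commands; the pool name "rbd" on the last line is only a placeholder, not one of the pools from the cluster above:

```shell
# List all pools with their replication settings (size, min_size, crush rule)
ceph osd pool ls detail

# Dump the CRUSH rules to check the failure domain (host vs. osd)
ceph osd crush rule dump

# Inspect a single setting for one pool ("rbd" is a placeholder name)
ceph osd pool get rbd min_size
```

With only two of three OSDs up, a pool whose min_size equals its size would block all I/O until the third OSD returns, so these values are the first thing to check.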
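As a sanity check on the numbers in the ceph -s output above: 3225 is exactly 3 × 1075, so the degraded count is consistent with three-way replication where one host holds one copy of every object. A small sketch of that arithmetic (pure Python, no Ceph required; the replica count of 3 is inferred from the output, not stated in it):

```python
# Numbers taken from the `ceph -s` output above.
objects = 1075                       # objects in the 3 pools
replicas = 3                         # inferred pool size: one copy per host
total_copies = objects * replicas    # 3225, matching the report
degraded = objects                   # one host down -> one copy of each object lost
pct = 100.0 * degraded / total_copies
print(f"{degraded}/{total_copies} objects degraded ({pct:.3f}%)")
# prints: 1075/3225 objects degraded (33.333%)
```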
_______________________________________________ ceph-users mailing list [email protected] http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
