> On 12 May 2016, at 19:27, Vincenzo Pii <[email protected]> wrote:
>
> I have installed a new Ceph cluster with ceph-ansible (using the same version
> and playbook that had worked before, with some necessary changes to
> variables).
>
> The only major difference is that one OSD (osd3) now has a disk twice as big
> as the others, and thus a different weight (see the crushmap excerpt below).
>
> The Ceph version is jewel (10.2.0 (3a9fba20ec743699b69bd0181dd6c54dc01c64b9))
> and the setup has a single monitor node (it will be three in production) and
> three OSDs.
>
> Any help in finding the issue would be highly appreciated!
>
> # ceph status
>     cluster f7f42c59-b8ec-4d68-bb09-41f7a10c6223
>      health HEALTH_ERR
>             448 pgs are stuck inactive for more than 300 seconds
>             448 pgs stuck inactive
>      monmap e1: 1 mons at {sbb=10.2.48.205:6789/0}
>             election epoch 3, quorum 0 sbb
>       fsmap e8: 0/0/1 up
>      osdmap e10: 3 osds: 0 up, 0 in
>             flags sortbitwise
>       pgmap v11: 448 pgs, 4 pools, 0 bytes data, 0 objects
>             0 kB used, 0 kB / 0 kB avail
>                  448 creating
>
> From the crushmap:
>
> host osd1 {
>         id -2           # do not change unnecessarily
>         # weight 1.811
>         alg straw
>         hash 0  # rjenkins1
>         item osd.0 weight 1.811
> }
> host osd2 {
>         id -3           # do not change unnecessarily
>         # weight 1.811
>         alg straw
>         hash 0  # rjenkins1
>         item osd.1 weight 1.811
> }
> host osd3 {
>         id -4           # do not change unnecessarily
>         # weight 3.630
>         alg straw
>         hash 0  # rjenkins1
>         item osd.2 weight 3.630
> }
>
> Vincenzo Pii | TERALYTICS
> DevOps Engineer
> Teralytics AG | Zollstrasse 62 | 8005 Zurich | Switzerland
> phone: +41 (0) 79 191 11 08
> email: [email protected]
> www.teralytics.net
> Company registration number: CH-020.3.037.709-7 | Trade register Canton Zurich
> Board of directors: Georg Polzer, Luciano Franceschina, Mark Schmitz, Yann de
> Vries
Problem found: I had misconfigured the public_network and cluster_network variables for some of the hosts (I had moved some configuration to host_vars). It was easy to spot once I checked the logs on those hosts.
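For the record, the fix amounts to making the per-host overrides agree with the rest of the cluster: every host must use the same public_network as the monitors, or the OSDs never register as up/in. A minimal sketch of such a ceph-ansible override file follows; the file name and subnet values are assumptions for illustration, not taken from the actual cluster:

```shell
# Sketch only: recreate a per-host ceph-ansible override file.
# The subnets below are assumed; they must match what the monitors
# and all other hosts use, which was exactly the bug here.
mkdir -p host_vars
cat > host_vars/osd3.yml <<'EOF'
# host_vars/osd3.yml
public_network: 10.2.48.0/24     # network the monitors listen on (assumed)
cluster_network: 10.2.49.0/24    # OSD replication traffic (assumed)
EOF
grep -c '_network' host_vars/osd3.yml    # prints 2
```

If the overrides live in host_vars like this, it is worth diffing them against group_vars after any reorganisation, since a host that silently keeps a stale network will show the same "0 up, 0 in" symptom.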
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
