Hi,
have you checked the output of "ceph-disk list" on the nodes where the OSDs are
not coming back up?
This should give you a hint on what's going on.
Also use dmesg to search for any error messages.
And finally inspect /var/log/ceph/ceph-osd.${id}.log to see messages produced
by the OSD itself when it starts.
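The steps above can be sketched as a few commands, assuming osd.1 is one of the failing OSDs (substitute your own id). The last two lines are an extra suggestion for the "start request repeated too quickly" error in your systemctl output: systemd's start-rate limit must be cleared before the unit will start again.

```shell
#!/bin/sh
# Adjust to the id of the OSD that stays "down" (osd.1 is just an example)
OSD_ID=1

# 1. List the disks/partitions as Ceph sees them
ceph-disk list

# 2. Look for kernel-level disk or filesystem errors
dmesg | grep -i -e error -e fail

# 3. Check the OSD's own log for the reason it aborted at startup
tail -n 100 /var/log/ceph/ceph-osd.${OSD_ID}.log

# 4. The unit hit systemd's start-rate limit; clear it before retrying
systemctl reset-failed ceph-osd@${OSD_ID}
systemctl start ceph-osd@${OSD_ID}
```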
Regards
JC
> On Oct 19, 2017, at 12:11, Josy <[email protected]> wrote:
>
> Hi,
>
> I am not able to start some of the OSDs in the cluster.
>
> This is a test cluster and had 8 OSDs. One node was taken out for
> maintenance. I set the noout flag and after the server came back up I unset
> the noout flag.
>
> Suddenly a couple of OSDs went down.
>
> And now I can start the OSDs manually from each node, but the status is still
> "down"
>
> $ ceph osd stat
> 8 osds: 2 up, 5 in
>
>
> $ ceph osd tree
> ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
> -1 7.97388 root default
> -3 1.86469 host a1-osd
> 1 ssd 1.86469 osd.1 down 0 1.00000
> -5 0.87320 host a2-osd
> 2 ssd 0.87320 osd.2 down 0 1.00000
> -7 0.87320 host a3-osd
> 4 ssd 0.87320 osd.4 down 1.00000 1.00000
> -9 0.87320 host a4-osd
> 8 ssd 0.87320 osd.8 up 1.00000 1.00000
> -11 0.87320 host a5-osd
> 12 ssd 0.87320 osd.12 down 1.00000 1.00000
> -13 0.87320 host a6-osd
> 17 ssd 0.87320 osd.17 up 1.00000 1.00000
> -15 0.87320 host a7-osd
> 21 ssd 0.87320 osd.21 down 1.00000 1.00000
> -17 0.87000 host a8-osd
> 28 ssd 0.87000 osd.28 down 0 1.00000
>
> I can also see this error on each OSD node.
>
> # systemctl status ceph-osd@1
> ● [email protected] - Ceph object storage daemon osd.1
> Loaded: loaded (/usr/lib/systemd/system/[email protected]; enabled; vendor
> preset: disabled)
> Active: failed (Result: start-limit) since Thu 2017-10-19 11:35:18 PDT;
> 19min ago
> Process: 4163 ExecStart=/usr/bin/ceph-osd -f --cluster ${CLUSTER} --id %i
> --setuser ceph --setgroup ceph (code=killed, signal=ABRT)
> Process: 4158 ExecStartPre=/usr/lib/ceph/ceph-osd-prestart.sh --cluster
> ${CLUSTER} --id %i (code=exited, status=0/SUCCESS)
> Main PID: 4163 (code=killed, signal=ABRT)
>
> Oct 19 11:34:58 ceph-las1-a1-osd systemd[1]: Unit [email protected] entered
> failed state.
> Oct 19 11:34:58 ceph-las1-a1-osd systemd[1]: [email protected] failed.
> Oct 19 11:35:18 ceph-las1-a1-osd systemd[1]: [email protected] holdoff time
> over, scheduling restart.
> Oct 19 11:35:18 ceph-las1-a1-osd systemd[1]: start request repeated too
> quickly for [email protected]
> Oct 19 11:35:18 ceph-las1-a1-osd systemd[1]: Failed to start Ceph object
> storage daemon osd.1.
> Oct 19 11:35:18 ceph-las1-a1-osd systemd[1]: Unit [email protected] entered
> failed state.
> Oct 19 11:35:18 ceph-las1-a1-osd systemd[1]: [email protected] failed.
>
>
> _______________________________________________
> ceph-users mailing list
> [email protected]
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com