Resolved.

Apparently it took the OSD almost 2.5 hours to fully boot.

I had not seen this behavior before, but the OSD eventually booted itself back
into the crush map.

Bookend log stamps below.

> 2016-10-07 21:33:39.241720 7f3d59a97800  0 set uid:gid to 64045:64045 
> (ceph:ceph)

> 2016-10-07 23:53:29.617038 7f3d59a97800  0 osd.0 4360 done with init, 
> starting boot process
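For reference, the gap between those two stamps works out to just under 2 hours 20 minutes. A quick sketch of the arithmetic, using GNU date and the timestamps copied from the log lines above:

```shell
# Elapsed time between the two bookend log stamps (GNU date assumed)
start='2016-10-07 21:33:39'
end='2016-10-07 23:53:29'
s=$(date -d "$start" +%s)
e=$(date -d "$end" +%s)
echo "boot took $(( (e - s) / 60 )) minutes"   # → boot took 139 minutes
```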

I had noticed a steady stream of read operations on the “down/out” osd’s disk,
tied to that osd’s PID, which led me to believe the process was actually doing
something with its time rather than hanging.
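In case it helps anyone else watch for the same thing, here is one way to see whether a process is still advancing its reads, via `/proc/<pid>/io` (a hedged sketch, Linux only; substitute the ceph-osd PID for `$$` below, e.g. the one reported by `status ceph-osd id=0`):

```shell
# Sketch: cumulative bytes read by a process, from /proc/<pid>/io (Linux only).
# $$ (this shell's own PID) is a stand-in; use the ceph-osd PID instead.
pid=$$
awk '/^read_bytes/ {print "read_bytes:", $2}' "/proc/$pid/io"
# Sampling this twice, a few seconds apart, shows whether reads are still growing.
```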

Also for reference, this was a 26% full 8TB disk.
> Filesystem            1K-blocks        Used  Available Use% Mounted on

> /dev/sda1            7806165996  1953556296 5852609700  26% 
> /var/lib/ceph/osd/ceph-0

Reed


> On Oct 7, 2016, at 7:33 PM, Reed Dier <reed.d...@focusvq.com> wrote:
> 
> While attempting to adjust some of my recovery options, I restarted a single 
> osd in the cluster with the following syntax:
> 
>> sudo restart ceph-osd id=0
> 
> 
> The osd restarts without issue, status shows running with the PID.
> 
>> sudo status ceph-osd id=0
>> ceph-osd (ceph/0) start/running, process 2685
> 
> 
> The osd marked itself down cleanly.
> 
>> 2016-10-07 19:36:20.872883 mon.0 10.0.1.249:6789/0 1475867 : cluster [INF] 
>> osd.0 marked itself down
> 
>> 2016-10-07 19:36:21.590874 mon.0 10.0.1.249:6789/0 1475869 : cluster [INF] 
>> osdmap e4361: 16 osds: 15 up, 16 in
> 
> The mons show this from one of many subsequent attempts to restart the osd.
> 
>> 2016-10-07 19:58:16.222949 mon.1 [INF] from='client.? 10.0.1.25:0/324114592' 
>> entity='osd.0' cmd=[{"prefix": "osd crush create-or-move", "args": 
>> ["host=node24", "root=default"], "id": 0, "weight": 7.2701}]: dispatch
>> 2016-10-07 19:58:16.223626 mon.0 [INF] from='client.6557620 :/0' 
>> entity='osd.0' cmd=[{"prefix": "osd crush create-or-move", "args": 
>> ["host=node24", "root=default"], "id": 0, "weight": 7.2701}]: dispatch
> 
> The mon log shows this when grepping for osd.0:
> 
>> 2016-10-07 19:36:20.872882 7fd39aced700  0 log_channel(cluster) log [INF] : 
>> osd.0 marked itself down
>> 2016-10-07 19:36:27.698708 7fd39aced700  0 log_channel(audit) log [INF] : 
>> from='client.6554095 :/0' entity='osd.0' cmd=[{"prefix": "osd crush 
>> create-or-move", "args": ["host=node24", "root=default"], "id": 0, "weight": 
>> 7.2701}]: dispatch
>> 2016-10-07 19:36:27.706374 7fd39aced700  0 mon.core@0(leader).osd e4363 
>> create-or-move crush item name 'osd.0' initial_weight 7.2701 at location 
>> {host=node24,root=default}
>> 2016-10-07 19:39:30.515494 7fd39aced700  0 log_channel(audit) log [INF] : 
>> from='client.6554587 :/0' entity='osd.0' cmd=[{"prefix": "osd crush 
>> create-or-move", "args": ["host=node24", "root=default"], "id": 0, "weight": 
>> 7.2701}]: dispatch
>> 2016-10-07 19:39:30.515618 7fd39aced700  0 mon.core@0(leader).osd e4363 
>> create-or-move crush item name 'osd.0' initial_weight 7.2701 at location 
>> {host=node24,root=default}
>> 2016-10-07 19:41:59.714517 7fd39b4ee700  0 log_channel(cluster) log [INF] : 
>> osd.0 out (down for 338.148761)
> 
> 
> Everything is running the latest Jewel release:
> 
>> ceph --version
>> ceph version 10.2.3 (ecc23778eb545d8dd55e2e4735b53cc93f92e65b)
> 
> Any help with this is extremely appreciated. Hoping someone has dealt with 
> this before.
> 
> Reed Dier

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
