Hi all,
I have started to dig into this again. Too much work stopped me from digging
further, and I have had the machines shut down since then anyway. My problem
started when I upgraded from Hammer (I am quite certain it was at least
0.94.4) to Infernalis: since then my OSDs can't join the cluster any more. I
am running Ubuntu 14.04. I guess rolling back is not an option?
My setup is 3 HP microservers named black, orange and purple. All 3 servers
have one mon each and four OSDs.
It has been suggested to download a version released by Sage that might fix my
problem, but I am not sure where the archives are.
Looking at the mon log I see:
2015-12-05 19:08:49.895681 7f68852a9700 10 mon.black@0(leader).pg v8253363
check_osd_map -- osdmap not readable, waiting
This might also give a clue to what is happening. When I try to change the
logging for the OSD, I get an error:
ceph tell osd.0 injectargs '--debug-osd 0/5'
Error ENXIO: problem getting command descriptions from osd.0
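When `ceph tell` cannot reach an OSD, the daemon's local admin socket is another way to change a debug level. The sketch below only builds and runs `ceph daemon osd.N config set debug_osd ...` on the host where the OSD process lives. The helper names are made up for illustration, and whether this helps with the problem described here is an assumption, not something the logs confirm.

```python
import subprocess


def admin_socket_command(osd_id, option, value):
    """Build the argv that sets a config option via the local admin socket."""
    return ["ceph", "daemon", f"osd.{osd_id}", "config", "set", option, value]


def set_debug_level(osd_id, value="0/5", run=subprocess.run):
    """Set debug_osd on a running OSD; must be run on that OSD's own host.

    `run` is injectable so the command can be inspected without executing it.
    """
    return run(admin_socket_command(osd_id, "debug_osd", value), check=True)
```

Calling `set_debug_level(0)` on the host black would correspond to the `injectargs` attempt above, but without going through the monitors.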
More from the logs follows. I marked osd.0 as in at Dec 5 19:08:49 CET 2015.
I hope my data is still OK. I am not in a hurry to get it back and prefer a
safe solution to a quick one :)
I am very thankful for any help or suggestions. Since all this has just worked
for well over a year, I am quite rusty when it comes to troubleshooting, and my
Linux skills are not enough to figure out what has gone wrong.
I tried to keep only what is in the logs from when I tried to set osd.0 to
in.
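Trimming a Ceph log to the moments around one event can be done mechanically. The following is a minimal Python sketch, assuming every log entry starts with a 26-character timestamp such as `2015-12-05 19:08:49.895681`. Lines without a timestamp are treated as continuations of the previous entry. The function names are hypothetical.

```python
from datetime import datetime

STAMP_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def parse_stamp(line):
    """Return the leading timestamp of a log line, or None if it has none."""
    try:
        return datetime.strptime(line[:26], STAMP_FORMAT)
    except ValueError:
        return None


def lines_in_window(lines, start, end):
    """Yield lines whose entry timestamp lies within [start, end].

    A line without a timestamp follows the decision made for the most
    recent timestamped line before it.
    """
    keep = False
    for line in lines:
        stamp = parse_stamp(line)
        if stamp is not None:
            keep = start <= stamp <= end
        if keep:
            yield line
```

For example, a window from 19:08:49 to 19:08:50 on 2015-12-05 keeps only the entries logged while the `osd in` command was being processed.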
In the log for the OSD when I start it and try to get it to join I get:
2015-12-05 19:08:18.319849 7feef41b5940 0 ceph version 9.2.0
(bb2ecea240f3a1d525bcb35670cb07bd1f0ca299), process ceph-osd, pid 4249
2015-12-05 19:08:18.414650 7feef41b5940 0 filestore(/ceph/osd.0) backend xfs
(magic 0x58465342)
2015-12-05 19:08:18.415996 7feef41b5940 0 genericfilestorebackend(/ceph/osd.0)
detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
2015-12-05 19:08:18.416033 7feef41b5940 0 genericfilestorebackend(/ceph/osd.0)
detect_features: SEEK_DATA/SEEK_HOLE is disabled via 'filestore seek data hole'
config option
2015-12-05 19:08:18.416090 7feef41b5940 0 genericfilestorebackend(/ceph/osd.0)
detect_features: splice is supported
2015-12-05 19:08:18.418819 7feef41b5940 0 genericfilestorebackend(/ceph/osd.0)
detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
2015-12-05 19:08:18.419136 7feef41b5940 0 xfsfilestorebackend(/ceph/osd.0)
detect_features: extsize is supported and your kernel >= 3.5
2015-12-05 19:08:18.527107 7feef41b5940 0 filestore(/ceph/osd.0) mount:
enabling WRITEAHEAD journal mode: checkpoint is not enabled
2015-12-05 19:08:18.637889 7feef41b5940 1 journal _open
/dev/black/journal-osd.0 fd 19: 23622320128 bytes, block size 4096 bytes,
directio = 1, aio = 1
2015-12-05 19:08:18.667396 7feef41b5940 1 journal _open
/dev/black/journal-osd.0 fd 19: 23622320128 bytes, block size 4096 bytes,
directio = 1, aio = 1
2015-12-05 19:08:18.720872 7feef41b5940 1 filestore(/ceph/osd.0) upgrade
2015-12-05 19:08:18.742447 7feef41b5940 0 <cls> cls/cephfs/cls_cephfs.cc:136:
loading cephfs_size_scan
2015-12-05 19:08:18.744275 7feef41b5940 0 <cls> cls/hello/cls_hello.cc:305:
loading cls_hello
2015-12-05 19:08:18.761847 7feef41b5940 0 osd.0 39530 crush map has features
1107558400, adjusting msgr requires for clients
2015-12-05 19:08:18.761883 7feef41b5940 0 osd.0 39530 crush map has features
1107558400 was 8705, adjusting msgr requires for mons
2015-12-05 19:08:18.761899 7feef41b5940 0 osd.0 39530 crush map has features
1107558400, adjusting msgr requires for osds
2015-12-05 19:08:21.301437 7feef41b5940 0 osd.0 39530 load_pgs
2015-12-05 19:09:04.080813 7feef41b5940 0 osd.0 39530 load_pgs opened 1230 pgs
2015-12-05 19:09:04.118602 7feef41b5940 -1 osd.0 39530 log_to_monitors
{default=true}
2015-12-05 19:09:04.135379 7feed5f0e700 0 osd.0 39530 ignoring osdmap until we
have initialized
2015-12-05 19:09:05.867889 7feef41b5940 0 osd.0 39530 done with init, starting
boot process
In the logs of the mons I get:
2015-12-05 19:08:49.635074 7f688262e700 10 mon.black@0(leader).log v9157752
logging 2015-12-05 19:08:49.633694 mon.2 172.16.0.203:6789/0 27 : audit [INF]
from='client.? 172.16.0.201:0/3894299556' entity='client.admin' cmd=[{"prefix":
"osd in", "ids": ["0"]}]: dispatch
2015-12-05 19:08:49.635184 7f688262e700 20 mon.black@0(leader) e3 _ms_dispatch
existing session 0x7f688f218c40 for mon.2 172.16.0.203:6789/0
2015-12-05 19:08:49.635188 7f688262e700 20 mon.black@0(leader) e3 caps allow *
2015-12-05 19:08:49.635196 7f688262e700 20 is_capable service=mon command= read
on cap allow *
2015-12-05 19:08:49.635198 7f688262e700 20 allow so far , doing grant allow *
2015-12-05 19:08:49.635199 7f688262e700 20 allow all
2015-12-05 19:08:49.635201 7f688262e700 10 mon.black@0(leader) e3 received
forwarded message from client.524843 172.16.0.201:0/3894299556 via mon.2
172.16.0.203:6789/0
2015-12-05 19:08:49.635207 7f688262e700 20 is_capable service=mon command= exec
on cap allow *
2015-12-05 19:08:49.635209 7f688262e700 20 allow so far , doing grant allow *
2015-12-05 19:08:49.635210 7f688262e700 20 allow all
2015-12-05 19:08:49.635214 7f688262e700 10 mon.black@0(leader) e3 caps are
allow *
2015-12-05 19:08:49.635216 7f688262e700 10 mon.black@0(leader) e3 entity name
'client.admin' type 8
2015-12-05 19:08:49.635218 7f688262e700 10 mon.black@0(leader) e3 mesg
0x7f688fb10800 from 172.16.0.203:6789/0
2015-12-05 19:08:49.635240 7f688262e700 20 mon.black@0(leader) e3 _ms_dispatch
existing session 0x7f688f21b480 for client.524843 :/0
2015-12-05 19:08:49.635243 7f688262e700 20 mon.black@0(leader) e3 caps allow *
2015-12-05 19:08:49.635301 7f688262e700 0 mon.black@0(leader) e3
handle_command mon_command({"prefix": "osd in", "ids": ["0"]} v 0) v1
2015-12-05 19:08:49.635337 7f688262e700 20 is_capable service=osd command=osd
in read write on cap allow *
2015-12-05 19:08:49.635340 7f688262e700 20 allow so far , doing grant allow *
2015-12-05 19:08:49.635341 7f688262e700 20 allow all
2015-12-05 19:08:49.635342 7f688262e700 10 mon.black@0(leader) e3
_allowed_command capable
2015-12-05 19:08:49.635353 7f688262e700 0 log_channel(audit) log [INF] :
from='client.524843 :/0' entity='client.admin' cmd=[{"prefix": "osd in", "ids":
["0"]}]: dispatch
2015-12-05 19:08:49.635453 7f688262e700 10 mon.black@0(leader).osd e39530
preprocess_query mon_command({"prefix": "osd in", "ids": ["0"]} v 0) v1 from
client.524843 172.16.0.201:0/3894299556
2015-12-05 19:08:49.635502 7f688262e700 7 mon.black@0(leader).osd e39530
prepare_update mon_command({"prefix": "osd in", "ids": ["0"]} v 0) v1 from
client.524843 172.16.0.201:0/3894299556
2015-12-05 19:08:49.637872 7f688262e700 10 mon.black@0(leader).osd e39530
should_propose
2015-12-05 19:08:49.637997 7f688262e700 20 mon.black@0(leader) e3 _ms_dispatch
existing session 0x7f688f218a80 for mon.0 172.16.0.201:6789/0
2015-12-05 19:08:49.638004 7f688262e700 20 mon.black@0(leader) e3 caps allow *
2015-12-05 19:08:49.638034 7f688262e700 10 mon.black@0(leader).log v9157752
preprocess_query log(1 entries) v1 from mon.0 172.16.0.201:6789/0
2015-12-05 19:08:49.638060 7f688262e700 10 mon.black@0(leader).log v9157752
preprocess_log log(1 entries) v1 from mon.0
2015-12-05 19:08:49.638064 7f688262e700 20 is_capable service=log command=
write on cap allow *
2015-12-05 19:08:49.638067 7f688262e700 20 allow so far , doing grant allow *
2015-12-05 19:08:49.638069 7f688262e700 20 allow all
2015-12-05 19:08:49.638089 7f688262e700 10 mon.black@0(leader).log v9157752
prepare_update log(1 entries) v1 from mon.0 172.16.0.201:6789/0
2015-12-05 19:08:49.638100 7f688262e700 10 mon.black@0(leader).log v9157752
prepare_log log(1 entries) v1 from mon.0
2015-12-05 19:08:49.638103 7f688262e700 10 mon.black@0(leader).log v9157752
logging 2015-12-05 19:08:49.635355 mon.0 172.16.0.201:6789/0 29 : audit [INF]
from='client.524843 :/0' entity='client.admin' cmd=[{"prefix": "osd in", "ids":
["0"]}]: dispatch
2015-12-05 19:08:49.685271 7f6882e2f700 10 mon.black@0(leader).log v9157752
encode_full log v 9157752
2015-12-05 19:08:49.685505 7f6882e2f700 10 mon.black@0(leader).log v9157752
encode_pending v9157753
2015-12-05 19:08:49.690050 7f6882e2f700 10 mon.black@0(leader).osd e39530
encode_pending e 39531
2015-12-05 19:08:49.690092 7f6882e2f700 2 mon.black@0(leader).osd e39530
osd.0 IN
2015-12-05 19:08:49.690698 7f6882e2f700 20 mon.black@0(leader).osd e39530
full_crc 3915856006 inc_crc 3792224963
2015-12-05 19:08:49.691048 7f688262e700 20 mon.black@0(leader) e3 _ms_dispatch
existing session 0x7f688f2188c0 for mon.1 172.16.0.202:6789/0
2015-12-05 19:08:49.691055 7f688262e700 20 mon.black@0(leader) e3 caps allow *
2015-12-05 19:08:49.691064 7f688262e700 20 is_capable service=mon command= read
on cap allow *
2015-12-05 19:08:49.691068 7f688262e700 20 allow so far , doing grant allow *
2015-12-05 19:08:49.691069 7f688262e700 20 allow all
2015-12-05 19:08:49.691071 7f688262e700 20 is_capable service=mon command= exec
on cap allow *
2015-12-05 19:08:49.691073 7f688262e700 20 allow so far , doing grant allow *
2015-12-05 19:08:49.691074 7f688262e700 20 allow all
2015-12-05 19:08:49.878891 7f688262e700 20 mon.black@0(leader) e3 _ms_dispatch
existing session 0x7f688f218c40 for mon.2 172.16.0.203:6789/0
2015-12-05 19:08:49.878904 7f688262e700 20 mon.black@0(leader) e3 caps allow *
2015-12-05 19:08:49.878921 7f688262e700 20 is_capable service=mon command= read
on cap allow *
2015-12-05 19:08:49.878926 7f688262e700 20 allow so far , doing grant allow *
2015-12-05 19:08:49.878929 7f688262e700 20 allow all
2015-12-05 19:08:49.878932 7f688262e700 20 is_capable service=mon command= exec
on cap allow *
2015-12-05 19:08:49.878935 7f688262e700 20 allow so far , doing grant allow *
2015-12-05 19:08:49.878937 7f688262e700 20 allow all
2015-12-05 19:08:49.881064 7f68852a9700 10 mon.black@0(leader) e3
refresh_from_paxos
2015-12-05 19:08:49.881688 7f68852a9700 10 mon.black@0(leader).log v9157753
update_from_paxos
2015-12-05 19:08:49.881696 7f68852a9700 10 mon.black@0(leader).log v9157753
update_from_paxos version 9157753 summary v 9157752
2015-12-05 19:08:49.881747 7f68852a9700 10 mon.black@0(leader).log v9157753
update_from_paxos latest full 9157752
2015-12-05 19:08:49.881809 7f68852a9700 7 mon.black@0(leader).log v9157753
update_from_paxos applying incremental log 9157753 2015-12-05 19:08:49.633694
mon.2 172.16.0.203:6789/0 27 : audit [INF] from='client.?
172.16.0.201:0/3894299556' entity='client.admin' cmd=[{"prefix": "osd in",
"ids": ["0"]}]: dispatch
2015-12-05 19:08:49.881840 7f68852a9700 20 mon.black@0(leader).log v9157753
update_from_paxos logging for channel 'audit' to file
'/var/log/ceph/ceph.audit.log'
2015-12-05 19:08:49.881882 7f68852a9700 7 mon.black@0(leader).log v9157753
update_from_paxos applying incremental log 9157753 2015-12-05 19:08:49.635355
mon.0 172.16.0.201:6789/0 29 : audit [INF] from='client.524843 :/0'
entity='client.admin' cmd=[{"prefix": "osd in", "ids": ["0"]}]: dispatch
2015-12-05 19:08:49.881901 7f68852a9700 20 mon.black@0(leader).log v9157753
update_from_paxos logging for channel 'audit' to file
'/var/log/ceph/ceph.audit.log'
2015-12-05 19:08:49.881925 7f68852a9700 15 mon.black@0(leader).log v9157753
update_from_paxos logging for 1 channels
2015-12-05 19:08:49.881929 7f68852a9700 15 mon.black@0(leader).log v9157753
update_from_paxos channel 'audit' logging 353 bytes
2015-12-05 19:08:49.882845 7f68852a9700 10 mon.black@0(leader).log v9157753
check_subs
2015-12-05 19:08:49.883231 7f68852a9700 10 mon.black@0(leader).auth v14845
update_from_paxos
2015-12-05 19:08:49.883242 7f68852a9700 10 mon.black@0(leader).pg v8253363
map_pg_creates to 0 pgs -- no change
2015-12-05 19:08:49.883247 7f68852a9700 10 mon.black@0(leader).pg v8253363
send_pg_creates to 0 pgs
2015-12-05 19:08:49.883400 7f68852a9700 10 mon.black@0(leader).log v9157753
create_pending v 9157754
2015-12-05 19:08:49.883423 7f68852a9700 7 mon.black@0(leader).log v9157753
_updated_log for mon.2 172.16.0.203:6789/0
2015-12-05 19:08:49.883444 7f68852a9700 2 mon.black@0(leader) e3 send_reply
0x7f688f64ca80 0x7f688f3d5680 log(last 27) v1
2015-12-05 19:08:49.883450 7f68852a9700 15 mon.black@0(leader) e3 send_reply
routing reply to 172.16.0.203:6789/0 via 172.16.0.203:6789/0 for request log(1
entries) v1
2015-12-05 19:08:49.883531 7f68852a9700 7 mon.black@0(leader).log v9157753
_updated_log for mon.0 172.16.0.201:6789/0
2015-12-05 19:08:49.883548 7f68852a9700 2 mon.black@0(leader) e3 send_reply
0x7f688f64cb60 0x7f688f3d6940 log(last 29) v1
2015-12-05 19:08:49.886246 7f688262e700 20 mon.black@0(leader) e3 _ms_dispatch
existing session 0x7f688f218a80 for mon.0 172.16.0.201:6789/0
2015-12-05 19:08:49.886261 7f688262e700 20 mon.black@0(leader) e3 caps allow *
2015-12-05 19:08:49.886323 7f688262e700 20 is_capable service=mon command= read
on cap allow *
2015-12-05 19:08:49.886330 7f688262e700 20 allow so far , doing grant allow *
2015-12-05 19:08:49.886332 7f688262e700 20 allow all
2015-12-05 19:08:49.888142 7f688262e700 20 mon.black@0(leader) e3 _ms_dispatch
existing session 0x7f688f2188c0 for mon.1 172.16.0.202:6789/0
2015-12-05 19:08:49.888346 7f688262e700 20 mon.black@0(leader) e3 caps allow *
2015-12-05 19:08:49.888367 7f688262e700 20 is_capable service=mon command= read
on cap allow *
2015-12-05 19:08:49.888373 7f688262e700 20 allow so far , doing grant allow *
2015-12-05 19:08:49.888375 7f688262e700 20 allow all
2015-12-05 19:08:49.888379 7f688262e700 20 is_capable service=mon command= exec
on cap allow *
2015-12-05 19:08:49.888382 7f688262e700 20 allow so far , doing grant allow *
2015-12-05 19:08:49.888384 7f688262e700 20 allow all
2015-12-05 19:08:49.888502 7f688262e700 20 mon.black@0(leader) e3 _ms_dispatch
existing session 0x7f688f218c40 for mon.2 172.16.0.203:6789/0
2015-12-05 19:08:49.888512 7f688262e700 20 mon.black@0(leader) e3 caps allow *
2015-12-05 19:08:49.888524 7f688262e700 20 is_capable service=mon command= read
on cap allow *
2015-12-05 19:08:49.888528 7f688262e700 20 allow so far , doing grant allow *
2015-12-05 19:08:49.888530 7f688262e700 20 allow all
2015-12-05 19:08:49.888532 7f688262e700 20 is_capable service=mon command= exec
on cap allow *
2015-12-05 19:08:49.888535 7f688262e700 20 allow so far , doing grant allow *
2015-12-05 19:08:49.888537 7f688262e700 20 allow all
2015-12-05 19:08:49.889676 7f688262e700 20 mon.black@0(leader) e3 _ms_dispatch
existing session 0x7f688f2188c0 for mon.1 172.16.0.202:6789/0
2015-12-05 19:08:49.889688 7f688262e700 20 mon.black@0(leader) e3 caps allow *
2015-12-05 19:08:49.889703 7f688262e700 20 is_capable service=mon command= read
on cap allow *
2015-12-05 19:08:49.889709 7f688262e700 20 allow so far , doing grant allow *
2015-12-05 19:08:49.889711 7f688262e700 20 allow all
2015-12-05 19:08:49.889714 7f688262e700 20 is_capable service=mon command= exec
on cap allow *
2015-12-05 19:08:49.889717 7f688262e700 20 allow so far , doing grant allow *
2015-12-05 19:08:49.889719 7f688262e700 20 allow all
2015-12-05 19:08:49.890685 7f688262e700 20 mon.black@0(leader) e3 _ms_dispatch
existing session 0x7f688f218c40 for mon.2 172.16.0.203:6789/0
2015-12-05 19:08:49.890697 7f688262e700 20 mon.black@0(leader) e3 caps allow *
2015-12-05 19:08:49.890712 7f688262e700 20 is_capable service=mon command= read
on cap allow *
2015-12-05 19:08:49.890717 7f688262e700 20 allow so far , doing grant allow *
2015-12-05 19:08:49.890720 7f688262e700 20 allow all
2015-12-05 19:08:49.890747 7f688262e700 20 is_capable service=mon command= exec
on cap allow *
2015-12-05 19:08:49.890751 7f688262e700 20 allow so far , doing grant allow *
2015-12-05 19:08:49.890753 7f688262e700 20 allow all
2015-12-05 19:08:49.894322 7f68852a9700 10 mon.black@0(leader) e3
refresh_from_paxos
2015-12-05 19:08:49.894834 7f68852a9700 15 mon.black@0(leader).osd e39530
update_from_paxos paxos e 39531, my e 39530
2015-12-05 19:08:49.894899 7f68852a9700 7 mon.black@0(leader).osd e39530
update_from_paxos applying incremental 39531
2015-12-05 19:08:49.895218 7f68852a9700 1 mon.black@0(leader).osd e39531
e39531: 12 osds: 3 up, 4 in
2015-12-05 19:08:49.895663 7f68852a9700 10 mon.black@0(leader).osd e39531
adding osd.0 to down_pending_out map
2015-12-05 19:08:49.895681 7f68852a9700 10 mon.black@0(leader).pg v8253363
check_osd_map -- osdmap not readable, waiting
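Most of the mon output above consists of level-20 capability checks, so the osdmap summary line (`e39531: 12 osds: 3 up, 4 in`) is easy to miss. A minimal Python sketch for pulling such summaries out of a log, assuming they always have the form seen here. The name `osdmap_summaries` is made up for illustration.

```python
import re

# Matches summaries such as "e39531: 12 osds: 3 up, 4 in".
OSDMAP_SUMMARY = re.compile(r"e(\d+): (\d+) osds: (\d+) up, (\d+) in")


def osdmap_summaries(text):
    """Return (epoch, total_osds, up, in) tuples for each summary in text."""
    return [
        tuple(int(group) for group in match.groups())
        for match in OSDMAP_SUMMARY.finditer(text)
    ]
```

Applied to the excerpt above, this reports epoch 39531 with 12 OSDs, of which 3 are up and 4 are in.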
From: Claes Sahlström
Sent: 16 November 2015 22:43
To: [email protected]
Subject: RE: [ceph-users] OSD:s failing out after upgrade to 9.2.0 on Ubuntu
14.04
Yes, I upgraded from Hammer 0.94.4.
"ceph-osd --version" gives the correct version, 9.2.0. I think it is a
problem with the communication between my OSDs and either the MONs or the other
OSDs, or maybe both.
I will check out those archives from Sage also...
I have probably done something wrong, but I cannot figure out what it is. All
my previous upgrades were smooth, and this is quite an old installation I have
at home.
From: ceph-users [mailto:[email protected]] On Behalf Of Josef
Johansson
Sent: 16 November 2015 22:18
To: David Clarke <[email protected]>
Cc: [email protected]
Subject: Re: [ceph-users] OSD:s failing out after upgrade to 9.2.0 on Ubuntu
14.04
And if you look through the archives, Sage also released a version of
Infernalis that fixes it if you didn't do it that way.