After getting all the OSDs and MONs updated and running OK, I updated the
MDS as usual and rebooted the machine after updating the kernel (we're on
14.04, but it was running an older 4.x kernel, so I took it to 16.04's
version). Now the MDS fails to come up. No replay, no nothing.

It boots normally, then stalls while waiting for the journal to recover,
just repeating the same beacon messages:

2016-04-30 21:21:33.889536 7f9f85da3700 10 mds.beacon.a _send up:replay seq 59
2016-04-30 21:21:33.889576 7f9f85da3700  1 -- 35.8.224.77:6800/31903 --> 35.8.224.132:6789/0 -- mdsbeacon(15227404/a up:replay seq 59 v6030) v6 -- ?+0 0x55a7d0a72000 con 0x55a7d0934600
2016-04-30 21:21:33.890646 7f9f88eaa700  1 -- 35.8.224.77:6800/31903 <== mon.1 35.8.224.132:6789/0 70 ==== mdsbeacon(15227404/a up:replay seq 59 v6030) v6 ==== 125+0+0 (945447566 0 0) 0x55a7d0a74700 con 0x55a7d0934600
2016-04-30 21:21:33.890693 7f9f88eaa700 10 mds.beacon.a handle_mds_beacon up:replay seq 59 rtt 0.001135
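
To check that the state really never leaves up:replay, a quick sketch like
the one below could pull the beacon sends out of the MDS log. The regex and
the function name beacon_states are my own assumptions, inferred from the
lines above; nothing here comes from Ceph itself.

```python
import re

# Matches beacon sends such as:
#   ... mds.beacon.a _send up:replay seq 59
# The pattern is an assumption inferred from the log excerpt, not a Ceph spec.
BEACON_RE = re.compile(r"mds\.beacon\.(\S+) _send (\S+) seq\s+(\d+)")


def beacon_states(log_text):
    """Return (daemon, state, seq) for each beacon send found in log_text."""
    # Pasted log lines may be wrapped, so collapse all whitespace first.
    flat = " ".join(log_text.split())
    return [(m.group(1), m.group(2), int(m.group(3)))
            for m in BEACON_RE.finditer(flat)]


sample = """2016-04-30 21:21:33.889536 7f9f85da3700 10 mds.beacon.a _send up:replay seq
59"""
print(beacon_states(sample))
```

If every tuple it returns carries the same state while only the sequence
number climbs, the daemon is beaconing fine but never advancing.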

The journal never does anything, but upon killing the pid, the log shows:

2016-04-30 21:21:40.455902 7f9f83b9d700  4 mds.0.log Journal 300 recovered.
2016-04-30 21:21:40.455929 7f9f83b9d700  0 mds.0.log Journal 300 is in unknown format 4294967295, does this MDS daemon require upgrade?
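
For what it's worth, that format number is the largest value an unsigned
32-bit integer can hold, i.e. what -1 looks like when read as unsigned. That
might mean the field was never set to a real format version rather than the
journal genuinely being newer, though that is only a guess on my part. The
arithmetic itself is easy to check (nothing below is Ceph-specific):

```python
import struct

fmt = 4294967295  # format value from the log message above

# Largest unsigned 32-bit integer: all 32 bits set.
assert fmt == 2**32 - 1 == 0xFFFFFFFF

# Reinterpret the same four bytes as a signed 32-bit integer.
signed = struct.unpack("<i", struct.pack("<I", fmt))[0]
print(signed)  # -1
```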

The only reason the MDS got fully rebooted after the upgrades was that some
random objects were showing as unfound, yet if I shut down one of the nodes
housing those OSDs, the unfound count would drop. Obviously I need to deal
with the MDS issue first, haha.

Hopefully someone has some insight as to what can be run to either get it
back online as-was, nuke the journal (the metadata on-system should be OK;
there wasn't any traffic of importance happening during the upgrades), or
reset it so it'll pull from the metadata pool.

Thanks!

Russ
CAL Tech Lead

Russell Werner
werne...@msu.edu
O: 517.884.1504 - Direct
C: 517.803.8488 - 24/7