To test things, I tried creating a new mgr in case there was some weird corruption with the old key, but I'm seeing the same behavior with the new mgr.
On Fri, Jan 4, 2019 at 11:03 AM Randall Smith <[email protected]> wrote:

> The keys in the keyrings for the broken mgrs match what is shown in ceph
> auth list. The relevant entries are below so that you can see the caps.
>
> I am having problems with both mgr.6 and mgr.8. mgr.7 is the only mgr
> currently functioning.
>
> mgr.6
>         key: [redacted]
>         caps: [mds] allow *
>         caps: [mgr] allow r
>         caps: [mon] allow profile mgr
>         caps: [osd] allow *
> mgr.7
>         key: [redacted]
>         caps: [mds] allow *
>         caps: [mgr] allow r
>         caps: [mon] allow profile mgr
>         caps: [osd] allow *
> mgr.8
>         key: [redacted]
>         caps: [mds] allow *
>         caps: [mon] allow profile mgr
>         caps: [osd] allow *
>
> I agree that an auth issue seems unlikely to have been triggered, but I'm
> not sure what else it can be.
>
> On Fri, Jan 4, 2019 at 10:51 AM Steve Taylor <[email protected]> wrote:
>
>> I can't think of why the upgrade would have broken your keys, but have
>> you verified that the mons still have the correct mgr keys configured?
>> 'ceph auth ls' should list an mgr.<host> key for each mgr with a key
>> matching the contents of /var/lib/ceph/mgr/<cluster>-<host>/keyring on the
>> mgr host and some caps that should minimally include '[mon] allow profile
>> mgr' and '[osd] allow *' I would think.
>>
>> Again, it seems unlikely that this would have broken with the upgrade if
>> it had been working previously, but if you're seeing auth errors it might
>> be something to check out.
>>
>> Steve Taylor | Senior Software Engineer | StorageCraft Technology
>> Corporation <https://storagecraft.com>
>> 380 Data Drive Suite 300 | Draper | Utah | 84020
>> Office: 801.871.2799
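Steve's check (local keyring vs. what the mons have) can be scripted. A minimal sketch, assuming the standard keyring format and layout; the sample keyring, its path, and the entity name `mgr.8` are illustrative stand-ins, not taken from the cluster in this thread:

```shell
# Stand-in for /var/lib/ceph/mgr/<cluster>-<host>/keyring on the mgr host.
# The key value below is a made-up placeholder.
cat > /tmp/keyring <<'EOF'
[mgr.8]
    key = AQB0example==
    caps mon = "allow profile mgr"
EOF

# Extract just the base64 secret from the keyring file.
local_key=$(awk -F' = ' '/^[ \t]*key/ {print $2}' /tmp/keyring)
echo "$local_key"

# On a live cluster you would compare it against the mon's copy, e.g.:
#   mon_key=$(ceph auth get-key mgr.8)
#   [ "$local_key" = "$mon_key" ] || echo "key mismatch for mgr.8"
```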
>> On Fri, 2019-01-04 at 07:26 -0700, Randall Smith wrote:
>>
>> Greetings,
>>
>> I'm upgrading my cluster from luminous to mimic. I've upgraded my
>> monitors and am attempting to upgrade the mgrs. Unfortunately, after an
>> upgrade the mgr daemon exits immediately with error code 1.
>>
>> I've tried running ceph-mgr in debug mode to try to see what's happening
>> but the output (below) is a bit cryptic for me. It looks like
>> authentication might be failing but it was working prior to the upgrade.
>>
>> I do have "auth supported = cephx" in the global section of ceph.conf.
>>
>> What do I need to do to fix this?
>>
>> Thanks.
>>
>> /usr/bin/ceph-mgr -f --cluster ceph --id 8 --setuser ceph --setgroup ceph -d --debug_ms 5
>>
>> 2019-01-04 07:01:38.457 7f808f83f700  2 Event(0x30c42c0 nevent=5000 time_id=1).set_owner idx=0 owner=140190140331776
>> 2019-01-04 07:01:38.457 7f808f03e700  2 Event(0x30c4500 nevent=5000 time_id=1).set_owner idx=1 owner=140190131939072
>> 2019-01-04 07:01:38.457 7f808e83d700  2 Event(0x30c4e00 nevent=5000 time_id=1).set_owner idx=2 owner=140190123546368
>> 2019-01-04 07:01:38.457 7f809dd5b380  1 Processor -- start
>> 2019-01-04 07:01:38.477 7f809dd5b380  1 -- - start start
>> 2019-01-04 07:01:38.481 7f809dd5b380  1 -- - --> 192.168.253.147:6789/0 -- auth(proto 0 26 bytes epoch 0) v1 -- 0x32a6780 con 0
>> 2019-01-04 07:01:38.481 7f809dd5b380  1 -- - --> 192.168.253.148:6789/0 -- auth(proto 0 26 bytes epoch 0) v1 -- 0x32a6a00 con 0
>> 2019-01-04 07:01:38.481 7f808e83d700  1 -- 192.168.253.148:0/1359135487 learned_addr learned my addr 192.168.253.148:0/1359135487
>> 2019-01-04 07:01:38.481 7f808e83d700  2 -- 192.168.253.148:0/1359135487 >> 192.168.253.148:6789/0 conn(0x332d500 :-1 s=STATE_CONNECTING_WAIT_ACK_SEQ pgs=0 cs=0 l=0)._process_connection got newly_acked_seq 0 vs out_seq 0
>> 2019-01-04 07:01:38.481 7f808f03e700  2 -- 192.168.253.148:0/1359135487 >> 192.168.253.147:6789/0 conn(0x332ce00 :-1 s=STATE_CONNECTING_WAIT_ACK_SEQ pgs=0 cs=0 l=0)._process_connection got newly_acked_seq 0 vs out_seq 0
>> 2019-01-04 07:01:38.481 7f808f03e700  5 -- 192.168.253.148:0/1359135487 >> 192.168.253.147:6789/0 conn(0x332ce00 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=74172 cs=1 l=1). rx mon.1 seq 1 0x30c5440 mon_map magic: 0 v1
>> 2019-01-04 07:01:38.481 7f808e83d700  5 -- 192.168.253.148:0/1359135487 >> 192.168.253.148:6789/0 conn(0x332d500 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=74275 cs=1 l=1). rx mon.2 seq 1 0x30c5680 mon_map magic: 0 v1
>> 2019-01-04 07:01:38.481 7f808f03e700  5 -- 192.168.253.148:0/1359135487 >> 192.168.253.147:6789/0 conn(0x332ce00 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=74172 cs=1 l=1). rx mon.1 seq 2 0x32a6780 auth_reply(proto 2 0 (0) Success) v1
>> 2019-01-04 07:01:38.481 7f808e83d700  5 -- 192.168.253.148:0/1359135487 >> 192.168.253.148:6789/0 conn(0x332d500 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=74275 cs=1 l=1). rx mon.2 seq 2 0x32a6a00 auth_reply(proto 2 0 (0) Success) v1
>> 2019-01-04 07:01:38.481 7f808e03c700  1 -- 192.168.253.148:0/1359135487 <== mon.1 192.168.253.147:6789/0 1 ==== mon_map magic: 0 v1 ==== 370+0+0 (3034216899 0 0) 0x30c5440 con 0x332ce00
>> 2019-01-04 07:01:38.481 7f808e03c700  1 -- 192.168.253.148:0/1359135487 <== mon.2 192.168.253.148:6789/0 1 ==== mon_map magic: 0 v1 ==== 370+0+0 (3034216899 0 0) 0x30c5680 con 0x332d500
>> 2019-01-04 07:01:38.481 7f808e03c700  1 -- 192.168.253.148:0/1359135487 <== mon.1 192.168.253.147:6789/0 2 ==== auth_reply(proto 2 0 (0) Success) v1 ==== 33+0+0 (3430158761 0 0) 0x32a6780 con 0x332ce00
>> 2019-01-04 07:01:38.481 7f808e03c700  1 -- 192.168.253.148:0/1359135487 --> 192.168.253.147:6789/0 -- auth(proto 2 2 bytes epoch 0) v1 -- 0x32a6f00 con 0
>> 2019-01-04 07:01:38.481 7f808e03c700  1 -- 192.168.253.148:0/1359135487 <== mon.2 192.168.253.148:6789/0 2 ==== auth_reply(proto 2 0 (0) Success) v1 ==== 33+0+0 (3242503871 0 0) 0x32a6a00 con 0x332d500
>> 2019-01-04 07:01:38.481 7f808e03c700  1 -- 192.168.253.148:0/1359135487 --> 192.168.253.148:6789/0 -- auth(proto 2 2 bytes epoch 0) v1 -- 0x32a6780 con 0
>> 2019-01-04 07:01:38.481 7f808f03e700  5 -- 192.168.253.148:0/1359135487 >> 192.168.253.147:6789/0 conn(0x332ce00 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=74172 cs=1 l=1). rx mon.1 seq 3 0x32a6f00 auth_reply(proto 2 -22 (22) Invalid argument) v1
>> 2019-01-04 07:01:38.481 7f808e03c700  1 -- 192.168.253.148:0/1359135487 <== mon.1 192.168.253.147:6789/0 3 ==== auth_reply(proto 2 -22 (22) Invalid argument) v1 ==== 24+0+0 (882932531 0 0) 0x32a6f00 con 0x332ce00
>> 2019-01-04 07:01:38.481 7f808e03c700  1 -- 192.168.253.148:0/1359135487 >> 192.168.253.147:6789/0 conn(0x332ce00 :-1 s=STATE_OPEN pgs=74172 cs=1 l=1).mark_down
>> 2019-01-04 07:01:38.481 7f808e03c700  2 -- 192.168.253.148:0/1359135487 >> 192.168.253.147:6789/0 conn(0x332ce00 :-1 s=STATE_OPEN pgs=74172 cs=1 l=1)._stop
>> 2019-01-04 07:01:38.481 7f808e83d700  5 -- 192.168.253.148:0/1359135487 >> 192.168.253.148:6789/0 conn(0x332d500 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=74275 cs=1 l=1). rx mon.2 seq 3 0x32a6780 auth_reply(proto 2 -22 (22) Invalid argument) v1
>> 2019-01-04 07:01:38.481 7f808e03c700  1 -- 192.168.253.148:0/1359135487 <== mon.2 192.168.253.148:6789/0 3 ==== auth_reply(proto 2 -22 (22) Invalid argument) v1 ==== 24+0+0 (1359424806 0 0) 0x32a6780 con 0x332d500
>> 2019-01-04 07:01:38.481 7f808e03c700  1 -- 192.168.253.148:0/1359135487 >> 192.168.253.148:6789/0 conn(0x332d500 :-1 s=STATE_OPEN pgs=74275 cs=1 l=1).mark_down
>> 2019-01-04 07:01:38.481 7f808e03c700  2 -- 192.168.253.148:0/1359135487 >> 192.168.253.148:6789/0 conn(0x332d500 :-1 s=STATE_OPEN pgs=74275 cs=1 l=1)._stop
>> 2019-01-04 07:01:38.481 7f809dd5b380  1 -- 192.168.253.148:0/1359135487 shutdown_connections
>> 2019-01-04 07:01:38.481 7f809dd5b380  5 -- 192.168.253.148:0/1359135487 shutdown_connections mark down 192.168.253.148:6789/0 0x332d500
>> 2019-01-04 07:01:38.481 7f809dd5b380  5 -- 192.168.253.148:0/1359135487 shutdown_connections mark down 192.168.253.147:6789/0 0x332ce00
>> 2019-01-04 07:01:38.481 7f809dd5b380  5 -- 192.168.253.148:0/1359135487 shutdown_connections delete 0x332ce00
>> 2019-01-04 07:01:38.481 7f809dd5b380  5 -- 192.168.253.148:0/1359135487 shutdown_connections delete 0x332d500
>> 2019-01-04 07:01:38.485 7f809dd5b380  1 -- 192.168.253.148:0/1359135487 shutdown_connections
>> 2019-01-04 07:01:38.485 7f809dd5b380  1 -- 192.168.253.148:0/1359135487 wait complete.
>> 2019-01-04 07:01:38.485 7f809dd5b380  1 -- 192.168.253.148:0/1359135487 >> 192.168.253.148:0/1359135487 conn(0x332c000 :-1 s=STATE_NONE pgs=0 cs=0 l=0).mark_down
>> 2019-01-04 07:01:38.485 7f809dd5b380  2 -- 192.168.253.148:0/1359135487 >> 192.168.253.148:0/1359135487 conn(0x332c000 :-1 s=STATE_NONE pgs=0 cs=0 l=0)._stop
>> failed to fetch mon config (--no-mon-config to skip)
>>
>> _______________________________________________
>> ceph-users mailing list
>> [email protected]
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
> --
> Randall Smith
> Computing Services
> Adams State University
> http://www.adams.edu/
> 719-587-7741

--
Randall Smith
Computing Services
Adams State University
http://www.adams.edu/
719-587-7741
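[Editor's note: in the `ceph auth list` output quoted earlier, mgr.8 is the only entry without a `caps: [mgr] allow r` line; whether that is the cause here is not established by the thread. If the caps do need realigning, `ceph auth caps` replaces an entity's caps in one shot. A sketch only, with the entity name and cap strings copied from the working mgr.7 entry above; this mutates cluster auth state, so double-check before running against a live cluster:]

```shell
# Reset mgr.8's caps to match the working mgr.7 entry quoted in the thread.
# NOTE: 'ceph auth caps' replaces ALL caps for the entity, so every
# desired cap must be listed, not just the one being added.
ceph auth caps mgr.8 \
    mds 'allow *' \
    mgr 'allow r' \
    mon 'allow profile mgr' \
    osd 'allow *'

# Confirm the new caps took effect.
ceph auth get mgr.8
```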
