To test things, I tried creating a new mgr in case there was some weird
corruption with the old key, but I'm seeing the same behavior with the new
mgr.
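For reference, the replacement mgr was created along these lines (a sketch, not the exact commands: the id "8", the cluster name "ceph", and the keyring path are assumptions based on this thread, and the caps mirror the working mgr entries shown in `ceph auth list` below; this obviously requires a live cluster):

```shell
# Create (or fetch, if it already exists) a mgr key with the standard mgr caps
# and write it where the mgr daemon expects its keyring.
ceph auth get-or-create mgr.8 \
    mon 'allow profile mgr' mds 'allow *' osd 'allow *' \
    -o /var/lib/ceph/mgr/ceph-8/keyring
chown ceph:ceph /var/lib/ceph/mgr/ceph-8/keyring
```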

On Fri, Jan 4, 2019 at 11:03 AM Randall Smith <[email protected]> wrote:

> The keys in the keyrings for the broken mgrs match what is shown in ceph
> auth list. The relevant entries are below so that you can see the caps.
>
> I am having problems with both mgr.6 and mgr.8. mgr.7 is the only mgr
> currently functioning.
>
> mgr.6
>         key: [redacted]
>         caps: [mds] allow *
>         caps: [mgr] allow r
>         caps: [mon] allow profile mgr
>         caps: [osd] allow *
> mgr.7
>         key: [redacted]
>         caps: [mds] allow *
>         caps: [mgr] allow r
>         caps: [mon] allow profile mgr
>         caps: [osd] allow *
> mgr.8
>         key: [redacted]
>         caps: [mds] allow *
>         caps: [mon] allow profile mgr
>         caps: [osd] allow *
>
> I agree that an auth issue seems unlikely to have been triggered by the
> upgrade, but I'm not sure what else it could be.
>
>
> On Fri, Jan 4, 2019 at 10:51 AM Steve Taylor <
> [email protected]> wrote:
>
>> I can't think of why the upgrade would have broken your keys, but have
>> you verified that the mons still have the correct mgr keys configured?
>> 'ceph auth ls' should list an mgr.<host> key for each mgr with a key
>> matching the contents of /var/lib/ceph/mgr/<cluster>-<host>/keyring on the
>> mgr host and some caps that should minimally include '[mon] allow profile
>> mgr' and '[osd] allow *', I would think.
>>
>> Again, it seems unlikely that this would have broken with the upgrade if
>> it had been working previously, but if you're seeing auth errors it might
>> be something to check out.
>>
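The key comparison described above can be sketched as a small shell helper (the paths, cluster name "ceph", and mgr id "8" are assumptions from this thread, and `extract_key` is a hypothetical helper name):

```shell
# extract_key pulls the "key = ..." value out of a keyring-format file so the
# mon-side copy and the on-disk copy can be compared directly.
extract_key() {
  awk -F' = ' '$1 ~ /key$/ {print $2}' "$1"
}

# On the mgr host, something like:
#   ceph auth get mgr.8 -o /tmp/mon-side.keyring
#   [ "$(extract_key /tmp/mon-side.keyring)" = \
#     "$(extract_key /var/lib/ceph/mgr/ceph-8/keyring)" ] && echo "keys match"
# The two values must match exactly, or the mgr fails cephx authentication.
```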
>> ------------------------------
>>
>> *Steve Taylor* | Senior Software Engineer | *StorageCraft Technology
>> Corporation* <https://storagecraft.com>
>> 380 Data Drive Suite 300 | Draper | Utah | 84020
>> *Office:* 801.871.2799 |
>> ------------------------------
>>
>> On Fri, 2019-01-04 at 07:26 -0700, Randall Smith wrote:
>>
>> Greetings,
>>
>> I'm upgrading my cluster from luminous to mimic. I've upgraded my
>> monitors and am attempting to upgrade the mgrs. Unfortunately, after an
>> upgrade the mgr daemon exits immediately with error code 1.
>>
>> I've tried running ceph-mgr in debug mode to try to see what's happening
>> but the output (below) is a bit cryptic for me. It looks like
>> authentication might be failing but it was working prior to the upgrade.
>>
>> I do have "auth supported = cephx" in the global section of ceph.conf.
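For reference, the line above corresponds to this ceph.conf fragment (a minimal sketch; note that "auth supported" is the legacy spelling, equivalent to setting the newer auth_cluster_required / auth_service_required / auth_client_required options, whose default is already cephx):

```ini
[global]
# Legacy form; equivalent to the three auth_*_required options set to cephx.
auth supported = cephx
```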
>>
>> What do I need to do to fix this?
>>
>> Thanks.
>>
>> /usr/bin/ceph-mgr -f --cluster ceph --id 8 --setuser ceph --setgroup ceph
>> -d --debug_ms 5
>>
>> 2019-01-04 07:01:38.457 7f808f83f700  2 Event(0x30c42c0 nevent=5000
>> time_id=1).set_owner idx=0 owner=140190140331776
>>
>> 2019-01-04 07:01:38.457 7f808f03e700  2 Event(0x30c4500 nevent=5000
>> time_id=1).set_owner idx=1 owner=140190131939072
>>
>> 2019-01-04 07:01:38.457 7f808e83d700  2 Event(0x30c4e00 nevent=5000
>> time_id=1).set_owner idx=2 owner=140190123546368
>>
>> 2019-01-04 07:01:38.457 7f809dd5b380  1  Processor -- start
>>
>>
>> 2019-01-04 07:01:38.477 7f809dd5b380  1 -- - start start
>>
>>
>> 2019-01-04 07:01:38.481 7f809dd5b380  1 -- - --> 192.168.253.147:6789/0
>> -- auth(proto 0 26 bytes epoch 0) v1 -- 0x32a6780 con 0
>>
>> 2019-01-04 07:01:38.481 7f809dd5b380  1 -- - --> 192.168.253.148:6789/0
>> -- auth(proto 0 26 bytes epoch 0) v1 -- 0x32a6a00 con 0
>> 2019-01-04 07:01:38.481 7f808e83d700  1 -- 192.168.253.148:0/1359135487
>> learned_addr learned my addr 192.168.253.148:0/1359135487
>> 2019-01-04 07:01:38.481 7f808e83d700  2 -- 192.168.253.148:0/1359135487
>> >> 192.168.253.148:6789/0 conn(0x332d500 :-1
>> s=STATE_CONNECTING_WAIT_ACK_SEQ pgs=0 cs=0 l=0)._process_connection got
>> newly_acked_seq 0 vs out_seq 0
>> 2019-01-04 07:01:38.481 7f808f03e700  2 -- 192.168.253.148:0/1359135487
>> >> 192.168.253.147:6789/0 conn(0x332ce00 :-1
>> s=STATE_CONNECTING_WAIT_ACK_SEQ pgs=0 cs=0 l=0)._process_connection got
>> newly_acked_seq 0 vs out_seq 0
>> 2019-01-04 07:01:38.481 7f808f03e700  5 -- 192.168.253.148:0/1359135487
>> >> 192.168.253.147:6789/0 conn(0x332ce00 :-1
>> s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=74172 cs=1 l=1). rx mon.1
>> seq
>> 1 0x30c5440 mon_map magic: 0 v1
>> 2019-01-04 07:01:38.481 7f808e83d700  5 -- 192.168.253.148:0/1359135487
>> >> 192.168.253.148:6789/0 conn(0x332d500 :-1
>> s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=74275 cs=1 l=1). rx mon.2
>> seq
>> 1 0x30c5680 mon_map magic: 0 v1
>> 2019-01-04 07:01:38.481 7f808f03e700  5 -- 192.168.253.148:0/1359135487
>> >> 192.168.253.147:6789/0 conn(0x332ce00 :-1
>> s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=74172 cs=1 l=1). rx mon.1
>> seq
>> 2 0x32a6780 auth_reply(proto 2 0 (0) Success) v1
>> 2019-01-04 07:01:38.481 7f808e83d700  5 -- 192.168.253.148:0/1359135487
>> >> 192.168.253.148:6789/0 conn(0x332d500 :-1
>> s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=74275 cs=1 l=1). rx mon.2
>> seq
>> 2 0x32a6a00 auth_reply(proto 2 0 (0) Success) v1
>> 2019-01-04 07:01:38.481 7f808e03c700  1 -- 192.168.253.148:0/1359135487
>> <== mon.1 192.168.253.147:6789/0 1 ==== mon_map magic: 0 v1 ==== 370+0+0
>> (3034216899 0 0) 0x30c5440 con 0x332ce00
>> 2019-01-04 07:01:38.481 7f808e03c700  1 -- 192.168.253.148:0/1359135487
>> <== mon.2 192.168.253.148:6789/0 1 ==== mon_map magic: 0 v1 ==== 370+0+0
>> (3034216899 0 0) 0x30c5680 con 0x332d500
>> 2019-01-04 07:01:38.481 7f808e03c700  1 -- 192.168.253.148:0/1359135487
>> <== mon.1 192.168.253.147:6789/0 2 ==== auth_reply(proto 2 0 (0)
>> Success) v1 ==== 33+0+0 (3430158761 0 0) 0x32a6780 con 0x332ce00
>> 2019-01-04 07:01:38.481 7f808e03c700  1 -- 192.168.253.148:0/1359135487
>> --> 192.168.253.147:6789/0 -- auth(proto 2 2 bytes epoch 0) v1 --
>> 0x32a6f00 con 0
>> 2019-01-04 07:01:38.481 7f808e03c700  1 -- 192.168.253.148:0/1359135487
>> <== mon.2 192.168.253.148:6789/0 2 ==== auth_reply(proto 2 0 (0)
>> Success) v1 ==== 33+0+0 (3242503871 0 0) 0x32a6a00 con 0x332d500
>> 2019-01-04 07:01:38.481 7f808e03c700  1 -- 192.168.253.148:0/1359135487
>> --> 192.168.253.148:6789/0 -- auth(proto 2 2 bytes epoch 0) v1 --
>> 0x32a6780 con 0
>> 2019-01-04 07:01:38.481 7f808f03e700  5 -- 192.168.253.148:0/1359135487
>> >> 192.168.253.147:6789/0 conn(0x332ce00 :-1
>> s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=74172 cs=1 l=1). rx mon.1
>> seq
>> 3 0x32a6f00 auth_reply(proto 2 -22 (22) Invalid argument) v1
>> 2019-01-04 07:01:38.481 7f808e03c700  1 -- 192.168.253.148:0/1359135487
>> <== mon.1 192.168.253.147:6789/0 3 ==== auth_reply(proto 2 -22 (22)
>> Invalid argument) v1 ==== 24+0+0 (882932531 0 0) 0x32a6f00 con 0x332ce00
>> 2019-01-04 07:01:38.481 7f808e03c700  1 -- 192.168.253.148:0/1359135487
>> >> 192.168.253.147:6789/0 conn(0x332ce00 :-1 s=STATE_OPEN pgs=74172 cs=1
>> l=1).mark_down
>> 2019-01-04 07:01:38.481 7f808e03c700  2 -- 192.168.253.148:0/1359135487
>> >> 192.168.253.147:6789/0 conn(0x332ce00 :-1 s=STATE_OPEN pgs=74172 cs=1
>> l=1)._stop
>> 2019-01-04 07:01:38.481 7f808e83d700  5 -- 192.168.253.148:0/1359135487
>> >> 192.168.253.148:6789/0 conn(0x332d500 :-1
>> s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=74275 cs=1 l=1). rx mon.2
>> seq
>> 3 0x32a6780 auth_reply(proto 2 -22 (22) Invalid argument) v1
>> 2019-01-04 07:01:38.481 7f808e03c700  1 -- 192.168.253.148:0/1359135487
>> <== mon.2 192.168.253.148:6789/0 3 ==== auth_reply(proto 2 -22 (22)
>> Invalid argument) v1 ==== 24+0+0 (1359424806 0 0) 0x32a6780 con 0x332d500
>> 2019-01-04 07:01:38.481 7f808e03c700  1 -- 192.168.253.148:0/1359135487
>> >> 192.168.253.148:6789/0 conn(0x332d500 :-1 s=STATE_OPEN pgs=74275 cs=1
>> l=1).mark_down
>> 2019-01-04 07:01:38.481 7f808e03c700  2 -- 192.168.253.148:0/1359135487
>> >> 192.168.253.148:6789/0 conn(0x332d500 :-1 s=STATE_OPEN pgs=74275 cs=1
>> l=1)._stop
>>
>> 2019-01-04 07:01:38.481 7f809dd5b380  1 -- 192.168.253.148:0/1359135487
>> shutdown_connections
>> 2019-01-04 07:01:38.481 7f809dd5b380  5 -- 192.168.253.148:0/1359135487
>> shutdown_connections mark down 192.168.253.148:6789/0 0x332d500
>> 2019-01-04 07:01:38.481 7f809dd5b380  5 -- 192.168.253.148:0/1359135487
>> shutdown_connections mark down 192.168.253.147:6789/0 0x332ce00
>> 2019-01-04 07:01:38.481 7f809dd5b380  5 -- 192.168.253.148:0/1359135487
>> shutdown_connections delete 0x332ce00
>> 2019-01-04 07:01:38.481 7f809dd5b380  5 -- 192.168.253.148:0/1359135487
>> shutdown_connections delete 0x332d500
>> 2019-01-04 07:01:38.485 7f809dd5b380  1 -- 192.168.253.148:0/1359135487
>> shutdown_connections
>> 2019-01-04 07:01:38.485 7f809dd5b380  1 -- 192.168.253.148:0/1359135487
>> wait complete.
>> 2019-01-04 07:01:38.485 7f809dd5b380  1 -- 192.168.253.148:0/1359135487
>> >> 192.168.253.148:0/1359135487 conn(0x332c000 :-1 s=STATE_NONE pgs=0
>> cs=0 l=0).mark_down
>> 2019-01-04 07:01:38.485 7f809dd5b380  2 -- 192.168.253.148:0/1359135487
>> >> 192.168.253.148:0/1359135487 conn(0x332c000 :-1 s=STATE_NONE pgs=0
>> cs=0 l=0)._stop
>> failed to fetch mon config (--no-mon-config to skip)
>>
>> _______________________________________________
>>
>> ceph-users mailing list
>>
>> [email protected]
>>
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>>
>>
>
> --
> Randall Smith
> Computing Services
> Adams State University
> http://www.adams.edu/
> 719-587-7741
>


-- 
Randall Smith
Computing Services
Adams State University
http://www.adams.edu/
719-587-7741