ceph-post-file: 789769c4-e7e4-47d3-8fb7-475ea4cfe14a This should have the information you need.
On Wed, Apr 3, 2019 at 5:49 PM Sage Weil <s...@newdream.net> wrote: > This OSD also appears on teh accepting end of things, and probably > has newer keys that the OSD connecting (tho it' shard to tell > because teh debug level isn't turned up). > > The goal is to find an osd still throwing BADAUTHORIZER messages and then > turn of debug_auth=20 and debug_monc=20. Specifically I'm looking for the > output from MonClient::_check_auth_rotating(), which should tell us > whether the OSD thinks its rotating keys are up to date, or whether it is > having trouble renewing them, or what. It's weird that some OSDs are > using hours-old keys to try to authenticate. :/ > > sage > > > On Wed, 3 Apr 2019, Shawn Edwards wrote: > > > ceph-post-file: 60e07a0c-ee5b-4174-9f51-fa091d5662dc > > > > On Wed, Apr 3, 2019 at 5:30 PM Shawn Edwards <lesser.e...@gmail.com> > wrote: > > > > > According to ceph versions, all bits are running 14.2.0 > > > > > > I have restarted all of the OSD at least twice and am still getting the > > > same error. > > > > > > I'll send a log file with confirmed interesting bad behavior shortly > > > > > > On Wed, Apr 3, 2019, 17:17 Sage Weil <s...@newdream.net> wrote: > > > > > >> 2019-04-03 15:04:01.986 7ffae5778700 10 --1- v1: > 10.36.9.46:6813/5003637 > > >> >> v1:10.36.9.28:6809/8224 conn(0xf6a6000 0x30a02000 :6813 > > >> s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 > l=0).handle_connect_message_2 > > >> authorizor_protocol 2 len 174 > > >> 2019-04-03 15:04:01.986 7ffae5778700 20 AuthRegistry(0xcd64a40) > > >> get_handler peer_type 4 method 2 cluster_methods [2] service_methods > [2] > > >> client_methods [2] > > >> 2019-04-03 15:04:01.986 7ffae5778700 10 cephx: verify_authorizer > > >> decrypted service osd secret_id=41686 > > >> 2019-04-03 15:04:01.986 7ffae5778700 0 auth: could not find > > >> secret_id=41686 > > >> 2019-04-03 15:04:01.986 7ffae5778700 10 auth: dump_rotating: > > >> 2019-04-03 15:04:01.986 7ffae5778700 10 auth: id 41691 ... expires > > >> 2019-04-03 14:43:07.042860 > > >> 2019-04-03 15:04:01.986 7ffae5778700 10 auth: id 41692 ... expires > > >> 2019-04-03 15:43:09.895511 > > >> 2019-04-03 15:04:01.986 7ffae5778700 10 auth: id 41693 ... expires > > >> 2019-04-03 16:43:09.895511 > > >> 2019-04-03 15:04:01.986 7ffae5778700 0 cephx: verify_authorizer could > > >> not get service secret for service osd secret_id=41686 > > >> 2019-04-03 15:04:01.986 7ffae5778700 0 --1- v1: > 10.36.9.46:6813/5003637 > > >> >> v1:10.36.9.28:6809/8224 conn(0xf6a6000 0x30a02000 :6813 > > >> s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 > l=0).handle_connect_message_2: > > >> got bad authorizer, auth_reply_len=0 > > >> > > >> For some reason this OSD has much newer rotating keys than the > > >> connecting OSD. But earlier in the day, this osd was the one > > >> getting BADAUTHORIZER, so maybe that shifted. Can you find an OSD > where > > >> you still see BADAUTHORIZER appearing in the log? > > >> > > >> My guess is that if you restart the OSDs, they'll get fresh rotating > keys > > >> and things will be fine. But that doesn't explain why they're not > > >> renewing on their own right now.. that I'm not so sure about. > > >> > > >> Are your mons all running nautilus? Does 'ceph versions' show > everything > > >> has upgraded? > > >> > > >> sage > > >> > > >> > > >> On Wed, 3 Apr 2019, Shawn Edwards wrote: > > >> > > >> > File uploaded: f1a2bfb3-92b4-495c-8706-f99cb228efc7 > > >> > > > >> > On Wed, Apr 3, 2019 at 4:57 PM Sage Weil <s...@newdream.net> wrote: > > >> > > > >> > > Hmm, that doesn't help. > > >> > > > > >> > > Can you set > > >> > > > > >> > > ceph config set osd debug_ms 20 > > >> > > ceph config set osd debug_auth 20 > > >> > > ceph config set osd debug_monc 20 > > >> > > > > >> > > for a few minutes and ceph-post-file the osd logs? (Or send a > private > > >> > > email with a link or something.) > > >> > > > > >> > > Thanks! > > >> > > sage > > >> > > > > >> > > > > >> > > On Wed, 3 Apr 2019, Shawn Edwards wrote: > > >> > > > > >> > > > No strange auth config: > > >> > > > > > >> > > > root@tyr-ceph-mon0:~# ceph config dump | grep -E '(auth|cephx)' > > >> > > > global advanced auth_client_required cephx > > >> > > > * > > >> > > > global advanced auth_cluster_required cephx > > >> > > > * > > >> > > > global advanced auth_service_required cephx > > >> > > > * > > >> > > > > > >> > > > All boxes are using 'minimal' ceph.conf files and centralized > > >> config. > > >> > > > > > >> > > > If you need the full config, it's here: > > >> > > > > https://gist.github.com/lesserevil/3b82d37e517f4561ce53c81629717aae > > >> > > > > > >> > > > On Wed, Apr 3, 2019 at 4:07 PM Sage Weil <s...@newdream.net> > wrote: > > >> > > > > > >> > > > > On Wed, 3 Apr 2019, Shawn Edwards wrote: > > >> > > > > > Recent nautilus upgrade from mimic. No issues on mimic. > > >> > > > > > > > >> > > > > > Now getting this or similar in all osd logs, there is very > > >> little osd > > >> > > > > > communicatoin, and most of the PG are either 'down' or > > >> 'unknown', > > >> > > even > > >> > > > > > though I can see the data on the filestores. > > >> > > > > > > > >> > > > > > 2019-04-03 13:47:55.280 7f13346e3700 0 --1- [v2: > > >> > > > > > 10.36.9.26:6802/3107,v1:10.36.9.26:6803/3107] >> v1: > > >> > > 10.36.9.37:6821/8825 > > >> > > > > > conn(0xa7132000 0xa6b28000 :-1 s=CONNECTING_SEND_CONNECT_MSG > > >> pgs=0 > > >> > > cs=0 > > >> > > > > > l=0).handle_connect_reply_2 connect got BADAUTHORIZER > > >> > > > > > 2019-04-03 13:47:55.296 7f1333ee2700 0 --1- [v2: > > >> > > > > > 10.36.9.26:6802/3107,v1:10.36.9.26:6803/3107] >> v1: > > >> > > > > 10.36.9.37:6841/11204 > > >> > > > > > conn(0xa9826d00 0xa9b78000 :-1 s=CONNECTING_SEND_CONNECT_MSG > > >> pgs=0 > > >> > > cs=0 > > >> > > > > > l=0).handle_connect_reply_2 connect got BADAUTHORIZER > > >> > > > > > 2019-04-03 13:47:55.340 7f13346e3700 0 --1- [v2: > > >> > > > > > 10.36.9.26:6802/3107,v1:10.36.9.26:6803/3107] >> v1: > > >> > > 10.36.9.37:6829/8425 > > >> > > > > > conn(0xa7997180 0xaeb22800 :-1 s=CONNECTING_SEND_CONNECT_MSG > > >> pgs=0 > > >> > > cs=0 > > >> > > > > > l=0).handle_connect_reply_2 connect got BADAUTHORIZER > > >> > > > > > 2019-04-03 13:47:55.428 7f1334ee4700 0 auth: could not find > > >> > > > > secret_id=41687 > > >> > > > > > 2019-04-03 13:47:55.428 7f1334ee4700 0 cephx: > verify_authorizer > > >> > > could > > >> > > > > not > > >> > > > > > get service secret for service osd secret_id=41687 > > >> > > > > > 2019-04-03 13:47:55.428 7f1334ee4700 0 --1- [v2: > > >> > > > > > 10.36.9.26:6802/3107,v1:10.36.9.26:6803/3107] >> v1: > > >> > > > > 10.36.9.48:6805/49547 > > >> > > > > > conn(0xe02f24480 0xe088cb800 :6803 > > >> s=ACCEPTING_WAIT_CONNECT_MSG_AUTH > > >> > > > > pgs=0 > > >> > > > > > cs=0 l=0).handle_connect_message_2: got bad authorizer, > > >> > > auth_reply_len=0 > > >> > > > > > > > >> > > > > > Thoughts? I have confirmed that all ceph boxes have good > time > > >> sync. > > >> > > > > > > >> > > > > Do you have any non-default auth-related settings in > ceph.conf? > > >> > > > > > > >> > > > > sage > > >> > > > > > > >> > > > > > >> > > > > > >> > > > -- > > >> > > > Shawn Edwards > > >> > > > Beware programmers with screwdrivers. They tend to spill them > on > > >> their > > >> > > > keyboards. > > >> > > > > > >> > > > > >> > > > >> > > > >> > -- > > >> > Shawn Edwards > > >> > Beware programmers with screwdrivers. They tend to spill them on > their > > >> > keyboards. > > >> > > > >> > > > > > > > -- > > Shawn Edwards > > Beware programmers with screwdrivers. They tend to spill them on their > > keyboards. > > > -- Shawn Edwards Beware programmers with screwdrivers. They tend to spill them on their keyboards.
_______________________________________________ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com