Are the clocks dramatically out of sync? Basically any bug in signing could cause that kind of log message, but I think a simple time-sync problem, so that the two sides are using different keys, is the most common cause.

On Mon, Jul 24, 2017 at 9:36 AM <[email protected]> wrote:
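A quick way to see whether skew could be the culprit: cephx tickets rotate on a TTL (the `auth_service_ticket_ttl` default is 3600 s), so a client whose clock drifts more than a small amount against the mons can end up presenting tickets the cluster considers stale. The sketch below is a minimal, self-contained illustration of that check; the `MAX_SKEW` threshold and the two hard-coded timestamps are made-up sample values, not anything from your cluster.

```shell
#!/bin/sh
# Sketch: flag clock skew large enough to disturb cephx ticket renewal.
# Assumption: default auth_service_ticket_ttl of 3600 s; MAX_SKEW is an
# arbitrary illustrative threshold, not an official Ceph limit.

MAX_SKEW=30   # seconds of skew tolerated before warning (arbitrary choice)

# In a real check you would fetch the remote node's clock, e.g.:
#   remote_ts=$(ssh ceph-node1 date +%s)
# Here both values are hard-coded samples so the sketch runs anywhere.
local_ts=1500913000
remote_ts=1500913125   # sample value: 125 s ahead of the client

skew=$((remote_ts - local_ts))
[ "$skew" -lt 0 ] && skew=$((-skew))

if [ "$skew" -gt "$MAX_SKEW" ]; then
    echo "WARN: clock skew ${skew}s exceeds ${MAX_SKEW}s"
else
    echo "OK: clock skew ${skew}s"
fi
```

On a live cluster you can also ask the mons directly with `ceph time-sync-status`, and check each client's NTP state with `ntpq -p` or `chronyc tracking`, then compare against the affected machines in particular.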
> Hi,
>
> I'm running a Ceph cluster which I started back in the Bobtail age and
> have kept running/upgrading over the years. It has three nodes, each
> running one MON, 10 OSDs and one MDS. The cluster has one MDS active and
> two standby. Machines are 8-core Opterons with 32 GB of ECC RAM each. I'm
> using it to host our clients' (about 25) /home using CephFS and as an RBD
> backend for a couple of libvirt VMs (about 5).
>
> Currently I'm running 11.2.0 (Kraken), and a couple of months ago I
> started experiencing some strange behaviour. Exactly two of my ~25 CephFS
> clients (always the same two) keep freezing their /home about one or two
> hours after first boot in the morning. At the moment of the freeze, syslog
> starts reporting loads of:
>
> _hostname_ kernel: libceph: osdXX 172.16.0.XXX:68XX bad authorize reply
>
> On one of the clients I replaced every single piece of hardware with new
> hardware, so that machine is completely replaced now, including NIC,
> switch and network cabling, and I did a complete OS reinstall. But the
> user is still seeing that behaviour. As far as I can tell, it seems that
> key renegotiation is failing and the client keeps trying to connect with
> the old cephx key. But I cannot find a reason why this is happening or
> how to fix it.
>
> Biggest problem: the second affected machine is our CEO's, and if we
> don't fix it I will have a hard time explaining that Ceph is the way to
> go.
>
> The two affected machines do not share any common network segment other
> than the TOR switch in the Ceph rack, while there are other clients that
> do share a network segment with the affected machines but aren't affected
> at all.
>
> Google won't help me on this one either; it seems no one else is
> experiencing something similar.
>
> Client setup on all clients is Debian Jessie with the 4.9 backports
> kernel, using the kernel client for mounting CephFS. I think the whole
> thing started with a kernel upgrade from one 4.X series to another, but I
> cannot reconstruct it.
>
> Any help greatly appreciated.
>
> Best regards,
> Tobi
>
> _______________________________________________
> ceph-users mailing list
> [email protected]
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
