Okay, this is the third time we've had the same outage... bastion2 and bastion3 were accessible for a short time after bastion1's gluster died, then they died as well. Public keys weren't accessible on any of them, so labs were basically inaccessible for everyone.
<3 passwords anyway. I know you all hate working things like passwords, so here is another idea: set up a cron script that syncs a local folder on the bastion with /public/keys, so that when gluster is down or that folder isn't working, login to the bastions still works. (A rough sketch of what I mean is at the bottom, after the quoted thread.)

On Sun, Mar 3, 2013 at 9:37 PM, Petr Bena <[email protected]> wrote:
> YAY. It would be cool if some of them would mirror keys from
> gluster so that it works even when gluster is down :>
>
> On Sun, Mar 3, 2013 at 8:43 PM, Ryan Lane <[email protected]> wrote:
>> On Sun, Mar 3, 2013 at 7:51 AM, Petr Bena <[email protected]> wrote:
>>>
>>> Hi,
>>>
>>> today it's the second time that bastion has been inaccessible:
>>>
>>> If you are having access problems, please see:
>>>
>>> https://wikitech.wikimedia.org/wiki/Access#Accessing_public_and_private_instances
>>> debug1: Authentications that can continue: publickey
>>> debug1: Next authentication method: publickey
>>> debug1: Offering RSA public key: /home/petanb/.ssh/id_rsa
>>> debug2: we sent a publickey packet, wait for reply
>>>
>>> If we can't have a different way to authenticate than public keys,
>>> WHICH ARE broken often, can we at least have a second stable login
>>> server?
>>>
>>> BTW, I assume that logins didn't work because of gluster, so it
>>> wouldn't have worked anyway, but if gluster sucks so hard, can we at
>>> least have password auth until you fix it? Bad authentication is
>>> better than no working authentication.
>>>
>>
>> Though I'm usually more than happy to blame gluster, this was not caused by
>> gluster. It was because someone OOM'd the instance.
>>
>> We've actually finally stabilized gluster to a point where we shouldn't be
>> having complete outages any more:
>>
>> https://ganglia.wikimedia.org/latest/?r=month&cs=&ce=&m=cpu_report&s=by+name&c=Glusterfs+cluster+pmtpa&h=&host_regex=&max_graphs=0&tab=m&vn=&sh=1&z=small&hc=4
>>
>> Note in the above graph that for the past week and a half the memory usage has
>> been mostly flat. There was one spot where the memory ballooned, then a spot
>> where it dropped. That last memory balloon was before the changes we put in
>> place, and the drop was where I restarted the glusterd processes (which
>> doesn't affect filesystem access).
>>
>> There are some split-brain issues still around from the most recent round of
>> instability, but the SSH keys are perfectly fine. I will not enable password
>> authentication. It's incredibly insecure.
>>
>> So, to get a little more back on point, I've just created
>> bastion2.wmflabs.org and bastion3.wmflabs.org, in case the bastion instances
>> OOM again.
>>
>> - Ryan
>>
>> _______________________________________________
>> Labs-l mailing list
>> [email protected]
>> https://lists.wikimedia.org/mailman/listinfo/labs-l
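
As promised above, here is a rough sketch of the cron idea. It's only an illustration, not anything that exists today: the /etc/cron.d file name, the local mirror path, and the five-minute interval are my own assumptions, and it assumes /public/keys is the gluster-backed directory the keys live in.

  # Hypothetical /etc/cron.d/sync-public-keys (assumed name and paths).
  # Every 5 minutes, mirror the gluster-backed key directory to local disk.
  # The mountpoint check (assuming /public/keys is itself the mount point;
  # adjust if it is a subdirectory of the mount) is there so an unmounted or
  # empty gluster volume doesn't wipe the local mirror via --delete.
  */5 * * * * root mountpoint -q /public/keys && rsync -a --delete /public/keys/ /var/local/public-keys-mirror/

sshd would then also need to be told to read keys from the mirror as a fallback (e.g. a second AuthorizedKeysFile entry, assuming the installed OpenSSH supports listing more than one), but that part is just an idea too.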
