Re: [Distutils] Google Auth is broken for PyPI

Donald Stufft Sun, 15 Feb 2015 15:53:39 -0800

> On Feb 15, 2015, at 5:25 PM, Robert Collins <robe...@robertcollins.net> wrote:
> 
> I probably shouldn't, but I feel compelled to reply :).
> 
> On 11 February 2015 at 06:33, Donald Stufft <don...@stufft.io> wrote:
>> 
>>> On Feb 10, 2015, at 11:23 AM, Martin v. Löwis <mar...@v.loewis.de> wrote:
>>> 
>>> Am 10.02.15 um 15:36 schrieb Donald Stufft:
>>>> Honestly, I’d rather have less federated login not more. I wish the 
>>>> current OpenID support had never been added.
>>>> 
>>> 
>>> Can you please elaborate on that position? Why is it useful to have
>>> separate accounts on separate systems?
>> 
>> Sure.
>> 
>> So the basic premise behind federated auth is that you can get a single set
>> of credentials on all (or most) of your sites and eliminate the need to have
>> a password for each site you visit.
>> 
>> My opinion is basically influenced by a number of factors:
>> 
>> 1. I feel like the goal of federated auth has failed in general and is
>>   unlikely to ever succeed. As a user of websites I have over 400 different
>>   entries in my password manager. Even if 50% of them implement federated
>>   auth (which I feel like is a high number, but that's not backed by math,
>>   just gut feeling), that's still over 200 entries I need to maintain in my
>>   password manager. In this case federated auth has not meaningfully reduced
>>   the burden of maintaining passwords for me, since maintaining 200 isn't any
>>   easier than 400, and instead it just complicates my login flow.
> 
> So, what is success here? I'd call 200 fewer passwords to maintain and
> rotate on a regular basis a GOOD THING. I very much doubt that you
> would have 2FA set up on the other 200 things, so that would mean a
> change from 400 sites with only a couple having 2FA to 200 with regular
> rotations and 2FA, and 200 liabilities.


Success (for me) is when federated auth enables me to no longer need to worry
about passwords in my day to day use of the web. Currently it's not even close
and it doesn't appear to be getting any closer. In the places where it is even
possible, it's generally only possible to sign in with GitHub/Twitter/Facebook,
and I'm unwilling to hand them the ability to authenticate as me to a wide
number of services. The only time I'm willing to do so is for "throw away"
sites where my account doesn't really matter to me.


> 
>> 2. As a site operator I feel like authentication is a core part of the
>>   experience of using my site and by allowing federated auth on my site I'm
>>   giving up control over that user flow. A relevant example from PyPI is that
>>   a number of users signed up using MyOpenID which is no longer being
>>   maintained. This means that either PyPI has to tell those people
>>   "tough shit" or PyPI needs to figure out a mitigation tactic against that.
>>   Another example is that launchpad randomly starts failing for people, and
>>   it'll fail consistently for the same person until it just stops failing for
>>   them. I'm unable to actually reproduce this error so it's extremely hard
>>   for me to do anything else but shrug and tell them not to use it.
> 
> I'm genuinely curious here. Why do you feel that authentication is a
> core part of the experience? It's a necessary part, sure. But I find it
> hard to imagine that many people say 'that bug tracking site, it's got
> *awesome authentication*'! I see authentication as something that is
> very very hard to get right, and incredibly easy to get wrong. I don't
> trust folk that are experts in e.g. bugtracking, or code hosting, or
> todo list management to necessarily understand all the intricacies of
> password handling (e.g. *how many sites don't use PBKDF2*!). Or worse,
> they truncate the input password you give to 8 characters (yes, seriously).
> It's not that the site operators aren't trustworthy in general, it's
> that password handling is nasty:
> - it's hard to get right
> - you won't know if you got it wrong until you or your users are compromised
> - even sites with dedicated teams doing just the IdP aspect get it wrong
> 
> I consider it irresponsible for less well resourced sites to get into
> credentials management unless they truly have no choice: they're
> tackling something they're almost certain to get wrong.

Authentication is like a lot of pieces of maintaining a service, where if you
get it done *really well* people won’t notice it exists and if you do it wrong
people will notice it’s bad immediately. Sites go out of their way to take
control of the authentication flow to ensure that it gives the best possible
experience for their users. Delegating that out to someone else is giving up
control of it, which means that if the place it's been delegated to isn't able
to keep up then you're SOL because you've exposed what should be implementation
details of a particular app to the end user.

You say that you don't trust them to get authentication correct, which seems
silly to me given that you apparently trust them to handle ACLs or any number
of other parts of a secure web app correctly. However even if a particular app
doesn't want to handle their own authentication, there are better ways to
handle delegation, such as something like https://stormpath.com/ which allows
you to delegate the actual storage and handling of passwords and properly
handling authentication to a third party that specializes in that, but without
exposing the details of that to your end users, so that if you need to migrate
at some point you can.

> 
>> 3. I feel like unless you solely rely on federated auth, then federated auth
>>   is always going to be a second class citizen for any particular website.
>>   For instance Travis CI uses federated auth via Github only, but that's the
>>   only thing they support for authentication so everything works well with
>>   that. On the other hand a number of sites support federated auth on top of
>>   local accounts and federated auth is almost always worse in some ways,
>>   sometimes as simple as the username you get is kinda crappy
>>   (dstufft_<somehash>), sometimes some features don't work (or don't work
>>   very well) at all, like on PyPI where we need to authenticate people
>>   outside of a web context so if we don't have usernames/passwords then we
>>   end up needing to require the user to register a secondary "api password"
>>   or API key.
> 
> Relying solely on federated auth is fine by me :). You don't need to
> tie yourself to one provider. Yes, most users will use just one of
> fb/github/google/lp/twitter in our community, but you can (and should)
> do unification on email addresses to allow dealing with failed
> providers [but only for trustworthy providers or by doing an email
> verification step before unifying] and manage ACLs and privileged
> operations locally.
> 
> The fact that some sites do it crappily is in no way an indictment of
> the basic tech - in fact some sites do it really well. It's gotten so
> good that these days the only time I will sign into a site that
> *doesn't* use federated auth is if there is something I really really
> really want from it. E.g. I made an account with Elite:Dangerous.

Relying only on federated auth is a fairly poor user experience. You have to
essentially tell someone that before they can use your site, they have to go
pick one of these other sites to become a user of. Since I know it's relevant
to you as well as me, I hate having to log into any OpenStack service via
Launchpad because it's an extra step that I don't have to do on any service
that doesn't delegate auth: I have to first log in to Launchpad and then
I have to tell it to allow the login to an OpenStack service.

Never mind the fact that there is huge phishing potential, and that the more
central auth gets, the more likely you are to see people attempting to phish
your login information.

I do think that in some cases federated auth can make sense, especially for
small sites where even a slight inconvenience or delay in going from an
unregistered user to a logged in user can cause you to lose traffic
altogether. These sites also tend to have a very low amount of inertia tied to
a specific user account: if you lose access to your account, spinning up a new
one is low impact. Compare that to PyPI, where the security of an account is of
paramount importance and a lost account is incredibly disruptive.

> 
>> 4. I feel like none of the current solutions to federated auth are very good.
>>   OpenID relies on using a URL as your "personal identifier" which I feel
>>   like is a strange and foreign concept to most users. The way around this is
>>   often to just hardcode a list of sites, but then as a site operator you're
>>   implicitly recommending that users go sign up for one of those sites and
>>   use them on your site to login. This is creating an explicit relationship
>>   between your site and the other site, a relationship in which you often
>>   have no power (for instance, Google <-> PyPI, we're powerless to do
>>   anything about them deprecating OpenID other than just sucking it up and
>>   dealing with it). Persona did offer a way around this, but Persona had
>>   other failings like relying on the domain that you happened to be using
>>   for your email to implement a Persona IdP or otherwise falling back to an
>>   implicit relationship with the fallback provider, again one where you're
>>   more or less powerless to the operators of that service.
> 
> I agree that they're not brilliant. OpenID is basically dead, long
> live OpenID Connect :/. So the thing there AIUI is that OAuth worked
> out a lot better (more flexible, consistent with both CLI / app
> workflows and server side web interactions). And as such everyone is
> just consolidating on the one toolchain to avoid lots of needless
> redundancy. But as a user, it's fine. I don't judge a site as subordinate
> to Google if they allow Google logins, for instance.
> 
> Yes, if you use federated auth you need to keep up. But hell, we need
> to keep up if we do our own auth management. When was the last time
> the hash count on PyPI's password database was increased to account
> for hash rate growths? Managing credentials is an ongoing effort - at
> Canonical we split that out into its own team, and they were busy just
> keeping on top of it and changes in the fundamentals for years. See
> above about hard to get right.

PyPI uses bcrypt with a work factor of 12. We're using the excellent passlib
library, which means we can easily set things up to automatically migrate
between different algorithms and different work factors/rounds within the same
algorithm. The current settings on PyPI take roughly 0.3 seconds per password
hash (on my iMac in a completely unscientific benchmark with a single
iteration, since any large number of iterations takes forever), and bumping
bcrypt to 13 makes it take roughly 0.6. I do check periodically that our work
factor is appropriate, and at some point I plan on migrating the algorithm to
scrypt once a reasonable implementation of it is available for Python.
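
To make the "automatically migrate" idea concrete, here is a minimal sketch of
the upgrade-on-login pattern. It is not PyPI's code and it does not use passlib
or bcrypt: it swaps in PBKDF2 from the Python standard library so that it runs
without third-party packages. The names hash_password, verify_and_update and
CURRENT_ROUNDS, and the hash string layout, are assumptions made up for the
illustration. (For reference, the bcrypt work factor is a base-2 logarithm, so
going from 12 to 13 doubles the work, which matches the 0.3 to 0.6 second
timings above.)

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

# Hypothetical policy value: the iteration count that new hashes should use.
CURRENT_ROUNDS = 200_000


def hash_password(password: str, rounds: int = CURRENT_ROUNDS) -> str:
    """Return a self-describing hash string: scheme$rounds$salt$digest."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, rounds)
    return f"pbkdf2-sha256${rounds}${salt.hex()}${digest.hex()}"


def verify_and_update(password: str, stored: str) -> Tuple[bool, Optional[str]]:
    """Check a password against a stored hash.

    Returns (ok, new_hash). new_hash is a replacement hash when the password
    is correct but the stored hash uses fewer rounds than the current policy,
    otherwise None.
    """
    scheme, rounds_text, salt_hex, digest_hex = stored.split("$")
    if scheme != "pbkdf2-sha256":
        raise ValueError(f"unsupported scheme: {scheme}")
    rounds = int(rounds_text)
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), bytes.fromhex(salt_hex), rounds
    )
    # Constant-time comparison to avoid leaking how many bytes matched.
    if not hmac.compare_digest(candidate, bytes.fromhex(digest_hex)):
        return False, None
    if rounds < CURRENT_ROUNDS:
        # The plaintext is only available at login, so re-hash it now.
        return True, hash_password(password)
    return True, None
```

The caller stores new_hash in place of the old value whenever it is not None,
so raising CURRENT_ROUNDS upgrades each account the next time its owner logs
in. passlib's CryptContext.verify_and_update returns the same kind of
(verified, replacement hash) pair, which is what makes this sort of migration
cheap to set up there.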

Honestly, speaking as someone who has implemented authentication libraries,
OpenID servers, and OpenID clients, I feel that storing a password safely
isn't really that hard, especially with good libraries like passlib
to help with it, and in the grand scheme of things it's only a tiny part of
what makes a secure application. Even within authentication itself there are
a number of things that federated auth simply won't help with. Things like
ensuring that you rotate your session identifiers when you cross authentication
boundaries are things that even federated authentication doesn't handle for you
and are far more likely to get forgotten when creating a site than what
password storage and algorithms someone uses.
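
As an illustration of what rotating session identifiers across an
authentication boundary means, here is a minimal sketch using only the Python
standard library. SessionStore and its methods are hypothetical names for a toy
in-memory store, not part of PyPI or of any framework.

```python
import secrets
from typing import Dict, Optional


class SessionStore:
    """Toy in-memory session store (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        self._sessions: Dict[str, dict] = {}

    def create(self) -> str:
        """Start an anonymous session and return its identifier."""
        session_id = secrets.token_urlsafe(32)
        self._sessions[session_id] = {"user": None}
        return session_id

    def get(self, session_id: str) -> Optional[dict]:
        """Return the session data, or None if the identifier is unknown."""
        return self._sessions.get(session_id)

    def login(self, old_session_id: str, user: str) -> str:
        """Mark the session as authenticated under a fresh identifier."""
        # Remove the pre-login identifier so it stops working entirely.
        data = self._sessions.pop(old_session_id, {})
        data = dict(data, user=user)
        new_session_id = secrets.token_urlsafe(32)
        self._sessions[new_session_id] = data
        return new_session_id
```

Because the pre-login identifier is discarded, an attacker who planted or
observed it before the user authenticated cannot ride along afterwards. The
same rotation applies on logout or on any other change in privilege, and it is
needed regardless of whether the login itself was local or federated.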

PyPI in particular is a web service that needs to be designed to last decades.
It's not the kind of web service where you can just tell people that hey,
guess what, tomorrow we all need to switch away to some other authentication
service because MyOpenID (or whatever) decided to shut down. It's less about
needing to "keep up" and more about who has to do the keeping up. For ensuring
we're still safely storing passwords, it's simple enough for the PyPI admins to
ensure that we're still "safe enough". However, when using authentication where
the fact it's being delegated has been "leaked" to the end user, it's up to
each and every individual user on PyPI to ensure that they are using an auth
provider that is keeping up, and the end users are the ones least likely to
do that on any kind of meaningful scale.

> 
>> Overall I think that the use of federated auth, as a site operator, is really
>> only worth it over the loss of control in two scenarios:
>> 
>> A. When your site is already entwined with another site and relying on them
>>   for authentication is simply increasing that. An example of this from
>>   above is Travis CI where they only work with things hosted on GitHub so
>>   also relying on GitHub for authentication isn't that big of a deal and
>>   actually makes things better since they can then integrate with GitHub's
>>   permissions to check if you have commit on a particular repository.
>> 
>> B. When creating an account is likely to be enough of a burden to make people
>>   decide not to interact with your site. This category is basically
>>   completely comprised of sites that do not have long standing relationships
>>   with their users. The only real example I can think of off the top of my
>>   head is sites with comments enabled like blogs, news sites, etc. The
>>   commenters are unlikely to have or want a long standing relationship with
>>   your site, they just want to make a quick one off comment and then possibly
>>   never come back. For sites like PyPI, on the other hand, the cost of
>>   creating an account is small compared to the lifetime of the majority of
>>   our user base's interaction with us.
> 
> I think you're underestimating the impact this has on users. It
> definitely creates a high barrier to entry for me, and I don't think
> I'm alone. For bugs.python.org I leapt on Federated auth, but for PyPI
> I can't use it because it doesn't allow consolidating the accounts
> (AFAICT). Is it a matter of toits? E.g. do you need someone to provide
> patches to both permit the new OpenID Connect, OAuth for console use,
> and connecting OpenID Connect identities to local usercodes?

Honestly, I feel like 90% of the problem people have with authentication on
the python.org web properties can be solved by implementing SSO. The problem
here is that you have one logical collection of sites that all have different
authentication silos. I don't think that allowing someone to log in with
GitHub, or Google, or whatever the flavor of the week is will have a much
higher impact than SSO.

> 
>> A key thing to me, as a site operator, is keeping as much control over the
>> experience of my users as I can. Obviously I have to outsource some things
>> because it's not reasonable for me to make my own hardware, write my own
>> drivers, my own kernel, my own OS, my own webserver etc. A good example of a
>> major outsourcing that I was involved in was moving things behind Fastly.
>> However a key difference between that outsourcing and this outsourcing is
>> that if things go sour with Fastly or we need to migrate away from them for
>> one reason or another we can do that without end users needing to change
>> much or anything. However if something like Google dropping OpenID support
>> happens then the users who relied on that are out of luck and our ability to
>> shield them from the fallout of that is limited.
> 
> That's true, OTOH I think I've made a reasonable case above that our
> ability to shield users from our own mistakes is limited, and dealing
> with passwords really isn't as simple as all that... and updating to
> OpenID Connect should be pretty straightforward, there are good
> libraries for it all around.

We'll likely update the Google authentication to use OpenID Connect, simply
because we already made the mistake of implementing it and enough people are
relying on it that we can't just drop it if there's a reasonable way for
us to continue to support it. If you want to submit a pull request to do that,
it would be most appreciated.

Going forward, however, I'm likely going to be -1 on adding any additional
forms of federated/delegated authentication, except possibly to a global
Python Auth service. That includes adding generic support for OpenID Connect or
Persona or whatever other protocol is the flavor of the month. I'm also likely
going to end up trying to de-emphasize the ability to use anything but a local
account and try to guide users towards using local accounts where possible.
Finally, if I ever do come up with a reasonable way to migrate away from the
federated authentication support we have today in a low impact way, you'll
likely see a proposal on distutils-sig for that. However, I'm not very hopeful
that there's any option other than waiting for the services to die out and
then removing support for them.

> 
>> At this point we already have it enabled, so unless someone comes up with a
>> really good migration strategy I doubt we'll be able to get rid of it.
>> However for the reasons above I'm pretty much against adding *additional*
>> federated auth things and I think that we should treat it more of a legacy
>> thing and downplay the fact we have support for it. Bitbucket has downplayed
>> support for random OpenID as well, when you go to their login pages it shows
>> a login form that looks like http://d.stufft.io/image/1O2l2g073h0h, which
>> still lets you login with OpenID but it's muted and downplayed.
>> 
>> In a slightly hypocritical view point, I actually think that at some point we
>> should get something like id.python.org which is an IdP and switch all of the
>> *.python.org sites to authenticate against that instead of keeping local
>> user accounts. This would reduce the number of passwords that Python inflicts
>> on people but it still keeps authentication within our
>> (PSF/Python/whatever)'s control. This is more along the lines of implementing
>> SSO using a federated auth technology than actual federated auth though.
> 
> Counterpoint: why not get rid of local auth altogether (for web
> service, not system administration). What do we, a non-profit, do that
> requires direct control over auth? At least - bugs.python.org, pypi,
> both of which support OpenID today, we've clearly considered that
> there it's ok.
> 
> If we didn't have local auth at all that would free up cycles to do
> whatever (moderate) chasing of evolving federation standards is
> needed.

I don't trust other services to handle authentication for something as
important as PyPI, and it's unlikely that a service comes around that
I both trust enough to give them the ability to essentially
authenticate as a wide number of users whenever they want, and where I'm
willing to tie the long term operability of PyPI as a service to them in a way
where we can't easily pull them out of our stack and replace them with minimal
impact on end users.

---
Donald Stufft
PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA

_______________________________________________
Distutils-SIG maillist  -  Distutils-SIG@python.org
https://mail.python.org/mailman/listinfo/distutils-sig
