Awesome diagram. Worth at least 1k words, I think. :)

I've got two observations:

1. In "checks capabilities found in JWT", that should include "validate JWT with auth server", which can be as simple as checking a timestamp.
2. I may have missed it, but how is the route from the Gateway to TO secured?

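To make the "checking a timestamp" idea concrete, here is a rough Python sketch. It is only an illustration under assumptions: the JWT signature is taken to be verified already, and `Claims`, `token_still_valid`, `user_last_updated` and `global_not_before` are hypothetical names, not anything in the Traffic Control code. The two timestamps would come from whatever the auth server exposes. It combines the per-user "last update time" comparison and the global issue-date threshold that were suggested further down the thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claims:
    """Decoded JWT payload; signature verification is assumed to be done already."""
    sub: str  # user the token was issued to
    iat: int  # issued-at time, seconds since the epoch
    exp: int  # expiry time, seconds since the epoch


def token_still_valid(claims: Claims, user_last_updated: int,
                      global_not_before: int, now: int) -> bool:
    """Return True only if the token passes all three timestamp checks.

    user_last_updated and global_not_before are hypothetical values that
    would be fetched from the auth server on each request.
    """
    if now >= claims.exp:
        return False  # token has expired
    if claims.iat < global_not_before:
        return False  # issued before the "invalidate everything" threshold
    if claims.iat < user_last_updated:
        return False  # user's roles or password changed after the token was issued
    return True


if __name__ == "__main__":
    token = Claims(sub="alice", iat=1_000, exp=2_000)
    # User record untouched since the token was issued.
    print(token_still_valid(token, user_last_updated=900, global_not_before=0, now=1_500))
    # User record changed after the token was issued.
    print(token_still_valid(token, user_last_updated=1_200, global_not_before=0, now=1_500))
```

If the timestamp lookup itself fails, I'd expect the caller to treat the token as invalid rather than wave it through, in line with the "fails closed" argument quoted below.
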
On Fri, May 12, 2017 at 8:41 AM David Neuman <[email protected]> wrote:

> +1 on keeping it on the mailing list
>
> On Fri, May 12, 2017 at 7:52 AM, Eric Friedrich (efriedri) <[email protected]> wrote:
>
> > Can we please keep the discussion on the dev@ list? Apache rules and all.
> >
> > “If it didn’t happen on the email list, it didn’t happen”
> >
> >
> > On 5/12/17, 9:50 AM, "Amir Yeshurun" <[email protected]> wrote:
> >
> > Moving discussion to the wiki page. Commented on Jeremy's notes
> >
> > On Fri, May 12, 2017 at 7:41 AM Shmulik Asafi <[email protected]> wrote:
> >
> > > Regarding sharing signing key with other services raised by Eric - notice JWT supports asymmetric keys. I.e. only the auth server has the private key and other services have a public key.
> > >
> > > Also, there's another solution for "global invalidation" besides switching keys, and that is setting a threshold for token issue date. Whenever there's an attack or whatever and we want to invalidate all tokens we just need to set the threshold for 'now' thus forcing all users to issue a new token.
> > > I think this is more reasonable and scalable than switching keys.
> > >
> > > On 12 May 2017 05:15, "Jeremy Mitchell" <[email protected]> wrote:
> > >
> > > > Here was the image I was trying to attach:
> > > >
> > > > https://cwiki.apache.org/confluence/display/TC/API+Gateway
> > > >
> > > > Jeremy
> > > >
> > > > On Thu, May 11, 2017 at 2:14 PM, Amir Yeshurun <[email protected]> wrote:
> > > >
> > > > > Hi Jeremy,
> > > > > Note that attachments seem to be stripped off on this list and the image is unavailable.
> > > > >
> > > > > Your assumptions are correct. We need to figure out the easiest topology for UI routes to bypass the GW. Please reattach the picture so we can get more specific.
> > > > >
> > > > > Thanks
> > > > > /amiry
> > > > >
> > > > > On Thu, May 11, 2017, 20:06 Jeremy Mitchell <[email protected]> wrote:
> > > > >
> > > > > > What is of utmost importance to me is the ability to ease into this. We have a TO UI right now that needs to be unaffected by the API gateway in my opinion. Granted the old UI might go away at some point but until that time it needs to function as-is.
> > > > > >
> > > > > > To me, the simplest approach is to key off request URL. anything that starts with /api gets api gateway treatment, the rest passes on thru...Here's a fancy picture to communicate what I envision...
> > > > > >
> > > > > > [image: Inline image 1]
> > > > > >
> > > > > > I'm assuming all requests (endpoints) go thru the api gateway but maybe i'm wrong in that assumption. Anyhow, i guess my point is the UI should continue to work with the mojo cookie and "api" calls should use the jwt token...however, the UI also uses api endpoints so not sure how that would work...
> > > > > >
> > > > > > If it's too difficult for the api gateway to support UI and API routes, we could always wait until the new UI (which leverages the API) is complete...
> > > > > >
> > > > > > Jeremy
> > > > > >
> > > > > > On Thu, May 11, 2017 at 10:23 AM, Chris Lemmons <[email protected]> wrote:
> > > > > >
> > > > > >> > invalidate ALL tokens by changing the token signing key
> > > > > >>
> > > > > >> Interesting idea. That does mean that the signing key has to be retrieved every time from the authentication authority, or it'd be subject to the exact same set of attacks.
> > > > > >> But a nearly-constant, rarely changing key could be communicated very efficiently, I suspect. And if the authentication system is a web API, it can even use If-Modified-Since to 304 99% of the time for maximum efficiency.
> > > > > >>
> > > > > >> It does have the downside that key-invalidation events are fairly significant. You'd need to invalidate the keys whenever someone's access was reduced or removed. As the number of accounts in the system increases, that might not wind up being as infrequent as one might hope. It's easy to implement, though.
> > > > > >>
> > > > > >> On Thu, May 11, 2017 at 10:12 AM Jeremy Mitchell <[email protected]> wrote:
> > > > > >>
> > > > > >> > Regarding the TTL on the JWT token. a 5 minute TTL seems silly. What's the point? Unless we get into refresh tokens but that sounds like oauth...blah.
> > > > > >> >
> > > > > >> > What about this and maybe i'm oversimplifying. the TTL on the jwt token is 24 hours. If we become aware that a token has been compromised, invalidate ALL tokens by changing the token signing key. maybe this is a good idea or maybe this is a terrible idea. I have no idea. just a thought..
> > > > > >> >
> > > > > >> > jeremy
> > > > > >> >
> > > > > >> > On Wed, May 10, 2017 at 12:23 PM, Chris Lemmons <[email protected]> wrote:
> > > > > >> >
> > > > > >> > > Responding to a few people:
> > > > > >> > >
> > > > > >> > > > Often times every auth action must be accompanied by DB writes for audit logs or callback functions.
> > > > > >> > >
> > > > > >> > > True. But a) if logging is too expensive it should probably be made cheaper and b) the answer to "audits are too expensive" probably isn't "let's just do less authentication". If the audit log is genuinely the bottleneck, it would still be better to re-auth without the audit log.
> > > > > >> > >
> > > > > >> > > > The API gateway can poll for the latest list of tokens at a regular interval
> > > > > >> > >
> > > > > >> > > Yeah, datastore replication for local performance is great. Though if you can reasonably query for a list of all valid tokens every second, it's probably cheaper to just query for the token you need every time you need it. If there are massive batches of queries that are coming through, it's probably not unreasonable to choose not to re-validate a token that's been validated in the last second.
> > > > > >> > >
> > > > > >> > > > Regarding maliciously delayed message or such - I don't fully understand the point; if an attacker has such capabilities she can simply prevent/delay devop users from updating the auth database itself thus enabling the attack.
> > > > > >> > >
> > > > > >> > > In a typical attack, an attacker might gain control of a box on the local network, but not necessarily the Gateway, Traffic Ops, or Auth Server. Those are probably better hardened.
> > > > > >> > > But lots of networks have a squishy test box that everyone forgot was there or something. The bad guy wants to use the CDN to DOS someone, or redirect traffic to somewhere malicious, or just cause mayhem. The longer he can keep control, the better for him.
> > > > > >> > >
> > > > > >> > > So this attacker uses the local box to sniff the token off the network. If the communication with the Gateway is encrypted, he might have to do some ARP poisoning or something else to trick a host into talking to the local box instead. (Properly implemented TLS also mitigates this angle.) He knows that as soon as he starts his nefarious deed, alarms are going to go off, so he also uses this local box to DOS the Auth Server. It's a lot easier to take a box down from the outside than to actually gain control.
> > > > > >> > >
> > > > > >> > > If the Gateway "fails open" when it can't contact the Auth server, the attacker remains in control. If it "fails closed", the attacker has to actually compromise the auth server (which is harder) to remain in control.
> > > > > >> > >
> > > > > >> > > > Do we block all API calls if the auth service is temporarily down (being upgraded, container restarting, etc…)?
> > > > > >> > >
> > > > > >> > > Yes, I think we have to. Authentication is integral to reliable operation.
> > > > > >> > >
> > > > > >> > > We've been talking in some fairly wild hypotheticals, though. Is there a specific auth service you're envisioning?
> > > > > >> > >
> > > > > >> > > On Wed, May 10, 2017 at 12:50 AM Shmulik Asafi <[email protected]> wrote:
> > > > > >> > >
> > > > > >> > > > Regarding the communication issue Chris raised - there is more than one possible pattern to this, e.g.:
> > > > > >> > > >
> > > > > >> > > >    - Blacklisted tokens can be communicated via a pub-sub mechanism
> > > > > >> > > >    - The API gateway can poll for the latest list of tokens at a regular interval (which can be very short ~1sec, much shorter than the time it takes devops to detect and react to malign tokens)
> > > > > >> > > >
> > > > > >> > > > Regarding hitting the blacklist datastore - this only sounds similar to hitting the auth database; but the simplicity of a blacklist function allows you to employ more efficient datastores, e.g. Redis or just a hashmap in the API gateway process memory.
> > > > > >> > > >
> > > > > >> > > > Regarding maliciously delayed message or such - I don't fully understand the point; if an attacker has such capabilities she can simply prevent/delay devop users from updating the auth database itself thus enabling the attack.
> > > > > >> > > >
> > > > > >> > > >
> > > > > >> > > > On Wed, May 10, 2017 at 4:25 AM, Eric Friedrich (efriedri) <[email protected]> wrote:
> > > > > >> > > >
> > > > > >> > > > > Our current management wrapper around Traffic Control (called OMD Director, demo’d at last TC summit) uses a very similar approach to authentication.
> > > > > >> > > > >
> > > > > >> > > > > We have an auth service that issues a JWT. The JWT is then provided along with all API calls. A few comments on our practical experience:
> > > > > >> > > > >
> > > > > >> > > > > - I am a supporter of validating tokens both in the API gateway and in the service. We have several examples of services- Grafana for example, that require external authentication. Similarly, we have other services that need finer grained authentication than API Gateway policy can handle. Specifically, a given user may have permissions to view/modify some delivery services but not others. The API gateway presumably would not understand the semantics of payload so this decision would need to be made by auth within the service.
> > > > > >> > > > >
> > > > > >> > > > > - As brought up earlier, auth in the gateway is both a strength and a risk.
> > > > > >> > > > > Additional layer of security is also positive, but for my case of Grafana above it can present an opportunity to bypass authentication. This is a risk, but it can be mitigated by adding auth to the service where needed.
> > > > > >> > > > >
> > > > > >> > > > > - Verifying tokens on every access may potentially be a little more expensive than discussed. Often times every auth action must be accompanied by DB writes for audit logs or callback functions. Not the straw to break the camel’s back, but something to keep in mind.
> > > > > >> > > > >
> > > > > >> > > > > - There is also the problem of what to do if the underlying auth service is temporarily unavailable. Do we block all API calls if the auth service is temporarily down (being upgraded, container restarting, etc…)?
> > > > > >> > > > >
> > > > > >> > > > > - I’d like to see what we can do to use a pre-existing package as an API Gateway. As we decompose TO into microservices, something like nginx can provide additional benefits like TLS termination and load balancing between service endpoints. I’d hate to see us have to reimplement these functions later.
> > > > > >> > > > >
> > > > > >> > > > > - I’d also like to see us give some consideration to how an API gateway is deployed.
> > > > > >> > > > > We raised the bar for new users by unbundling Traffic Ops from the database and it could further complicate the installation if we don’t provide enough guidance on how to deploy the API gateway in a lab trial, if not best practices for production deployment. Should we recommend deploying as a new RPM/systemd service, an immutable container, or as part of the existing TO RPM?
> > > > > >> > > > >
> > > > > >> > > > > -Eric
> > > > > >> > > > >
> > > > > >> > > > >
> > > > > >> > > > > > On May 9, 2017, at 5:05 PM, Chris Lemmons <[email protected]> wrote:
> > > > > >> > > > > >
> > > > > >> > > > > > Blacklisting requires proactive communication between the authentication system and the gateway. Furthermore, the client can't be sure that something hasn't been blacklisted recently (and the message lost or perhaps maliciously delayed) unless it checks whatever system it is that does the blacklisting. And if you're checking a datastore of some sort for the validity of the token every time, you might as well just check each time and skip the blacklisting step.
> > > > > >> > > > > >
> > > > > >> > > > > > On Tue, May 9, 2017 at 1:27 PM Shmulik Asafi <[email protected]> wrote:
> > > > > >> > > > > >
> > > > > >> > > > > >> Hi,
> > > > > >> > > > > >> Maybe a missing link here is another component in a jwt stateless architecture which is *blacklisting* malign tokens when necessary.
> > > > > >> > > > > >> This is obviously a sort of state which needs to be handled in a datastore; but it's quite different and easy to scale and has less performance impact (I guess especially under DDOS) than doing full auth queries.
> > > > > >> > > > > >> I believe this should be the approach on the API Gateway roadmap
> > > > > >> > > > > >> Thanks
> > > > > >> > > > > >>
> > > > > >> > > > > >> On 9 May 2017 21:14, "Chris Lemmons" <[email protected]> wrote:
> > > > > >> > > > > >>
> > > > > >> > > > > >>> I'll second the principle behind "start with security, optimize when there's a problem".
> > > > > >> > > > > >>>
> > > > > >> > > > > >>> It seems to me that in order to maintain security, basically everyone would need to dial the revalidate time so close to zero that it does very little good as a cache on the credentials. Otherwise, as Rob has pointed out, the TTL on your credential cache is effectively "how long am I ok with hackers in control after I find them".
> > > > > >> > > > > >>> Practically, it also means that much lag on adding or removing permissions. That effectively means a database hit for every query, or near enough to every query as not to matter.
> > > > > >> > > > > >>>
> > > > > >> > > > > >>> That said, you can get the best of multiple worlds, I think. The only DB query that really has to be done is "give me the last update time for this user". Compare that to the generation time in the token and 99% of the time, it's the only query you need. With that check, you can even use fairly long-lived tokens. If anything about the user has changed, reject the token, generate a new one, send that to the user and use it. The regenerate step is somewhat expensive, but still well inside reasonable, I think.
> > > > > >> > > > > >>>
> > > > > >> > > > > >>> On Tue, May 9, 2017 at 11:31 AM Robert Butts <[email protected]> wrote:
> > > > > >> > > > > >>>
> > > > > >> > > > > >>>>> The TO service (and any other service that requires auth) MUST hit the database (or the auth service, which itself hits the database) to verify valid tokens' users still have the permissions they did when the token was created. Otherwise, it's impossible to revoke tokens, e.g. if an employee quits, or an attacker gains a token, or a user changes their password.
> > > > > >> > > > > >>>>
> > > > > >> > > > > >>>> I'm elaborating on this, and moving a discussion from a PR review here.
> > > > > >> > > > > >>>>
> > > > > >> > > > > >>>> From the code submissions to the repo, it appears the current plan is for the API Gateway to create a JWT, and then for that JWT to be accepted by all Traffic Ops microservices, with no database authentication.
> > > > > >> > > > > >>>>
> > > > > >> > > > > >>>> It's a common misconception that JWT allows you to authenticate without hitting the database. This is an exceedingly dangerous misconception.
> > > > > >> > > > > >>>> If you don't check the database when every authenticated route is requested, it's impossible to revoke access. In practice, this means the JWT TTL becomes the length of time _after you discover an attacker is manipulating your production system_, before it's _possible_ to evict them.
> > > > > >> > > > > >>>>
> > > > > >> > > > > >>>> How long do you feel is acceptable to have a hacker in and manipulating your system, after you discover them? A day? An hour? Five minutes? Whatever your TTL, that's the length of time you're willing to allow a hacker to steal and destroy your and your customers' data. Worse, because this is a CDN, it's the length of time you're willing to allow your CDN to be used to DDOS a target.
> > > > > >> > > > > >>>>
> > > > > >> > > > > >>>> Are you going to explain in court that the DDOS your system executed lasted 24 hours, or 1 hour, or 10 minutes after you discovered it, because that's the TTL you hard-coded?
> > > > > >> > > > > >>>> Are you going to explain to a judge and prosecuting attorney exactly which sensitive data was stolen in the ten minutes after you discovered the attacker in your system, before their JWT expired?
> > > > > >> > > > > >>>>
> > > > > >> > > > > >>>> If you're willing to accept the legal consequences, that's your business. Apache Traffic Control should not require users to accept those consequences, and ideally shouldn't make it possible, as many users won't understand the security risks.
> > > > > >> > > > > >>>>
> > > > > >> > > > > >>>> The argument has been made "authorization does not check the database to avoid congestion" -- Has anyone tested this in practice? The database query itself is 50ms. Assuming your database and service are 2500km apart, that's another 50ms network latency. Traffic Ops has endpoints that take 10s to generate. Worst-case scenario, this will double the time of tiny endpoints to 200ms, and increase large endpoints inconsequentially. It's highly unlikely performance is an issue in practice.
> > > > > >> > > > > >>>>
> > > > > >> > > > > >>>> As Jan said, we can still have the services check the auth as well after the proxy auth. Moreover, the services don't even have to know about the auth service, they can hit a mapped route on the API Gateway, which gives us better modularisation and separation of concerns.
> > > > > >> > > > > >>>>
> > > > > >> > > > > >>>> It's not difficult, it can be a trivial endpoint on the auth service, remapped in the API Gateway, which takes the JWT token and returns true if it's still authorized in the database. To be clear, this is not a problem today. Traffic Ops still uses the Mojolicious cookie today, so this would only need to be done if and when we remove that, or if we move authorized endpoints out of Traffic Ops into their own microservices.
> > > > > >> > > > > >>>>
> > > > > >> > > > > >>>> Considering the significant security and legal risks, we should always hit the database to validate requests of authorized endpoints, and reconsider if and when someone observes performance issues in practice.
> > > > > >> > > > > >>>>
> > > > > >> > > > > >>>>
> > > > > >> > > > > >>>> On Tue, May 9, 2017 at 6:56 AM, Dewayne Richardson <[email protected]> wrote:
> > > > > >> > > > > >>>>
> > > > > >> > > > > >>>>> If only the API GW authenticates/authorizes we also have a single point of entry to test for security instead of having it sprinkled across services in different ways. It also simplifies the code on the service side and makes them easier to test with automation.
> > > > > >> > > > > >>>>>
> > > > > >> > > > > >>>>> -Dew
> > > > > >> > > > > >>>>>
> > > > > >> > > > > >>>>> On Mon, May 8, 2017 at 8:42 AM, Robert Butts <[email protected]> wrote:
> > > > > >> > > > > >>>>>
> > > > > >> > > > > >>>>>>> couldn't make nginx or http do what we need.
> > > > > >> > > > > >>>>>>
> > > > > >> > > > > >>>>>> I was suggesting a different architecture. Not making the proxy do auth, only standard proxying.
> > > > > >> > > > > >>>>>>
> > > > > >> > > > > >>>>>>> We can still have the services check the auth as well after the proxy auth
> > > > > >> > > > > >>>>>>
> > > > > >> > > > > >>>>>> +1
> > > > > >> > > > > >>>>>>
> > > > > >> > > > > >>>>>>
> > > > > >> > > > > >>>>>> On Mon, May 8, 2017 at 3:36 AM, Amir Yeshurun <[email protected]> wrote:
> > > > > >> > > > > >>>>>>
> > > > > >> > > > > >>>>>>> Hi,
> > > > > >> > > > > >>>>>>>
> > > > > >> > > > > >>>>>>> Let me elaborate some more on the purpose of the API GW. I will put up a wiki page following our discussions here.
> > > > > >> > > > > >>>>>>>
> > > > > >> > > > > >>>>>>> Main purpose is to allow innovation by creating new services that handle TO functionality, not as a part of the monolithic Mojo app.
> > > > > >> > > > > >>>>>>> The long term vision is to de-compose TO into multiple microservices, allowing new functionality to be easily added.
> > > > > >> > > > > >>>>>>> Indeed, the goal is to eventually deprecate the current AAA model, and replace it with the new AAA model currently under work (user-roles, role-capabilities)
> > > > > >> > > > > >>>>>>>
> > > > > >> > > > > >>>>>>> I think that handling authorization in the API layer is a valid approach.
> > > > > >> > > > > >>>>>>> Security wise, I don't see much difference between that, and having each module access the auth service, as long as the auth service is deployed in the backend.
> > > > > >> > > > > >>>>>>> Having another proxy (nginx?) fronting the world and forwarding all requests to the backend GW mitigates the risk for compromising the authorization service.
> > > > > >> > > > > >>>>>>> However, as mentioned above, we can still have the services check the auth as well after the proxy auth.
> > > > > >> > > > > >>>>>>>
> > > > > >> > > > > >>>>>>> It is a standalone process, completely optional at this point. One can choose to deploy it in order to allow integration with additional services. Deployment and management are still T.B.D, and feedback on this is most welcome.
> > > > > >> > > > > >>>>>>>
> > > > > >> > > > > >>>>>>> Regarding token validation and revocation:
> > > > > >> > > > > >>>>>>> Tokens have expiration time. Expired tokens do not pass token validation.
> > > > > >> > > > > >>>>>>> In production, expiration should be set to relatively short time, say 5 minutes.
>> This way revocation is automatic. Re-authentication is handled via
>> refresh tokens (not implemented yet). Hitting the DB upon every API
>> call causes congestion on the users DB.
>> To avoid that, we chose to have all user information self-contained
>> inside the JWT.
>>
>> Thanks
>> /amiry
>>
>> On Mon, May 8, 2017 at 5:42 AM Jan van Doorn <[email protected]> wrote:
>>
>>> It's the reverse proxy we've discussed for the "micro services"
>>> version for a while now (as in
>>> https://cwiki.apache.org/confluence/display/TC/Design+Overview+v3.0 ).
>>>
>>> On Sun, May 7, 2017 at 7:22 PM Eric Friedrich (efriedri)
>>> <[email protected]> wrote:
>>>
>>>> From a higher level: what is the purpose of the API Gateway? It
>>>> seems like there may have been some previous discussions about API
>>>> Gateway.
>>>> Are there any notes or description that I can catch up on?
>>>>
>>>> How will it be deployed? (Is it a standalone service or something
>>>> that runs inside the experimental Traffic Ops)?
>>>>
>>>> Is this new component required or optional?
>>>>
>>>> --Eric
>>>>
>>>>> On May 7, 2017, at 8:28 PM, Jan van Doorn <[email protected]> wrote:
>>>>>
>>>>> I looked into this a year or so ago, and I couldn't make nginx or
>>>>> http do what we need.
>>>>>
>>>>> We can still have the services check the auth as well after the
>>>>> proxy auth, and make things better than today, where we have the
>>>>> same problem that if the TO mojo app is compromised, everything is
>>>>> compromised.
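The double check described above (the proxy authorizes first, then each service verifies the token again) could look roughly like the sketch below. It is only an illustration: it assumes an HS256 JWT with a shared secret (the thread also mentions asymmetric keys, which this does not cover), and the claim name `capabilities` and the function names `sign_token`, `verify_token`, and `service_allows` are made up for the example, not taken from Traffic Ops.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    # Restore the padding that JWT encoding strips.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_token(claims: dict, key: bytes) -> str:
    """Build an HS256-signed JWT (hypothetical stand-in for the auth server)."""
    header = b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url_encode(json.dumps(claims).encode())
    sig = hmac.new(key, f"{header}.{payload}".encode("ascii"), hashlib.sha256).digest()
    return f"{header}.{payload}.{b64url_encode(sig)}"


def verify_token(token: str, key: bytes, now: Optional[float] = None) -> Optional[dict]:
    """Return the claims if the signature and expiry check out, else None.

    This sketch always computes HMAC-SHA256, so it only accepts HS256 tokens.
    """
    try:
        header, payload, sig = token.split(".")
        expected = hmac.new(key, f"{header}.{payload}".encode("ascii"), hashlib.sha256).digest()
        if not hmac.compare_digest(expected, b64url_decode(sig)):
            return None
        claims = json.loads(b64url_decode(payload))
    except (ValueError, TypeError):
        return None
    if not isinstance(claims, dict):
        return None
    current = time.time() if now is None else now
    if current >= claims.get("exp", 0):
        return None  # expired, or no expiry claim at all
    return claims


def service_allows(token: str, key: bytes, capability: str, now: Optional[float] = None) -> bool:
    """Second check, run inside the service even if the proxy already checked."""
    claims = verify_token(token, key, now)
    return claims is not None and capability in claims.get("capabilities", [])
```

With this shape, a service rejects a forged or expired token on its own, without relying on the proxy having done so.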
>>>>>
>>>>> If we always route to TO, we don't untangle the mess of being
>>>>> dependent on the monolithic TO for everything. Many services
>>>>> today, and more in the future really just need a check to see if
>>>>> the user is authorized, and nothing more.
>>>>>
>>>>> On Sun, May 7, 2017 at 11:55 AM Robert Butts <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> What are the advantages of these config files, over an existing
>>>>>> reverse proxy, like Nginx or httpd? It's just as much work as
>>>>>> configuring and deploying an existing product, but more code we
>>>>>> have to write and maintain. I'm having trouble seeing the
>>>>>> advantage.
>>>>>>
>>>>>> -1 on auth rules as a part of the proxy.
>>>>>> Making a proxy care about auth violates the Single Responsibility
>>>>>> Principle, and further, is a security risk. It creates
>>>>>> unnecessary attack surface. If your proxy app or server is
>>>>>> compromised, the entire framework is now compromised. An attacker
>>>>>> could simply rewrite the proxy config to make all routes no-auth.
>>>>>>
>>>>>> The simple alternative is for the proxy to always route to TO,
>>>>>> and TO checks the token against the auth service (which may also
>>>>>> be proxied), and redirects unauthorized requests to a login
>>>>>> endpoint (which may also be proxied).
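As a rough illustration of the alternative described above, the sketch below has the service ask an auth service about every request and redirect to a login endpoint when the answer is no. Everything here is hypothetical: `AuthService` is an in-memory stand-in for a real auth service and its database, and `handle_request`, the `/login` URL, and the prefix-based permission model are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple


@dataclass
class AuthService:
    """Hypothetical stand-in for the auth service and its database."""

    # Maps a token to the set of route prefixes it may currently access.
    permissions: Dict[str, Set[str]] = field(default_factory=dict)

    def is_authorized(self, token: str, path: str) -> bool:
        allowed = self.permissions.get(token, set())
        return any(path.startswith(prefix) for prefix in allowed)

    def revoke(self, token: str) -> None:
        self.permissions.pop(token, None)


def handle_request(
    auth: AuthService, token: Optional[str], path: str, login_url: str = "/login"
) -> Tuple[int, str]:
    """Return (status, body or redirect location) for one request."""
    if token is not None and auth.is_authorized(token, path):
        return 200, f"served {path}"
    # Unauthorized: send the client to the login endpoint.
    return 302, login_url
```

Because the lookup happens on every request, removing a token from the store takes effect on the next call, at the cost of one auth-service round trip per request.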
>>>>>>
>>>>>> The TO service (and any other service that requires auth) MUST
>>>>>> hit the database (or the auth service, which itself hits the
>>>>>> database) to verify valid tokens' users still have the
>>>>>> permissions they did when the token was created. Otherwise, it's
>>>>>> impossible to revoke tokens, e.g. if an employee quits, or an
>>>>>> attacker gains a token, or a user changes their password.
>>>>>>
>>>>>> On Sun, May 7, 2017 at 4:35 AM, Amir Yeshurun <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Seems that attachments are stripped on this list.
>>>>>>> Examples pasted below
>>>>>>>
>>>>>>> *rules.json*
>>>>>>> [
>>>>>>>     { "host": "localhost", "path": "/login", "forward": "localhost:9004", "scheme": "https", "auth": false },
>>>>>>>     { "host": "localhost", "path": "/api/1.2/innovation/", "forward": "localhost:8004", "scheme": "http", "auth": true,
