I got a bit confused here, so I hopped on IRC and tried to reason this out with Christopher.
The highlights are thus: * The fail fast relies on canPerformSystemActions->authenticate->throw Exception when it is not a valid user. * If we want to switch the order attributes are checked in the canAskAboutUser method we can probably do !(( selfusercheck && authenticate) || canPerformSystemActions), but I don't expect huge performance gains on this. ** I just tried this without the new authenticate call and this gained me somewhere around 3-5%. Probably not worth it since there are cases where the authentication might be important. * Removing the authenticateUser call is fine from a correctness perspective because the user is still authenticated and checked via canWrite in either update() or applyUpdates(). However, this might open us up to a malicious user sending carefully crafted thrift RPCs to create a lot of bogus sessions. I suspect that the performance gains in my case were based on skipping an extra ZK round trip, but don't have any evidence for it. This was an interesting diversion, but I don't think that there is much to follow up on. Specifically, I haven't tried any of this with Kerberos, which I think will be a much more relevant case. Mike On Fri, Aug 19, 2016 at 3:44 PM, Christopher <[email protected]> wrote: > Sorry, I meant, that it wouldn't be executed in the write path if you > switch the order. The two credentials should always be the same in that > case. > > On Fri, Aug 19, 2016 at 4:37 PM Christopher <[email protected]> wrote: > > > Correct. That code is not executed in the write path. It should only be > > executed for the APIs where a user (typically an admin) is explicitly > > checking another user's permissions/authorizations/etc. > > > > On Fri, Aug 19, 2016 at 3:59 PM Mike Drob <[email protected]> wrote: > > > >> I tried out what Marc suggested, and after I removed the > authenticateUser > >> line from my tservers and my write throughput DOUBLED. This does not > seem > >> like a minor detail here. > >> > >> In the tserver logs, I still see a bunch of "2016-08-19 12:28:11,621 > >> [Audit ] INFO : operation: permitted; user: mdrob; client: > [host:port]; > >> action: authenticate;" which I think corresponds to the authenticate > >> checks > >> from canWrite, so it's not like any user is allowed to do just write > >> random > >> data. > >> A well-behaved client would fail while setting up the Connector object. > A > >> malicious client that's trying to go directly via RPC would fail at the > >> canWrite check. > >> > >> I missed the part about canPerformSystemActions throwing an exception > via > >> authenticate when it fails, so we'd never get to the self-equality > check. > >> That's a good point, but I don't think that code is ever executed during > >> the write path. > >> > >> > >> Mike > >> > >> > >> > >> On Fri, Aug 19, 2016 at 2:42 PM, Christopher <[email protected]> > wrote: > >> > >> > If you note the comment in the parent class, the implementation of > >> > canAskAboutUser is relying on the authentication being done in the > call > >> to > >> > canPerformSystemActions. If you reverse the order here, you lose that > >> > authentication check, and the action will be allowed simply if the > user > >> is > >> > equal to itself. That's no good. We need an authentication check, and > we > >> > can't rely on the client code to check it for us. > >> > > >> > We could reverse the order, but with an explicit authentication check > >> when > >> > the user is asking about itself, but then there's likely going to be a > >> > duplicate authentication check if that fails, because > >> > canPerformSystemActions will also need to do an authentication check > >> > (because it can also be called directly). > >> > > >> > We can probably refactor this a bit, so we have an unauthenticated > >> > "canPerformSystemActions" check which is shared by the authenticated > >> > version and the reverse-ordered canAskAboutUser implementation, but we > >> need > >> > to be careful about reversing the order here and forgetting to do an > >> > authentication check, or unnecessarily doubling-up on the > authentication > >> > check (though, maybe the double-check isn't that bad in the less > common > >> > case... depends on impl performance). > >> > > >> > On Fri, Aug 19, 2016 at 2:33 PM Marc P. <[email protected]> > wrote: > >> > > >> > > Sorry....by out of band, It could be done once per writer or even in > >> the > >> > > connector once...but I think we're in agreement Drob, Mike. > >> > > > >> > > On Fri, Aug 19, 2016 at 2:30 PM, Marc P. <[email protected]> > >> wrote: > >> > > > >> > > > Funny, exactly the same thing I mentioned to Josh last night. Are > >> you > >> > > > watching us? > >> > > > > >> > > > In something I"m doing now I'm making the thrift calls and > modeled > >> it > >> > > > after what the client code does; however, > >> > > > once I removed the authenticateUser my throughput increased by 25 > >> per > >> > > > cent. In my trace table it was by and large the greatest audited > >> call. > >> > > > > >> > > > I think we could change how that works and potentially keep that > >> out of > >> > > > band. The fail fast is great if the user doesn't have permissions > to > >> > that > >> > > > table and may save the server some work...but as it is now, I'm > >> always > >> > > > authenticated and I'm causing more work for the server. Perhaps we > >> let > >> > > the > >> > > > natural exception that's thrown stop the client ( if it's being a > >> good > >> > > > client ). I think the 25 per cent improvement was well worth the > >> > > > alternative risk. > >> > > > > >> > > > On Fri, Aug 19, 2016 at 2:16 PM, Mike Drob <[email protected]> > >> wrote: > >> > > > > >> > > >> Devs, > >> > > >> > >> > > >> I was recently running a 1.7.2 cluster under a heavy write > workload > >> > and > >> > > >> noticed a _lot_ of audit messages showing up in the logs. After > >> some > >> > > >> digging, I found that one of the causes is the following call > >> chain: > >> > > >> > >> > > >> TabletServerBatchWriter::sendMutationsToTabletServer > >> > > >> calls TabletServer.ThriftClientHandler::startUpdate > >> > > >> calls SecurityOperation::authenticateUser <-- always return true > >> > > >> calls SecurityOperation::canAskAboutUser <-- always return true > >> > > >> calls AuditedSecurityOperation::canPerformSystemActions <-- > fails; > >> > logs > >> > > >> "action: performSystemAction" > >> > > >> calls AuditedSecurityAction::authenticate <-- succeeds; logs > >> "action: > >> > > >> authenticate;" > >> > > >> > >> > > >> There are two separate but related problems here, I think, that I > >> want > >> > > to > >> > > >> confirm. > >> > > >> > >> > > >> When startUpdate calls security.authenticateUser(credentials, > >> > > >> credentials); > >> > > >> there is the comment that this is done to "// Make sure user is > >> real" > >> > > >> presumably to fail fast. This makes sense, but does it actually > >> work? > >> > In > >> > > >> authenticateUser and canAskAboutUser we end up comparing the two > >> > > >> credentials to each other and this will always succeed because > >> they're > >> > > the > >> > > >> same object. I expect the write fails later in the TabletServer > >> code, > >> > > >> possibly in the UpdateSession, but I haven't been able to > untangle > >> > this > >> > > >> part yet. > >> > > >> > >> > > >> Second, I think it makes sense to switch the order of the OR in > >> > > >> 'canAskAboutUser.' If this is in the write path, then it seems > like > >> > the > >> > > >> most common case would be users asking about themselves, so we > >> should > >> > > >> check > >> > > >> that before checking if a user is an admin. This also has the > >> > advantage > >> > > of > >> > > >> removing two uninformative audit log lines. We won't have an > audit > >> > > message > >> > > >> that somebody wrote data, but we don't have that currently > anyway. > >> > > >> > >> > > >> I did some visual spot checking and this looks like an issue in > >> master > >> > > >> also, but I haven't run that code to find out. > >> > > >> > >> > > >> Thoughts? > >> > > >> > >> > > >> Mike > >> > > >> > >> > > > > >> > > > > >> > > > >> > > >> > > >
