Correct. That code is not executed in the write path. It should only be executed for the APIs where a user (typically an admin) is explicitly checking another user's permissions/authorizations/etc.
On Fri, Aug 19, 2016 at 3:59 PM Mike Drob <[email protected]> wrote: > I tried out what Marc suggested, and after I removed the authenticateUser > line from my tservers and my write throughput DOUBLED. This does not seem > like a minor detail here. > > In the tserver logs, I still see a bunch of "2016-08-19 12:28:11,621 > [Audit ] INFO : operation: permitted; user: mdrob; client: [host:port]; > action: authenticate;" which I think corresponds to the authenticate checks > from canWrite, so it's not like any user is allowed to do just write random > data. > A well-behaved client would fail while setting up the Connector object. A > malicious client that's trying to go directly via RPC would fail at the > canWrite check. > > I missed the part about canPerformSystemActions throwing an exception via > authenticate when it fails, so we'd never get to the self-equality check. > That's a good point, but I don't think that code is ever executed during > the write path. > > > Mike > > > > On Fri, Aug 19, 2016 at 2:42 PM, Christopher <[email protected]> wrote: > > > If you note the comment in the parent class, the implementation of > > canAskAboutUser is relying on the authentication being done in the call > to > > canPerformSystemActions. If you reverse the order here, you lose that > > authentication check, and the action will be allowed simply if the user > is > > equal to itself. That's no good. We need an authentication check, and we > > can't rely on the client code to check it for us. > > > > We could reverse the order, but with an explicit authentication check > when > > the user is asking about itself, but then there's likely going to be a > > duplicate authentication check if that fails, because > > canPerformSystemActions will also need to do an authentication check > > (because it can also be called directly). > > > > We can probably refactor this a bit, so we have an unauthenticated > > "canPerformSystemActions" check which is shared by the authenticated > > version and the reverse-ordered canAskAboutUser implementation, but we > need > > to be careful about reversing the order here and forgetting to do an > > authentication check, or unnecessarily doubling-up on the authentication > > check (though, maybe the double-check isn't that bad in the less common > > case... depends on impl performance). > > > > On Fri, Aug 19, 2016 at 2:33 PM Marc P. <[email protected]> wrote: > > > > > Sorry....by out of band, It could be done once per writer or even in > the > > > connector once...but I think we're in agreement Drob, Mike. > > > > > > On Fri, Aug 19, 2016 at 2:30 PM, Marc P. <[email protected]> > wrote: > > > > > > > Funny, exactly the same thing I mentioned to Josh last night. Are you > > > > watching us? > > > > > > > > In something I"m doing now I'm making the thrift calls and modeled > it > > > > after what the client code does; however, > > > > once I removed the authenticateUser my throughput increased by 25 per > > > > cent. In my trace table it was by and large the greatest audited > call. > > > > > > > > I think we could change how that works and potentially keep that out > of > > > > band. The fail fast is great if the user doesn't have permissions to > > that > > > > table and may save the server some work...but as it is now, I'm > always > > > > authenticated and I'm causing more work for the server. Perhaps we > let > > > the > > > > natural exception that's thrown stop the client ( if it's being a > good > > > > client ). I think the 25 per cent improvement was well worth the > > > > alternative risk. > > > > > > > > On Fri, Aug 19, 2016 at 2:16 PM, Mike Drob <[email protected]> wrote: > > > > > > > >> Devs, > > > >> > > > >> I was recently running a 1.7.2 cluster under a heavy write workload > > and > > > >> noticed a _lot_ of audit messages showing up in the logs. After some > > > >> digging, I found that one of the causes is the following call chain: > > > >> > > > >> TabletServerBatchWriter::sendMutationsToTabletServer > > > >> calls TabletServer.ThriftClientHandler::startUpdate > > > >> calls SecurityOperation::authenticateUser <-- always return true > > > >> calls SecurityOperation::canAskAboutUser <-- always return true > > > >> calls AuditedSecurityOperation::canPerformSystemActions <-- fails; > > logs > > > >> "action: performSystemAction" > > > >> calls AuditedSecurityAction::authenticate <-- succeeds; logs > "action: > > > >> authenticate;" > > > >> > > > >> There are two separate but related problems here, I think, that I > want > > > to > > > >> confirm. > > > >> > > > >> When startUpdate calls security.authenticateUser(credentials, > > > >> credentials); > > > >> there is the comment that this is done to "// Make sure user is > real" > > > >> presumably to fail fast. This makes sense, but does it actually > work? > > In > > > >> authenticateUser and canAskAboutUser we end up comparing the two > > > >> credentials to each other and this will always succeed because > they're > > > the > > > >> same object. I expect the write fails later in the TabletServer > code, > > > >> possibly in the UpdateSession, but I haven't been able to untangle > > this > > > >> part yet. > > > >> > > > >> Second, I think it makes sense to switch the order of the OR in > > > >> 'canAskAboutUser.' If this is in the write path, then it seems like > > the > > > >> most common case would be users asking about themselves, so we > should > > > >> check > > > >> that before checking if a user is an admin. This also has the > > advantage > > > of > > > >> removing two uninformative audit log lines. We won't have an audit > > > message > > > >> that somebody wrote data, but we don't have that currently anyway. > > > >> > > > >> I did some visual spot checking and this looks like an issue in > master > > > >> also, but I haven't run that code to find out. > > > >> > > > >> Thoughts? > > > >> > > > >> Mike > > > >> > > > > > > > > > > > > > >
