Re: BatchWriter Auth/Audit Question

Mike Drob Fri, 19 Aug 2016 15:10:31 -0700

I got a bit confused here, so I hopped on IRC and tried to reason this out
with Christopher.


The highlights are thus:
* The fail fast relies on canPerformSystemActions->authenticate->throw
Exception when it is not a valid user.
* If we want to switch the order attributes are checked in the
canAskAboutUser method we can probably do !(( selfusercheck &&
authenticate) || canPerformSystemActions), but I don't expect huge
performance gains on this.
** I just tried this without the new authenticate call and this gained me
somewhere around 3-5%. Probably not worth it since there are cases where
the authentication might be important.
* Removing the authenticateUser call is fine from a correctness perspective
because the user is still authenticated and checked via canWrite in either
update() or applyUpdates(). However, this might open us up to a malicious
user sending carefully crafted thrift RPCs to create a lot of bogus
sessions.

I suspect that the performance gains in my case were based on skipping an
extra ZK round trip, but don't have any evidence for it. This was an
interesting diversion, but I don't think that there is much to follow up on.

Specifically, I haven't tried any of this with Kerberos, which I think will
be a much more relevant case.

Mike

On Fri, Aug 19, 2016 at 3:44 PM, Christopher <[email protected]> wrote:

> Sorry, I meant, that it wouldn't be executed in the write path if you
> switch the order. The two credentials should always be the same in that
> case.
>
> On Fri, Aug 19, 2016 at 4:37 PM Christopher <[email protected]> wrote:
>
> > Correct. That code is not executed in the write path. It should only be
> > executed for the APIs where a user (typically an admin) is explicitly
> > checking another user's permissions/authorizations/etc.
> >
> > On Fri, Aug 19, 2016 at 3:59 PM Mike Drob <[email protected]> wrote:
> >
> >> I tried out what Marc suggested, and after I removed the
> authenticateUser
> >> line from my tservers and my write throughput DOUBLED. This does not
> seem
> >> like a minor detail here.
> >>
> >> In the tserver logs, I still see a bunch of "2016-08-19 12:28:11,621
> >> [Audit   ] INFO : operation: permitted; user: mdrob; client:
> [host:port];
> >> action: authenticate;" which I think corresponds to the authenticate
> >> checks
> >> from canWrite, so it's not like any user is allowed to do just write
> >> random
> >> data.
> >> A well-behaved client would fail while setting up the Connector object.
> A
> >> malicious client that's trying to go directly via RPC would fail at the
> >> canWrite check.
> >>
> >> I missed the part about canPerformSystemActions throwing an exception
> via
> >> authenticate when it fails, so we'd never get to the self-equality
> check.
> >> That's a good point, but I don't think that code is ever executed during
> >> the write path.
> >>
> >>
> >> Mike
> >>
> >>
> >>
> >> On Fri, Aug 19, 2016 at 2:42 PM, Christopher <[email protected]>
> wrote:
> >>
> >> > If you note the comment in the parent class, the implementation of
> >> > canAskAboutUser is relying on the authentication being done in the
> call
> >> to
> >> > canPerformSystemActions. If you reverse the order here, you lose that
> >> > authentication check, and the action will be allowed simply if the
> user
> >> is
> >> > equal to itself. That's no good. We need an authentication check, and
> we
> >> > can't rely on the client code to check it for us.
> >> >
> >> > We could reverse the order, but with an explicit authentication check
> >> when
> >> > the user is asking about itself, but then there's likely going to be a
> >> > duplicate authentication check if that fails, because
> >> > canPerformSystemActions will also need to do an authentication check
> >> > (because it can also be called directly).
> >> >
> >> > We can probably refactor this a bit, so we have an unauthenticated
> >> > "canPerformSystemActions" check which is shared by the authenticated
> >> > version and the reverse-ordered canAskAboutUser implementation, but we
> >> need
> >> > to be careful about reversing the order here and forgetting to do an
> >> > authentication check, or unnecessarily doubling-up on the
> authentication
> >> > check (though, maybe the double-check isn't that bad in the less
> common
> >> > case... depends on impl performance).
> >> >
> >> > On Fri, Aug 19, 2016 at 2:33 PM Marc P. <[email protected]>
> wrote:
> >> >
> >> > > Sorry....by out of band, It could be done once per writer or even in
> >> the
> >> > > connector once...but I think we're in agreement Drob, Mike.
> >> > >
> >> > > On Fri, Aug 19, 2016 at 2:30 PM, Marc P. <[email protected]>
> >> wrote:
> >> > >
> >> > > > Funny, exactly the same thing I mentioned to Josh last night. Are
> >> you
> >> > > > watching us?
> >> > > >
> >> > > >  In something I"m doing now I'm making the thrift calls and
> modeled
> >> it
> >> > > > after what the client code does; however,
> >> > > > once I removed the authenticateUser my throughput increased by 25
> >> per
> >> > > > cent. In my trace table it was by and large the greatest audited
> >> call.
> >> > > >
> >> > > > I think we could change how that works and potentially keep that
> >> out of
> >> > > > band. The fail fast is great if the user doesn't have permissions
> to
> >> > that
> >> > > > table and may save the server some work...but as it is now, I'm
> >> always
> >> > > > authenticated and I'm causing more work for the server. Perhaps we
> >> let
> >> > > the
> >> > > > natural exception that's thrown stop the client ( if it's being a
> >> good
> >> > > > client ). I think the 25 per cent improvement was well worth the
> >> > > > alternative risk.
> >> > > >
> >> > > > On Fri, Aug 19, 2016 at 2:16 PM, Mike Drob <[email protected]>
> >> wrote:
> >> > > >
> >> > > >> Devs,
> >> > > >>
> >> > > >> I was recently running a 1.7.2 cluster under a heavy write
> workload
> >> > and
> >> > > >> noticed a _lot_ of audit messages showing up in the logs. After
> >> some
> >> > > >> digging, I found that one of the causes is the following call
> >> chain:
> >> > > >>
> >> > > >> TabletServerBatchWriter::sendMutationsToTabletServer
> >> > > >> calls TabletServer.ThriftClientHandler::startUpdate
> >> > > >> calls SecurityOperation::authenticateUser <-- always return true
> >> > > >> calls SecurityOperation::canAskAboutUser <-- always return true
> >> > > >> calls AuditedSecurityOperation::canPerformSystemActions <--
> fails;
> >> > logs
> >> > > >> "action: performSystemAction"
> >> > > >> calls AuditedSecurityAction::authenticate <-- succeeds; logs
> >> "action:
> >> > > >> authenticate;"
> >> > > >>
> >> > > >> There are two separate but related problems here, I think, that I
> >> want
> >> > > to
> >> > > >> confirm.
> >> > > >>
> >> > > >> When startUpdate calls security.authenticateUser(credentials,
> >> > > >> credentials);
> >> > > >> there is the comment that this is done to "// Make sure user is
> >> real"
> >> > > >> presumably to fail fast. This makes sense, but does it actually
> >> work?
> >> > In
> >> > > >> authenticateUser and canAskAboutUser we end up comparing the two
> >> > > >> credentials to each other and this will always succeed because
> >> they're
> >> > > the
> >> > > >> same object. I expect the write fails later in the TabletServer
> >> code,
> >> > > >> possibly in the UpdateSession, but I haven't been able to
> untangle
> >> > this
> >> > > >> part yet.
> >> > > >>
> >> > > >> Second, I think it makes sense to switch the order of the OR in
> >> > > >> 'canAskAboutUser.' If this is in the write path, then it seems
> like
> >> > the
> >> > > >> most common case would be users asking about themselves, so we
> >> should
> >> > > >> check
> >> > > >> that before checking if a user is an admin. This also has the
> >> > advantage
> >> > > of
> >> > > >> removing two uninformative audit log lines. We won't have an
> audit
> >> > > message
> >> > > >> that somebody wrote data, but we don't have that currently
> anyway.
> >> > > >>
> >> > > >> I did some visual spot checking and this looks like an issue in
> >> master
> >> > > >> also, but I haven't run that code to find out.
> >> > > >>
> >> > > >> Thoughts?
> >> > > >>
> >> > > >> Mike
> >> > > >>
> >> > > >
> >> > > >
> >> > >
> >> >
> >>
> >
>

Re: BatchWriter Auth/Audit Question

Reply via email to