[ 
https://issues.apache.org/jira/browse/STORM-2898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16327376#comment-16327376
 ] 

Robert Joseph Evans commented on STORM-2898:
--------------------------------------------

[~danny0405],

I have decided to call these worker tokens instead of delegation tokens.  This 
is simply because they are intended only for a specific topology instance to 
use.  Delegation tokens are more generic and can be requested/used by anyone so 
they require APIs to be able to support fetching them and renewing them.  If we 
restrict it to just workers then we don't need to worry about it as much.

So now to answer your questions.

1.  The active nimbus will generate worker tokens when a topology is submitted 
and rotate them periodically, probably once a day, but I will make it 
configurable.  But it will only do this if the configured transport plugin 
{{storm.thrift.transport}} supports worker tokens.  The worker tokens are only 
for talking back to the nimbus, to supervisors, and to DRPC servers.  All of 
this can currently be done as credentials plugins using 
[INimbusCredentialPlugin|https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/security/INimbusCredentialPlugin.java]
 and 
[ICredentialsRenewer|https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/security/auth/ICredentialsRenewer.java].
  It would be good to be able to reuse the ZK connection for storing the 
secrets so we might either tweak the API a little, or make it a separate thing.
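To make the idea concrete, here is a minimal sketch of what nimbus-side generation and rotation could look like.  The class and method names are my own illustration, not actual Storm APIs, and the 24-hour interval is just the default I mentioned above:

```java
import java.security.SecureRandom;
import java.util.Base64;

// Illustrative sketch only: names are hypothetical, not Storm's real API.
public class WorkerTokenSketch {
    private static final SecureRandom RAND = new SecureRandom();

    // One secret per topology, generated at submit time.
    public static String newSecret() {
        byte[] raw = new byte[32];                    // 256-bit secret
        RAND.nextBytes(raw);
        return Base64.getEncoder().encodeToString(raw);
    }

    // Rotation check, driven by a configurable interval (e.g. 24h).
    public static boolean shouldRotate(long createdMs, long nowMs, long rotationMs) {
        return nowMs - createdMs >= rotationMs;
    }
}
```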

2.  Rotated tokens would be propagated to the workers using the credentials 
feature already in place.  Currently that uses zookeeper as a mechanism to get 
them to the workers and the workers get their ZK credentials from their config. 
 But if this works out well we might make it more secure in the future and 
switch to the supervisor getting a worker cred from a privileged ZK location 
and writing it out to a file that only the worker user can read.  It would then 
use that token to fetch all of its credentials from nimbus directly.  But that 
is future work...
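As a rough illustration of reusing the credentials feature, the rotated token would just ride along in the credentials map that already reaches workers through zookeeper.  The key name and helper below are made up for the example:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: push a rotated worker token through the existing
// credentials map that Storm already distributes to workers.
public class CredsPushSketch {
    // Made-up key; the real key name would be decided in the implementation.
    static final String TOKEN_KEY = "workerTokenNimbus";

    public static Map<String, String> withToken(Map<String, String> creds, String token) {
        Map<String, String> updated = new HashMap<>(creds);
        updated.put(TOKEN_KEY, token);   // workers read this back from their creds
        return updated;
    }
}
```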

3. The worker when creating a connection to the supervisor would go through the 
configured transport plugin.  Transports that support worker tokens would look 
at the current Subject and at the jaas.conf to decide how to authenticate with 
the server.  If it can find a worker token for the server it wants to talk to 
it would bypass the jaas.conf and just use the DIGEST-MD5 mechanism.  If it 
cannot find a token it would fall back to the jaas.conf and unless the worker 
has done some fancy configuration it will likely fail.
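The client-side decision above could be sketched like this.  WorkerToken and the lookup are hypothetical stand-ins, not Storm classes; the point is only the "token present, bypass jaas.conf" branch:

```java
import javax.security.auth.Subject;
import java.util.Optional;

// Sketch of the client-side decision; WorkerToken is a hypothetical stand-in.
public class ClientAuthSketch {
    public static class WorkerToken {
        public final String service;
        public WorkerToken(String service) { this.service = service; }
    }

    // Look in the current Subject for a token matching the target service.
    static Optional<WorkerToken> findToken(Subject subject, String service) {
        return subject.getPrivateCredentials(WorkerToken.class).stream()
                .filter(t -> t.service.equals(service))
                .findFirst();
    }

    // Token found: bypass jaas.conf and use DIGEST-MD5.
    // No token: fall back to whatever jaas.conf configures.
    public static String chooseMechanism(Subject subject, String service) {
        return findToken(subject, service).isPresent() ? "DIGEST-MD5" : "jaas.conf";
    }
}
```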

On the server (supervisor) side it would register two different SASL servers 
with thrift.  As part of the SASL negotiation built into thrift one of them 
will be selected based off of the client's preferences.  In the kerberos case 
it would support both GSSAPI and DIGEST-MD5.  If the DIGEST-MD5 SASL server is 
selected the callback would verify that the username is the properly encoded 
WorkerTokenInfo (a.k.a. username/topology id/secret version number) and that 
the password is the proper signature for that info.  It would look these up in 
zookeeper and store them in a cache so we don't hit ZK too frequently.
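The callback's verification step could look roughly like the sketch below.  Using HMAC-SHA256 as the signature is my assumption for the example (the actual algorithm is a design choice), and all names here are illustrative:

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the server-side callback logic; names are hypothetical.
public class ServerVerifySketch {
    // topologyId -> secret, cached so we don't hit zookeeper on every connect.
    static final Map<String, byte[]> SECRET_CACHE = new ConcurrentHashMap<>();

    // Sign the encoded WorkerTokenInfo (username/topology id/secret version).
    public static byte[] sign(byte[] secret, String tokenInfo) throws Exception {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(secret, "HmacSHA256"));
        return mac.doFinal(tokenInfo.getBytes(StandardCharsets.UTF_8));
    }

    // The DIGEST-MD5 "password" must be the proper signature for the info.
    public static boolean verify(String topologyId, String tokenInfo, byte[] password)
            throws Exception {
        byte[] secret = SECRET_CACHE.get(topologyId); // real code would fall back to ZK
        return secret != null
                && MessageDigest.isEqual(sign(secret, tokenInfo), password);
    }
}
```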

4.  I agree that reusing Hadoop looks like a great choice on paper.  It has 
everything we want, but it comes with a lot of baggage.  The big 
issue for me is that UserGroupInformation relies on Configuration.  
Configuration is a crazy complex piece of software that has both static 
configuration parts and dynamic configuration parts that all play off of each 
other in odd ways.  Because of how UserGroupInformation is written, it 
requires you to set security settings, such as whether security is enabled and 
what type of security to use, in hadoop-site.xml or core-site.xml on the 
classpath.  There is no way to do this dynamically at runtime.  This means that 
we would either have to change Hadoop so that this is optional, which I don't 
think is that simple, or we change storm so it starts using the Hadoop 
Configuration which I don't want to do to our users, or we would have to have 
both set properly to make storm work, even if you are not using Hadoop.  All of 
that made me feel that simply using the concepts and not the actual code would 
make this simpler.

Also some of the limitations are different between us and Hadoop.  In the case 
of block tokens, which use some of the delegation token code, they can hand 
out hundreds of these a second, possibly thousands. We are going to hand out 
these tokens on the order of the number of topologies launched, which is a lot 
smaller.  As such I thought it would be better to have a per-topology secret 
instead.  That way an attacker cannot attempt to guess the secret from lots of 
tokens if a flaw is later found in the hashing algorithm we have selected.  The 
design and, when I write it, the code make it so we could reuse the same secret 
if that frequency changes in the future.

If anyone else can think of an out of the box solution that has a compatible 
license that would work I would love to hear it.

I have thought about a few while doing the initial design, but they all have 
their own drawbacks and I rejected them for various reasons.  If someone else 
can find a way to make it work without those limitations I would love to hear it.

Kerberos:
 1) Fetching Kerberos service tickets using a supplied TGT would be 
interesting, but the user would have to regularly upload a TGT, which is 
exactly the opposite of one of our design goals. 
 2) Having keytabs for all possible users on nimbus is just bad and our 
security team would shoot that down in a heartbeat.
 3) Using the nimbus credentials to fetch the service tickets would work, but I 
cannot distinguish one worker or user from another.

[Athenz|http://www.athenz.io/]:
This is Yahoo/Oath home-grown tech that was fairly recently open sourced.  It 
would fit something like this, but it is far from simple to set up, even more 
complex than Kerberos.  We have never asked users to support it before, and as such it 
would be a real pain to ask them to do so now just for this.  Also the roles 
are not typically that dynamic, so creating a new role on a per-topology basis 
might tax the system in ways it was not designed to support.

Perhaps I am not enough of a security expert to know what else is out there.

> Storm should support auth through delegation tokens for workers
> ---------------------------------------------------------------
>
>                 Key: STORM-2898
>                 URL: https://issues.apache.org/jira/browse/STORM-2898
>             Project: Apache Storm
>          Issue Type: New Feature
>          Components: storm-client, storm-server
>    Affects Versions: 2.0.0
>            Reporter: Robert Joseph Evans
>            Assignee: Robert Joseph Evans
>            Priority: Major
>
> There are a lot of cases where it would be great for a worker to be able to 
> communicate directly to nimbus, supervisors, or drpc servers in a secure way 
> out of the box.
> This is currently a pain to make work.  The user has to ship a TGT with their 
> topology, and continually keep it up to date with credentials-push.  They 
> also need a kind of hacked-up jaas.conf to grab the TGT from AutoTGT and put 
> it in the place that the client wants it.
> We should just generate a signed data structure (a.k.a. a delegation token in 
> Hadoop) that we can hand off to the topologies to use when talking to nimbus, 
> a supervisor, or drpc servers.
> We may want to split up the different services from each other to make an 
> attack against one not hit all of them, but that is something we can think 
> about with the design of this.
> I will try to come up with a design shortly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
