[
https://issues.apache.org/jira/browse/STORM-2898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16327376#comment-16327376
]
Robert Joseph Evans commented on STORM-2898:
--------------------------------------------
[~danny0405],
I have decided to call these worker tokens instead of delegation tokens. This
is simply because they are intended only for a specific topology instance to
use. Delegation tokens are more generic and can be requested/used by anyone so
they require APIs to be able to support fetching them and renewing them. If we
restrict it to just workers then we don't need to worry about it as much.
So now to answer your questions.
1. The active nimbus will generate worker tokens when a topology is submitted
and rotate them periodically, probably once a day, though I will make it
configurable. But it will only do this if the configured transport plugin
{{storm.thrift.transport}} supports worker tokens. The worker tokens are only
for talking back to the nimbus, to supervisors, and to DRPC servers. All of
this can currently be done as credentials plugins using
[INimbusCredentialPlugin|https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/security/INimbusCredentialPlugin.java]
and
[ICredentialsRenewer|https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/security/auth/ICredentialsRenewer.java].
It would be good to be able to reuse the ZK connection for storing the
secrets so we might either tweak the API a little, or make it a separate thing.
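As a rough sketch of what the signing side could look like, here is a minimal HMAC-based example. The class name, method signature, and token-info string format are illustrative assumptions on my part, not the actual Storm API; the point is just that nimbus signs the serialized token info with a per-topology secret:

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch only; these are not the actual Storm classes or APIs.
public class WorkerTokenSigner {
    // Nimbus holds a per-topology secret (stored in ZK) and signs the
    // serialized token info with it. The signature becomes the "password"
    // the worker later presents during DIGEST-MD5 authentication.
    public static byte[] sign(byte[] topologySecret, String tokenInfo) throws Exception {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(topologySecret, "HmacSHA256"));
        return mac.doFinal(tokenInfo.getBytes(StandardCharsets.UTF_8));
    }
}
```

Rotation would then just mean generating a new secret, bumping the version number carried in the token info, and re-signing.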
2. Rotated tokens would be propagated to the workers using the credentials
feature already in place. Currently that uses zookeeper as a mechanism to get
them to the workers and the workers get their ZK credentials from their config.
But if this works out well we might make it more secure in the future and
switch to the supervisor getting a worker cred from a privileged ZK location
and writing it out to a file that only the worker user can read. It would then
use that token to fetch all of its credentials from nimbus directly. But that
is future work...
3. The worker when creating a connection to the supervisor would go through the
configured transport plugin. Transports that support worker tokens would look
at the current Subject and at the jaas.conf to decide how to authenticate with
the server. If it can find a worker token for the server it wants to talk to
it would bypass the jaas.conf and just use the DIGEST-MD5 mechanism. If it
cannot find a token it would fall back to the jaas.conf and unless the worker
has done some fancy configuration it will likely fail.
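To make the client-side selection concrete, here is a minimal sketch of checking the current Subject for a token before falling back to jaas.conf. The {{WorkerToken}} class and the service-name strings are illustrative assumptions, not Storm's actual types:

```java
import javax.security.auth.Subject;
import java.util.Optional;

// Illustrative stand-in for whatever credential type Storm ends up using.
class WorkerToken {
    final String serviceName; // e.g. "nimbus", "supervisor", "drpc"
    WorkerToken(String serviceName) { this.serviceName = serviceName; }
}

public class TokenTransportSelector {
    // If the Subject carries a worker token for the target service, the
    // transport would pick DIGEST-MD5 with it; otherwise it falls back to
    // whatever jaas.conf configures (which likely fails without extra setup).
    static Optional<WorkerToken> findToken(Subject subject, String service) {
        return subject.getPrivateCredentials(WorkerToken.class).stream()
                .filter(t -> t.serviceName.equals(service))
                .findFirst();
    }
}
```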
On the server (supervisor) side it would register two different SASL servers
with thrift. As part of the SASL negotiation built into thrift one of them
will be selected based off of the client's preferences. In the kerberos case
it would support both GSSAPI and DIGEST-MD5. If the DIGEST-MD5 SASL server is
selected the callback would verify that the username is the properly encoded
WorkerTokenInfo (a.k.a. username/topology id/secret version number) and that
the password is the proper signature for that info. It would look these up in
zookeeper and store them in a cache so we don't hit ZK too frequently.
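A rough sketch of that server-side check, assuming an HMAC-signed token. The class name, the ZK lookup function, and the cache shape are all illustrative assumptions, not the actual implementation:

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch only; not the actual Storm implementation.
public class WorkerTokenValidator {
    private final Map<String, byte[]> secretCache = new ConcurrentHashMap<>();
    private final Function<String, byte[]> zkSecretLookup; // hits ZK only on a cache miss

    public WorkerTokenValidator(Function<String, byte[]> zkSecretLookup) {
        this.zkSecretLookup = zkSecretLookup;
    }

    // The DIGEST-MD5 "username" decodes to WorkerTokenInfo
    // (username/topology id/secret version); the "password" is the signature.
    public boolean validate(String topologyId, String tokenInfo, byte[] signature) throws Exception {
        byte[] secret = secretCache.computeIfAbsent(topologyId, zkSecretLookup);
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(secret, "HmacSHA256"));
        byte[] expected = mac.doFinal(tokenInfo.getBytes(StandardCharsets.UTF_8));
        // Constant-time comparison so signature bytes don't leak via timing.
        return MessageDigest.isEqual(expected, signature);
    }
}
```

A real version would also need to invalidate cache entries when the secret version rotates.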
4. I agree that reusing Hadoop looks like a great choice on paper. They have
everything we want in it, except they come with a lot of baggage. The big
issue for me is that UserGroupInformation relies on Configuration.
Configuration is a crazy complex piece of software that has both static
configuration parts and dynamic configuration parts that all play off of each
other in odd ways. Because of how UserGroupInformation is written, it
requires security settings, such as whether security is enabled and which
type of security to use, to be set in hadoop-site.xml or core-site.xml on the
classpath. There is no way to do this dynamically at runtime. This means we
would either have to change Hadoop so that this is optional, which I don't
think is that simple, change storm so it starts using the Hadoop
Configuration, which I don't want to do to our users, or require both to be
set properly to make storm work, even if you are not using Hadoop. All of
that made me feel that simply using the concepts and not the actual code would
make this simpler.
Also some of the limitations are different between us and Hadoop. In the case
of block tokens, which uses some of the delegation token code, they can hand
out hundreds of these a second, possibly thousands. We are going to hand out
these tokens on the order of the number of topologies launched, which is a lot
smaller. As such I thought it would be better to have a per-topology secret
instead. That way an attacker cannot attempt to guess the secret from lots of
tokens if a flaw is later found in the hashing algorithm we have selected. The
design and, when I write it, the code make it so we could reuse the same secret
if that frequency changes in the future.
If anyone else can think of an out of the box solution that has a compatible
license that would work I would love to hear it.
I have thought about a few while doing the initial design, but they all have
their own drawbacks and I rejected them for various reasons. If someone else
can find a way to make it work without those limitations I would love to hear it.
Kerberos:
1) Fetching Kerberos service tickets using a supplied TGT would be
interesting, but the user would have to regularly upload a TGT, which is
exactly the opposite of one of our design goals.
2) Having keytabs for all possible users on nimbus is just bad and our
security team would shoot that down in a heartbeat.
3) Using the nimbus credentials to fetch the service tickets would work, but I
cannot distinguish one worker or user from another.
[Athenz|http://www.athenz.io/]:
This is home-grown Yahoo/Oath tech that was fairly recently open sourced. It would fit
for something like this, but it is far from simple to setup, even more complex
than kerberos. We have never asked users to support it before and as such it
would be a real pain to ask them to do so now just for this. Also the roles
are not typically that dynamic so creating a new role on a per topology basis
might tax the system in ways it was not designed to support.
Perhaps I am not enough of a security expert to know what else is out there.
> Storm should support auth through delegation tokens for workers
> ---------------------------------------------------------------
>
> Key: STORM-2898
> URL: https://issues.apache.org/jira/browse/STORM-2898
> Project: Apache Storm
> Issue Type: New Feature
> Components: storm-client, storm-server
> Affects Versions: 2.0.0
> Reporter: Robert Joseph Evans
> Assignee: Robert Joseph Evans
> Priority: Major
>
> There are a lot of cases where it would be great for a worker to be able to
> communicate directly to nimbus, supervisors, or drpc servers in a secure way
> out of the box.
> This is currently a pain to make work. The user has to ship a TGT with their
> topology, and continually keep it up to date with credentials-push. They
> also need a kind of hacked up jaas.conf to grab the TGT from AutoTGT and put
> it in the place that the client wants it.
> We should just generate a signed data structure (aka delegation token from
> hadoop) that we can hand off to the topologies to use when talking to nimbus,
> a supervisor, or drpc servers.
> We may want to split up the different services from each other to make an
> attack against one not hit all of them, but that is something we can think
> about with the design of this.
> I will try to come up with a design shortly.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)