All -
Last week at Hadoop Summit there was a room dedicated as the summit Design
Lounge.
This was a place where like-minded folks could get together and talk about
design issues with other contributors, with a simple flip board and some
beanbag chairs.
We used this as an opportunity to bootstrap some discussions within common-dev
for security related topics. I'd like to summarize the security session and
takeaways here for everyone.
This summary and set of takeaways are largely from memory.
Please - anyone that attended - feel free to correct anything that is
inaccurate or omitted.
Pretty well attended - companies represented:
* Yahoo!
* Microsoft
* Hortonworks
* Cloudera
* Intel
* eBay
* Voltage Security
* Flying Penguins
* EMC
* others...
Most folks were pretty engaged throughout the session.
We set expectations as a meet and greet/project kickoff - project being the
emerging security development community.
In order to keep the scope of conversations manageable we tried to keep focused
on authentication and the ideas around SSO and tokens.
We discussed Kerberos as:
1. a major pain point and barrier to entry for some
2. seemingly perfect for others
   a. obviously requiring backward compatibility
The consensus seemed to be that:
1. user authentication should be easily integrated with alternative enterprise
   identity solutions
2. service identity issues should not require thousands of service
   identities added to enterprise user repositories
3. customers should not be forced to install/deploy and manage a KDC for
   services - this implies a couple of options:
   a. alternatives to Kerberos for service identities
   b. a Hadoop KDC implementation - e.g. ApacheDS?
There was active discussion around:
1. Hadoop SSO server
   a. acknowledgement of Hadoop SSO tokens as something that can be
      standardized for representing both the identity and authentication
      event data, as well as access tokens representing a verifiable means
      for the authenticated identity to access resources or services
   b. a general understanding of Hadoop SSO as being an analogue and
      alternative for the Kerberos KDC, with the related tokens being
      analogous to TGTs and service tickets
   c. an agreement that there are interesting attributes about the
      authentication event that may be useful in cross-cluster trust for
      SSO - such as a rating of authentication strength, number of factors,
      etc.
   d. that existing Hadoop tokens - i.e. delegation, job, block access -
      will all continue to work and that we are initially looking at
      alternatives to the KDC, TGTs and service tickets
2. authentication mechanism discovery by clients - Daryn Sharp has done a bunch
   of work around this and our SSO solution may want to consider a similar
   mechanism for discovering trusted IDPs and service endpoints
3. backward compatibility - Kerberos shops need to just continue to work
4. some insight into where/how folks believe that token-based authentication
   can be accomplished within existing contracts - SASL/GSSAPI, REST, web UI
5. the establishment of a cross-cutting concern community around security
   and what that means in terms of the Apache way - email lists, wiki, Jiras
   across projects, etc.
6. dependencies, rolling updates, patching and how they relate to Hadoop
   projects versus packaging
7. the collaboration road ahead
A number of breakout discussions were had outside of the designated design
lounge session as well.
Takeaways for the immediate road ahead:
1. common-dev may be sufficient to discuss security-related topics
   a. many developers are already subscribed to it
   b. there is not that much traffic there anyway
   c. we can discuss a more security-focused list if we like
2. we will discuss the establishment of a wiki space for a holistic view of
   the security model, patterns, approaches, etc.
3. we will begin discussion on common-dev in the near term for the following:
   a. discuss and agree on the high-level moving parts required for our
      goals for authentication: SSO service, tokens, token validation
      handlers, credential management tools, etc.
   b. discuss and agree on the natural seams across these moving parts and
      agree on collaboration by tackling various pieces in a divide and
      conquer approach
   c. more than likely - the first piece that will need some immediate
      discussion will be the shape and form of the tokens
   d. we will follow up or supplement discussions with POC code patches
      and/or specs attached to jiras
Overall, the design lounge was rather effective for what we wanted to do - which
was to bootstrap discussions and collaboration within the community at large.
As always, no specific decisions were made during this session and we can
discuss any or all of this within common-dev and on related jiras.
Jiras related to the security development group and these discussions:
Centralized SSO/Token Server
https://issues.apache.org/jira/browse/HADOOP-9533
Token based authentication and SSO
https://issues.apache.org/jira/browse/HADOOP-9392
Document/analyze current Hadoop security model
https://issues.apache.org/jira/browse/HADOOP-9621
Improve Hadoop security - Use cases
https://issues.apache.org/jira/browse/HADOOP-9671
thanks,
--larry