All -

The pac4j provider contribution was committed yesterday and we are on track
for our 0.8.0 release. Note that the docs are still being massaged a bit
and will end up in the new 0.8.0 users guide book soon.

In the meantime, I'd like to start a discussion wrt the requirements for
identity assertion functionality in order to have full use case coverage for
our new authentication/federation mechanisms.

A bit of background first...

Some of the external provider integrations that are enabled by the pac4j
provider:

1. result in a PrimaryPrincipal that is actually an id rather than a
username that could be used directly within the hadoop cluster.
2. some also allow you to configure the user profile attribute to be
returned as the subject - such as SAML (okta). So, we could at least
sometimes have it be an email address.
3. others result in an actual username as the PrimaryPrincipal
4. It is extremely likely that none of these PrimaryPrincipals will
actually line up with an enterprise username that can be used within the
cluster.

Existing identity assertion providers:

1. pseudo/default identity assertion - we have the ability to use principal
mapping to map a numeric id/email or whatever to an acceptable username
for hadoop. However, all users that would access hadoop through a topology
configured for pac4j would need to have their principal mappings defined
within the topology. Not a very scalable or manageable approach. The
topology itself would likely end up being huge and it would need to be
sync'd up across all Knox instances in the deployment.
2. regex identity assertion provider - this provider would be able to take
something like an email address PrimaryPrincipal and extract a username
from that. In some cases, like okta, this may be the proper username for
companies that use okta as a hosted SSO solution. There are no additional
principal mapping capabilities, however.

So, questions/options for 0.8.0 release:

Option 1. Is static principal mapping within a topology using the
pseudo/default identity assertion provider sufficient for the first release
that has support for these external providers?

Option 2. Do we need to add principal mapping capabilities to the regex
provider to allow for the extraction of a username AND the subsequent
mapping of that to another username?
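
To make Option 2 concrete, here is a rough Python sketch of the behavior I
have in mind: extract a username with a regex, then run the result through a
principal mapping table. This is illustrative only and not Knox code; the
function names, the default pattern and the sample mappings are all made up
for the example.

```python
import re
from typing import Mapping, Optional

# Hypothetical default: capture everything before the '@' of an email address.
EMAIL_LOCAL_PART = r"^([^@]+)@.*$"


def extract_username(principal: str, pattern: str = EMAIL_LOCAL_PART) -> Optional[str]:
    """Return the first capture group if the pattern matches, else None."""
    match = re.match(pattern, principal)
    return match.group(1) if match else None


def assert_identity(
    principal: str,
    mappings: Mapping[str, str],
    pattern: str = EMAIL_LOCAL_PART,
) -> str:
    """Regex extraction followed by an optional principal mapping."""
    extracted = extract_username(principal, pattern)
    if extracted is None:
        # No match (e.g. a numeric id): fall back to the original principal.
        extracted = principal
    # Map the extracted name if a mapping exists, otherwise use it as-is.
    return mappings.get(extracted, extracted)
```

With a mapping of jdoe to john.doe, "jdoe@example.com" would be asserted as
"john.doe", while an unmapped "asmith@example.com" would simply become
"asmith".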

Option 3. Should we create a new identity asserter that does a look up in
LDAP for mapping an id or email address to the username/CN? A more dynamic
assertion provider like this would certainly be better for scalability and
management but at the same time would require a change to LDAP schemas for
things like twitter id. Email address may not require a schema change but
would require the email address from the external provider to match that
within the corporate LDAP.
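
For Option 3, the lookup would boil down to searching the directory on some
attribute and reading the username back from the matching entry. The Python
sketch below only builds the search filter (with RFC 4515 escaping) and
leaves the actual directory call out. The helper names are hypothetical;
"mail" is the usual attribute for an email address, whereas something like a
twitter id would need a custom attribute, which is the schema change
mentioned above.

```python
# RFC 4515 requires these characters to be escaped inside a filter value.
_FILTER_ESCAPES = {
    "\\": r"\5c",
    "*": r"\2a",
    "(": r"\28",
    ")": r"\29",
    "\x00": r"\00",
}


def escape_filter_value(value: str) -> str:
    """Escape a value so it is safe to embed in an LDAP search filter."""
    return "".join(_FILTER_ESCAPES.get(ch, ch) for ch in value)


def build_lookup_filter(attribute: str, principal: str) -> str:
    """Build an equality filter such as (mail=jdoe@example.com).

    The attribute name is assumed to come from trusted provider
    configuration; only the principal value is escaped.
    """
    return f"({attribute}={escape_filter_value(principal)})"
```

The asserter would then search with this filter and use an attribute of the
single matching entry (uid or CN, for example) as the asserted username.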

Option 4. Should we consider a central mapping storage identity assertion
provider that would interrogate some KnoxSSO specific mechanism? We could
look at a mapping of PrimaryPrincipal to DN from LDAP, to corporate email
address or directly to username. This would require some separate
registration or user sync mechanism to populate this central store and
likely couple the mappings to a particular user store like LDAP in some
way. It will also introduce a new wrinkle or consideration for Knox
upgrades having actual user data to migrate, etc. For the central store we
could consider:
     a. file in HDFS
     b. embedded HBase
     c. Hive
     d. RDBMS
     e. LDAP
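
As a strawman for the RDBMS flavor of Option 4, the central store could be as
small as one table keyed by provider and external principal. The Python
sketch below uses SQLite purely so that it is self-contained; the table name,
columns and function names are assumptions for illustration, not a proposed
schema.

```python
import sqlite3
from typing import Optional


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (and if needed create) the hypothetical mapping store."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS principal_mapping ("
        " provider TEXT NOT NULL,"
        " external_principal TEXT NOT NULL,"
        " username TEXT NOT NULL,"
        " PRIMARY KEY (provider, external_principal))"
    )
    return conn


def register(conn: sqlite3.Connection, provider: str,
             external_principal: str, username: str) -> None:
    """Add or update a mapping; this is the registration/user sync step."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO principal_mapping VALUES (?, ?, ?)",
            (provider, external_principal, username),
        )


def resolve(conn: sqlite3.Connection, provider: str,
            external_principal: str) -> Optional[str]:
    """Return the cluster username for a principal, or None if unknown."""
    row = conn.execute(
        "SELECT username FROM principal_mapping"
        " WHERE provider = ? AND external_principal = ?",
        (provider, external_principal),
    ).fetchone()
    return row[0] if row else None
```

Even in this tiny form, the table is user data that would have to be carried
across Knox upgrades, which is the migration wrinkle noted above.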

Personally, I lean toward the following:

* Option 1 from above for the 0.8.0 release: introduce the pac4j provider
with static principal mapping using the pseudo/default assertion provider,
and possibly add support for principal mapping to the regex provider
(Option 2) for additional flexibility.

* Option 3 and/or 4 from above for a follow-up release (or releases), once
we can determine the exact design for the central store and user
sync/registration mechanism that would best meet the community needs, and
can be sure to put the time into the upgrade/migration considerations.

Thoughts?

thanks,

--larry
