WRT Option 2 below: The regex identity assertion mapper can already do what is
described. Given the configuration below, it will turn
[email protected] into somebody_USA. The {1} inserts the first matching group
as-is, while the {[2]} takes the value from the second matching group and
replaces it with its entry in the lookup table.
<provider>
<role>identity-assertion</role>
<name>Regex</name>
<enabled>true</enabled>
<param>
<name>input</name>
<value>(.*)@(.*?)\..*</value>
</param>
<param>
<name>output</name>
<value>{1}_{[2]}</value>
</param>
<param>
<name>lookup</name>
<value>us=USA;ca=CANADA</value>
</param>
</provider>
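
To make the substitution concrete, here is a rough Python sketch of the same mapping. It only mimics the behavior described above and is not the Knox implementation: the function name, the sample address, the full-string match, and the pass-through behavior for unmatched principals or missing lookup keys are all assumptions made for illustration.

```python
import re

# Matches the two placeholder forms used in the output template:
# {[n]} (group value translated via the lookup table) and {n} (raw group value).
PLACEHOLDER = re.compile(r"\{(?:\[(\d+)\]|(\d+))\}")


def map_principal(principal, input_pattern, output_template, lookup):
    """Hypothetical helper mimicking the regex mapping; not Knox code."""
    match = re.fullmatch(input_pattern, principal)
    if match is None:
        # Assumption: a principal that does not match is left unchanged.
        return principal

    # Parse "us=USA;ca=CANADA" into {"us": "USA", "ca": "CANADA"}.
    table = dict(pair.split("=", 1) for pair in lookup.split(";") if pair)

    def expand(placeholder):
        if placeholder.group(1) is not None:
            value = match.group(int(placeholder.group(1)))
            # Assumption: a value missing from the table is used as-is.
            return table.get(value, value)
        return match.group(int(placeholder.group(2)))

    return PLACEHOLDER.sub(expand, output_template)


# Hypothetical sample address, chosen so the second group is "us".
print(map_principal(
    "somebody@us.example.com",
    r"(.*)@(.*?)\..*",
    "{1}_{[2]}",
    "us=USA;ca=CANADA",
))  # prints "somebody_USA"
```

The lookup table is what gives the regex provider a (static) principal mapping step after extraction, which is the capability Option 2 asks about.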
On 1/16/16, 11:10 AM, "larry mccay" <[email protected]> wrote:
>All -
>
>The pac4j provider contribution was committed yesterday and we are on track
>for our 0.8.0 release. Note that the docs are still being massaged a bit
>and will end up in the new 0.8.0 users guide book soon.
>
>In the meantime, I'd like to start a discussion wrt the requirements for
>identity assertion functionality in order to have full usecase coverage for
>our new authentication/federation mechanisms.
>
>A bit of background first...
>
>Some of the external provider integrations that are enabled by the pac4j
>provider:
>
>1. result in a PrimaryPrincipal that is actually an id rather than a
>username that could be used directly within the hadoop cluster.
>2. some also allow you to configure the user profile attribute to be returned
>as the subject - such as SAML (okta). So, we could at least sometimes have
>it be an email address.
>3. others result in an actual username as the PrimaryPrincipal
>4. It is extremely likely that none of these PrimaryPrincipals will
>actually line up with the enterprise username that can be used within the
>cluster.
>
>Existing identity assertion providers:
>
>1. pseudo/default identity assertion - we have the ability to use principal
>mapping to map a numeric id/email or whatever to an acceptable username
>for hadoop. However, all users that would access hadoop through a topology
>configured for pac4j would need to have their principal mappings defined
>within the topology. Not a very scalable or manageable approach. The
>topology itself would likely end up being huge and would need to be
>sync'd up across all Knox instances in the deployment.
>2. regex identity assertion provider - this provider would be able to take
>something like an email address PrimaryPrincipal and extract a username
>from that. In some cases, like okta, this may be the proper username for
>companies that use okta as a hosted SSO solution. There are no additional
>principal mapping capabilities however.
>
>So, questions/options for 0.8.0 release:
>
>Option 1. Is static principal mapping within a topology using the
>pseudo/default identity assertion provider sufficient for the first release
>that has support for these external providers?
>
>Option 2. Do we need to add principal mapping capabilities to the regex
>provider to allow for the extraction of a username AND subsequently mapping
>that to another username?
>
>Option 3. Should we create a new identity asserter that does a look up in
>LDAP for mapping an id or email address to the username/CN? A more dynamic
>assertion provider like this would certainly be better for scalability and
>management but at the same time would require a change to LDAP schemas for
>things like twitter id. Email address may not require a schema change but
>would require the email address from the external provider to match that
>within the corporate LDAP.
>
>Option 4. Should we consider a central mapping storage identity assertion
>provider that would interrogate some KnoxSSO specific mechanism? We could
>look at a mapping of PrimaryPrincipal to DN from LDAP, to corporate email
>address or directly to username. This would require some separate
>registration or user sync mechanism to populate this central store and
>likely couple the mappings to a particular user store like LDAP in some
>way. It will also introduce a new wrinkle or consideration for Knox
>upgrades having actual user data to migrate, etc. For the central store we
>could consider:
> a. file in HDFS
> b. embedded HBase
> c. Hive
> d. RDBMS
> e. LDAP
>
>Personally, I lean toward the following:
>
>* Option 1 from above for the 0.8.0 release: introduce the pac4j provider with
>static principal mapping using the pseudo/default assertion provider and
>possibly add support for principal mapping to the regex provider (Option 2)
>for additional flexibility.
>
>* Option 3 and/or 4 from above for a follow up release/s when we can
>determine the exact design for the central store and user sync/registration
>mechanism that would best meet the community needs and be sure to put the
>time into the upgrade/migration considerations.
>
>Thoughts?
>
>thanks,
>
>--larry