Doing everything in Active Directory should work as well. What I said 
earlier was based more on the Yahoo deployment of security. Let us know how it goes.

On Oct 1, 2011, at 9:36 AM, bigbibguy father wrote:

> Thanks Devaraj for responding.
> 
> In our case, the LDAP server is the corporate Active Directory server, which 
> holds the user IDs and their attributes.
> 
> Cluster nodes contact the KDC to get a TGT and service tickets for the NN and 
> JT and keep them until the expiry time (7 days). Cluster nodes contact the 
> LDAP server for each task. So if I understand correctly, the LDAP traffic 
> from the cluster nodes (around 1000) will be much higher than the 
> authentication traffic from the cluster nodes.
> 
> Why not use Active Directory as the KDC for authenticating the service 
> principals (cluster nodes) as well?
> 
> This way, we do not have to manage a separate KDC and worry about its 
> availability and health.
>  
> We also plan to have one Active Directory server in the same datacenter as 
> the cluster, but outside the cluster firewall, so that LDAP queries have a 
> higher SLA.
> 
> The benefits associated with the local KDC option are listed below, with my 
> analysis added after each benefit.
> 
> 1) It requires less configuration with Active Directory. - But cluster nodes 
>    need to talk to Active Directory for the user details, so they need to be 
>    configured with Active Directory anyway.
> 2) It is comparatively easy to script the creation of many principals and 
>    keytabs. A principal and keytab must be created for every daemon in the 
>    cluster, and in a large cluster this can be extremely onerous to do 
>    directly in Active Directory. - This is a one-time job, and we may be able 
>    to script this with AD as well.
> 3) There is no need to involve central Active Directory administrators in 
>    order to get service principals created. - We get to manage the OU 
>    containing the service principals.
> 4) It allows for incremental configuration. The Hadoop administrator can 
>    completely configure and verify the functionality of the cluster 
>    independently of integrating with Active Directory. - This is a good 
>    benefit to have, and it is not available in the Active Directory only 
>    option.
> 5) It can serve to shield the corporate Active Directory server(s) from the 
>    many machines in a Hadoop cluster all requesting Kerberos tickets 
>    simultaneously. During cluster start-up, Hadoop will effectively be acting 
>    as a distributed denial of service attack on the central Active Directory 
>    server, which could adversely affect the performance of the Active 
>    Directory server. - The service principal authentication traffic is not 
>    that frequent, so these spikes should not be much of a problem for our 
>    highly available Active Directory.
> 
> 
> But the drawback of the local KDC option is that we need to maintain that 
> KDC server and make sure it is highly available, with a backup server.
> 
> 
> 
> Thanks and Regards,
> BBG
> 
> 
> 
> 
> On Sat, Oct 1, 2011 at 8:14 AM, Devaraj Das <d...@hortonworks.com> wrote:
> The cluster KDC should be set up to trust the Active Directory KDC 
> (cross-realm trust in Kerberos lingo). This handles the cases of user 
> authentication when a user talks to a server in the cluster directly (e.g., 
> user->namenode). 
> The GID and other user attributes are usually stored in LDAP. The cluster 
> nodes are set up to talk to the cluster-specific LDAP server. 
> 
> On Sep 30, 2011, at 7:19 PM, bigbibguy father wrote:
> 
>> We are planning to enable secure Hadoop using Kerberos. 
>> 
>> Our users reside in Active Directory. We read that there are two options 
>> for using Kerberos to secure Hadoop.
>> 
>> 1) Run Kerberos on a machine local to the cluster and create the service 
>> principals there.
>> 2) Use Active Directory itself as the Kerberos KDC and create the service 
>> principals in Active Directory as well.
>> 
>> It seems Cloudera and the industry in general recommend option 1, running a 
>> local KDC for authenticating service principals.
>> https://ccp.cloudera.com/display/CDHDOC/Integrating+Hadoop+Security+with+Active+Directory
>> 
>> I read that the TaskTrackers run tasks as the user who submitted the job. 
>> In that case, don't the TaskTracker nodes need to talk to Active 
>> Directory to get user details such as the GID?
>> 
>> So does this mean that every node (TaskTrackers, JobTracker and NameNode) 
>> will be interacting with Active Directory anyway?
>> 
>> If so, option 1 doesn't seem to be superior, since each node has to talk to 
>> two KDCs: the local Kerberos KDC for authenticating service principals, and 
>> Active Directory for the user details and group information.
>> 
>> Please correct me if I am wrong in my assumptions.
>> 
>> Thanks and Regards,
>> 
>> BBG
> 
> 
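
The thread notes that creating a principal and keytab for every daemon is easy to script against a local KDC. As a rough sketch of what that could look like, the Python below only generates MIT Kerberos kadmin commands (`addprinc -randkey` and `ktadd -k`) for each host and service; it does not run them. The realm, host names, service names, and keytab directory are all hypothetical placeholders, not values from the thread, and an Active Directory only setup would need different tooling.

```python
from typing import Iterable, List

# Hypothetical placeholders: none of these values come from the thread.
REALM = "CLUSTER.EXAMPLE.COM"
SERVICES = ("hdfs", "mapred", "host")


def principal(service: str, host: str, realm: str = REALM) -> str:
    """Build a service principal name of the form service/host@REALM."""
    return f"{service}/{host}@{realm}"


def kadmin_commands(
    hosts: Iterable[str],
    services: Iterable[str] = SERVICES,
    realm: str = REALM,
    keytab_dir: str = "/tmp/keytabs",
) -> List[str]:
    """Return kadmin commands that create one principal and keytab per daemon.

    The commands are only generated here, never executed.
    """
    services = tuple(services)
    cmds: List[str] = []
    for host in hosts:
        for service in services:
            princ = principal(service, host, realm)
            # Create the principal with a random key (no password to manage).
            cmds.append(f"addprinc -randkey {princ}")
            # Export that principal's key into a per-daemon keytab file.
            cmds.append(f"ktadd -k {keytab_dir}/{service}.{host}.keytab {princ}")
    return cmds


if __name__ == "__main__":
    # Three made-up node names; a real cluster would read its host list.
    example_hosts = [f"node{i:04d}.example.com" for i in range(1, 4)]
    for cmd in kadmin_commands(example_hosts):
        print(cmd)
```

Each printed line is meant to be reviewed and then passed to kadmin on the cluster KDC by an administrator. Generating the commands as text keeps the sketch free of side effects and makes the one-time job easy to audit before anything is created.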
