Doing everything in Active Directory should work as well. What I said earlier was based more on the Yahoo deployment of security. Let us know how it goes.
On Oct 1, 2011, at 9:36 AM, bigbibguy father wrote:

> Thanks Devaraj for responding.
>
> In our case, the LDAP server is the corporate Active Directory server, which
> has the user ID and the attributes.
>
> Cluster nodes contact the KDC to get the TGT and service tickets for the NN
> and JT and keep them until the expiry time (7 days). Cluster nodes contact
> the LDAP server for each task. So if I understand correctly, the LDAP traffic
> from the cluster nodes (around 1000) will be much more than the
> authentication traffic from the cluster nodes.
>
> Why not use Active Directory as the KDC for authenticating the service
> principals (cluster nodes) also?
>
> In this way, we do not have to manage a separate KDC and worry about its
> availability and health.
>
> We also plan to have one Active Directory server at the same datacenter as
> the cluster, but outside the cluster firewall, so that LDAP queries have a
> higher SLA.
>
> The benefits associated with the local KDC option are below, and my analysis
> is added for each benefit.
>
> It requires less configuration with Active Directory. - But cluster nodes
> need to talk to Active Directory for the user details, so it needs
> configuration with Active Directory anyway.
>
> It is comparatively easy to script the creation of many principals and
> keytabs. A principal and keytab must be created for every daemon in the
> cluster, and in a large cluster this can be extremely onerous to do directly
> in Active Directory. - This is a one-time job and we may be able to script
> this with AD also.
>
> There is no need to involve central Active Directory administrators in order
> to get service principals created. - We get to manage the OU containing the
> service principals.
>
> It allows for incremental configuration.
> The Hadoop administrator can completely configure and verify the
> functionality of the cluster independently of integrating with Active
> Directory. - Good to have this benefit, and it is not available in the
> Active Directory only option.
>
> It can serve to shield the corporate Active Directory server(s) from the many
> machines in a Hadoop cluster all requesting Kerberos tickets simultaneously.
> During cluster start-up, Hadoop will effectively be acting as a distributed
> denial of service attack on the central Active Directory server, which could
> adversely affect the performance of the Active Directory server. - The
> service principal authentication traffic is not that frequent, and hence
> these spikes should not be much of a problem for our highly available Active
> Directory.
>
> But the drawback of the local KDC option is that we need to maintain that
> KDC server and make sure it's highly available, with a backup server.
>
> Thanks and Regards,
> BBG
>
> On Sat, Oct 1, 2011 at 8:14 AM, Devaraj Das <d...@hortonworks.com> wrote:
> The Cluster KDC should be set up to trust the Active Directory KDC
> (cross-realm trust in the Kerberos lingo). This handles the cases of user
> authentication when a user talks to a server in the cluster directly (e.g.,
> user->namenode).
> The GID and other user attributes are usually stored in LDAP. The cluster
> nodes are set up to talk to the cluster-specific LDAP server.
>
> On Sep 30, 2011, at 7:19 PM, bigbibguy father wrote:
>
>> We are planning to enable secure Hadoop using Kerberos.
>>
>> Our users reside in Active Directory. We read that there are two options
>> for using Kerberos to secure Hadoop.
>>
>> 1) Run Kerberos on a machine local to the cluster and create the service
>> principals there.
>> 2) Use Active Directory itself as the Kerberos KDC and create the service
>> principals in Active Directory as well.
>>
>> It seems Cloudera and the industry in general recommend option 1, running a
>> local KDC for authenticating service principals.
>> https://ccp.cloudera.com/display/CDHDOC/Integrating+Hadoop+Security+with+Active+Directory
>>
>> I read that the tasktrackers run tasks as the user who submitted the job.
>> In that case, don't the TaskTracker nodes need to talk to Active
>> Directory to get user details like the GID, etc.?
>>
>> So does this mean that every node (tasktrackers, job tracker and namenode)
>> will be interacting with Active Directory anyway?
>>
>> If so, option 1 doesn't seem to be superior, since each node has to talk to
>> two servers: the local Kerberos KDC for authenticating service principals,
>> and Active Directory to get the user details and group information.
>>
>> Please correct me if I am wrong in my assumptions.
>>
>> Thanks and Regards,
>>
>> BBG
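
As a rough check on the traffic comparison made in the thread (one LDAP lookup per task versus Kerberos tickets fetched once per 7-day lifetime), here is a minimal Python sketch. The node count, ticket lifetime, and the two service tickets (NN and JT) come from the thread. The tasks-per-node figure and the one-lookup-per-task, no-caching model are assumptions made only for illustration, and every class and function name is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ClusterAssumptions:
    nodes: int = 1000                   # "around 1000" nodes, from the thread
    ticket_lifetime_days: int = 7       # ticket expiry quoted in the thread
    service_tickets_per_node: int = 2   # NN and JT, as in the thread
    tasks_per_node_per_day: int = 500   # ASSUMPTION: illustrative workload only
    ldap_lookups_per_task: int = 1      # ASSUMPTION: one lookup per task, no caching


def kdc_requests_per_day(a: ClusterAssumptions) -> float:
    """Average daily KDC requests: one TGT plus the service tickets,
    fetched once per ticket lifetime on every node."""
    per_lifetime = a.nodes * (1 + a.service_tickets_per_node)
    return per_lifetime / a.ticket_lifetime_days


def ldap_requests_per_day(a: ClusterAssumptions) -> int:
    """Daily LDAP lookups if every task triggers a user-attribute lookup."""
    return a.nodes * a.tasks_per_node_per_day * a.ldap_lookups_per_task


if __name__ == "__main__":
    a = ClusterAssumptions()
    kdc = kdc_requests_per_day(a)
    ldap = ldap_requests_per_day(a)
    print(f"KDC requests/day (average): {kdc:.0f}")
    print(f"LDAP lookups/day:           {ldap}")
    print(f"LDAP-to-KDC ratio:          {ldap / kdc:.0f}x")
```

Under these assumed numbers the LDAP load dominates by a wide margin, which matches the intuition in the thread. The model ignores the start-up burst, when all nodes request tickets at once, and any lookup caching on the nodes would lower the LDAP figure, so it is worth rerunning with measured values before drawing conclusions.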