Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.
The "IdeasOnLdapConfiguration" page has been changed by SomeOtherAccount.
http://wiki.apache.org/hadoop/IdeasOnLdapConfiguration

--------------------------------------------------

New page: This is for HADOOP:5670.

First, a bit about LDAP (extremely simplified). All objects in LDAP are defined by one or more object classes. These object classes define the attributes that a given object can use. The attributes in turn have definitions that determine what kinds of values they can hold: types such as string or integer, but also whether or not the attribute can hold more than one value. Object classes, attributes, and values are all defined in a schema definition.

One key feature of LDAP is the ability to search for objects using a simple, prefix-notation filter syntax (the operator comes first). Let's say we have an object class with this definition:

  objectclass: node
    hostname: string
    domain: string

and in our LDAP server we have placed the following objects:

  hostname=myhost1
    objectclass=node
    domain=example.com

  hostname=myhost2
    objectclass=node
    domain=example.com

We can now do an LDAP search with (&(objectclass=node)(hostname=myhost1)) to find the 'myhost1' object. Similarly, we can search with (&(objectclass=node)(domain=example.com)) to find both the myhost1 and myhost2 objects.

Let's apply these ideas to Hadoop.
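The AND-filter matching described above can be sketched as a toy model in Python. This is not a real LDAP client; the `entries` list and `search` function are invented for illustration, with entries as plain dicts and the '&' filter reduced to "every attribute=value pair must match".

```python
# Toy model of LDAP AND-filter semantics; entries mirror the two
# 'node' objects from the example above.
entries = [
    {"objectclass": "node", "hostname": "myhost1", "domain": "example.com"},
    {"objectclass": "node", "hostname": "myhost2", "domain": "example.com"},
]

def search(entries, **attrs):
    """Return entries matching every attribute=value pair, i.e. the
    equivalent of a filter like (&(objectclass=node)(hostname=myhost1))."""
    return [e for e in entries if all(e.get(k) == v for k, v in attrs.items())]

# (&(objectclass=node)(hostname=myhost1)) matches only myhost1:
print([e["hostname"] for e in search(entries, objectclass="node", hostname="myhost1")])

# (&(objectclass=node)(domain=example.com)) matches both hosts:
print([e["hostname"] for e in search(entries, objectclass="node", domain="example.com")])
```

A real deployment would issue these filters against the directory server (for example via ldapsearch or a client library) rather than filtering in memory, but the matching semantics are the same.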
Here are some rough object classes that we can use for demonstration purposes.

Generic properties:

  hadoopGlobalConfig
    hadoop.tmp.dir: string
    fs.default.name: string
    dfs.block.size: integer
    dfs.replication: integer
    clusterName: string

For datanodes:

  hadoopDataNode
    commonname: string
    hostname: multi-string
    dfs.data.dir: multi-string
    dfs.datanode.du.reserved: integer

For tasktrackers:

  hadoopTaskTracker
    commonname: string
    hostname: multi-string
    mapred.job.tracker: string
    mapred.local.dir: multi-string
    mapred.tasktracker.map.tasks.maximum: integer
    mapred.tasktracker.reduce.tasks.maximum: integer

For the jobtracker:

  hadoopJobTracker
    hostname: string
    mapred.reduce.tasks: integer
    mapred.reduce.slowstart.completed.maps: numeric
    mapred.queue.names: multi-string
    mapred.jobtracker.taskScheduler: string
    mapred.system.dir: string

For the namenode:

  hadoopNameNode
    commonname: string
    hostname: string
    dfs.http.address: string
    dfs.name.dir: multi-string

Let's define a simple grid:

  clusterName=red
    objectclass=hadoopGlobalConfig
    hadoop.tmp.dir: /tmp
    fs.default.name: hdfs://namenode:9000/
    dfs.block.size: 128
    dfs.replication: 3

  commonname=master,cluster=red
    objectclass=hadoopNameNode,hadoopJobTracker
    hostname: masternode
    dfs.http.address: http://masternode:50070/
    dfs.name.dir: /nn1,/nn2
    mapred.reduce.tasks: 1
    mapred.reduce.slowstart.completed.maps: .55
    mapred.queue.names: big,small
    mapred.jobtracker.taskScheduler: capacity
    mapred.system.dir: /system/mapred

  commonname=simplecomputenode,cluster=red
    objectclass=hadoopDataNode,hadoopTaskTracker
    hostname: node1,node2,node3
    dfs.data.dir: /hdfs1,/hdfs2,/hdfs3
    dfs.datanode.du.reserved: 10
    mapred.job.tracker: commonname=master,cluster=red
    mapred.local.dir: /mr1,/mr2,/mr3
    mapred.tasktracker.map.tasks.maximum: 4
    mapred.tasktracker.reduce.tasks.maximum: 4
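One way a daemon on this grid might use the layout above is to fetch the cluster-wide hadoopGlobalConfig entry and overlay its own node entry on top of it. The sketch below is a hypothetical illustration of that merge, with plain dicts standing in for real LDAP search results; `effective_config` and the dict names are invented here, not part of the proposal.

```python
# Stand-in for the hadoopGlobalConfig entry of cluster 'red'.
global_config = {
    "hadoop.tmp.dir": "/tmp",
    "fs.default.name": "hdfs://namenode:9000/",
    "dfs.block.size": 128,
    "dfs.replication": 3,
}

# Stand-in for the simplecomputenode entry; dfs.data.dir is a
# multi-string attribute, so it comes back as a list of values.
datanode_entry = {
    "dfs.data.dir": ["/hdfs1", "/hdfs2", "/hdfs3"],
    "dfs.datanode.du.reserved": 10,
}

def effective_config(global_entry, node_entry):
    """Merge the entries: node-specific attributes override the
    cluster-wide defaults."""
    merged = dict(global_entry)
    merged.update(node_entry)
    return merged

conf = effective_config(global_config, datanode_entry)
print(conf["dfs.block.size"])   # 128, inherited from hadoopGlobalConfig
print(conf["dfs.data.dir"])     # from the hadoopDataNode entry
```

The override direction (node beats cluster) is an assumption; the proposal could equally define additional levels, e.g. a per-rack entry between the global and node entries.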
