Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.
The "IdeasOnLdapConfiguration" page has been changed by SomeOtherAccount. http://wiki.apache.org/hadoop/IdeasOnLdapConfiguration?action=diff&rev1=3&rev2=4 -------------------------------------------------- and in our LDAP server, we have placed the following objects: - { { { + {{{ hostname=myhost1 objectclass=node domain=example.com @@ -24, +24 @@ hostname=myhost2 objectclass=node domain=example.com - } } } + }}} We can now do an LDAP search with (&(objectclass=node)(hostname=myhost1)) to find the 'myhost1' object. Similarly, we can (&(objectless)(domain=example.com)) to find both myhost1 and myhost2 objects. Let's apply these ideas to Hadoop. Here are some rough objectclasses that we can use for demonstration purposes: + {{{ generic properties: hadoopGlobalConfig hadoop.tmp.dir: string fs.default.name: string @@ -64, +65 @@ dfs.http.address: string hostname: string dfs.name.dir: multi-string + }}} + Let's define a simple grid: + {{{ clusterName=red objectclass=hadoopGlobalConfig hadoop.tmp.dir=/tmp @@ -94, +98 @@ mapred.local.dir: /mr1,/mr2,/mr3 mapred.tasktracker.map.tasks.maximum: 4 mapred.tasktracker.reduce.tasks.maximum: 4 + }}} + Let's say we fire up node1. The local config would say what ldap server, necessary creds to talk to the ldap server, etc. It might also say that it is part of the red cluster in order to speed up the startup. From there, it would do the following: + + Get all the global config for the red cluster: search scope: (&(objectclass=hadoopGlobalConfig)(clusterName=red)). We now know hadoop.tmp.dir, fs.default.name, etc. + + Are we a namenode? (&(objectclass=hadoopNameNode)(hostname=node1)). Empty. Drats! + + Are we a datanode? (&(objectclass=hadoopDataNode)(commonname=node1)). We got an object back! Grab that info and can now start up the datanode process. + + Are we a jobtracker? (&(objectclass=hadoopJobTracker)(hostname=node1)). Empty. Drats! + + Are we a tasktracker? (&(objectclass=hadoopTaskTracker)(hostname=node1)): We got an object back! Fire up the task tracker with that object's info. + + From these base definitions, we can do more complex things: + + {{{ + commonname=simplecomputenode1,cluster=red + objectclass=hadoopDataNode,hadoopTaskTracker + hostname: node1,node2,node3 + dfs.data.dir: /hdfs1, /hdfs2, /hdfs3 + dfs.datanode.du.reserved: 10 + mapred.job.tracker: commonname=jobtracker,cluster=red + mapred.local.dir: /mr1,/mr2,/mr3 + mapred.tasktracker.map.tasks.maximum: 4 + mapred.tasktracker.reduce.tasks.maximum: 4 + + commonname=simplecomputenode2,cluster=red + objectclass=hadoopDataNode,hadoopTaskTracker + hostname: node4,node5,node6 + dfs.data.dir: /hdfs1, /hdfs2, /hdfs3, /hdfs4 + dfs.datanode.du.reserved: 10 + mapred.job.tracker: commonname=jobtracker,cluster=red + mapred.local.dir: /mr1,/mr2,/mr3,/mr4 + mapred.tasktracker.map.tasks.maximum: 8 + mapred.tasktracker.reduce.tasks.maximum: 4 + }}} + + We can define multiple definitions for the same grid. This is important when you consider that small-medium sized grids are likely to have a mix of nodes. For example, some nodes may have 8 cores with four disks and some nodes may have 6 cores with eight disks. If they are part of the same cluster, they will need different mapred-site.xml settings in order to maximize the hardware purchase. +
