Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.
The "IdeasOnLdapConfiguration" page has been changed by SomeOtherAccount.

http://wiki.apache.org/hadoop/IdeasOnLdapConfiguration?action=diff&rev1=5&rev2=6

--------------------------------------------------

Are we a tasktracker? (&(objectclass=hadoopTaskTracker)(hostname=node1)): We got an object back! Fire up the tasktracker with that object's info.

Let's do a more complex example. What happens if we have more than one type of compute node? We simply have multiple definitions of our compute nodes:

{{{
commonname=computenode1,cluster=red
objectclass=hadoopDataNode,hadoopTaskTracker
hostname: node1,node2,node3
dfs.data.dir: /hdfs1, /hdfs2, /hdfs3
mapred.tasktracker.map.tasks.maximum: 4
mapred.tasktracker.reduce.tasks.maximum: 4

commonname=computenode2,cluster=red
objectclass=hadoopDataNode,hadoopTaskTracker
hostname: node4,node5,node6
dfs.data.dir: /hdfs1, /hdfs2, /hdfs3, /hdfs4
mapred.tasktracker.reduce.tasks.maximum: 4
}}}

What if we want more than one job tracker talking to the same HDFS?
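The "are we a tasktracker?" lookup above can be sketched in a few lines. This is an illustration only: a real daemon would issue the same filter against a live directory with an LDAP client library such as python-ldap, while here the directory and the `(&(objectclass=...)(hostname=...))` matching are simulated in plain Python, using only entries taken from the examples on this page.

```python
# Sketch of the startup lookup a daemon would perform against the directory.
# The "directory" is an in-memory stand-in; attribute values are multi-valued
# lists, as LDAP attributes are.

DIRECTORY = {
    "commonname=computenode1,cluster=red": {
        "objectclass": ["hadoopDataNode", "hadoopTaskTracker"],
        "hostname": ["node1", "node2", "node3"],
        "mapred.tasktracker.map.tasks.maximum": ["4"],
    },
    "commonname=computenode2,cluster=red": {
        "objectclass": ["hadoopDataNode", "hadoopTaskTracker"],
        "hostname": ["node4", "node5", "node6"],
        "mapred.tasktracker.reduce.tasks.maximum": ["4"],
    },
}

def find_config(objectclass, hostname):
    """Equivalent of the filter (&(objectclass=...)(hostname=...))."""
    for dn, attrs in DIRECTORY.items():
        if (objectclass in attrs.get("objectclass", [])
                and hostname in attrs.get("hostname", [])):
            return dn, attrs
    # No object back: this host does not run that kind of daemon.
    return None, None

# "Are we a tasktracker?" asked from host node1:
dn, attrs = find_config("hadoopTaskTracker", "node1")
print(dn)  # commonname=computenode1,cluster=red
```

If the search returns an entry, the daemon fires up with that entry's attributes as its configuration; if it returns nothing, the daemon simply is not meant to run on that host.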
LDAP makes defining this easy:

{{{
commonname=masternn,cluster=red
objectclass=hadoopNameNode
dfs.http.address: http://masternn:50070/
hostname: masternn
dfs.name.dir: /nn1,/nn2

commonname=jt1,cluster=red
mapred.reduce.tasks: 1
mapred.reduce.slowstart.completed.maps: .55
mapred.queue.names: big,small
mapred.jobtracker.taskScheduler: capacity
mapred.system.dir: /system/mapred
hostname: jt1

commonname=jt2,cluster=red
mapred.reduce.tasks: 1
mapred.reduce.slowstart.completed.maps: .55
mapred.queue.names: etl
mapred.jobtracker.taskScheduler: capacity
mapred.system.dir: /system/mapred
hostname: jt2

commonname=computenode1,cluster=red
objectclass=hadoopDataNode,hadoopTaskTracker
hostname: node1,node2,node3
dfs.data.dir: /hdfs1, /hdfs2, /hdfs3
dfs.datanode.du.reserved: 10
mapred.job.tracker: commonname=jt1,cluster=red
mapred.local.dir: /mr1,/mr2,/mr3
mapred.tasktracker.map.tasks.maximum: 4
mapred.tasktracker.reduce.tasks.maximum: 4

commonname=computenode2,cluster=red
objectclass=hadoopDataNode,hadoopTaskTracker
hostname: node4,node5,node6
dfs.data.dir: /hdfs1, /hdfs2, /hdfs3, /hdfs4
dfs.datanode.du.reserved: 10
mapred.job.tracker: commonname=jt2,cluster=red
mapred.local.dir: /mr1,/mr2,/mr3,/mr4
mapred.tasktracker.map.tasks.maximum: 8
mapred.tasktracker.reduce.tasks.maximum: 4
}}}

This is important when you consider that small-to-medium-sized grids are likely to have a mix of nodes. For example, some nodes may have eight cores with four disks and some nodes may have six cores with eight disks. If they are part of the same cluster, they will need different mapred-site.xml settings in order to maximize the hardware purchase.

----

From the client side, this is a huge win. We can do things like:

{{{
$ hadoop listgrids
red
green
}}}

In LDAP terms, this would be fetching the (objectclass=hadoopGlobalConfig) entries and reporting all clusternames.

I can also submit a job without knowing any particulars or having a bunch of config files to manage:

{{{
$ hadoop job -grid red -jt jt1 -jar ....
}}}

Because we have access to all grid definitions, we could also do:

{{{
$ hadoop distcp red:/my/dir green:/my/dir
}}}
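Under the same assumptions as the earlier sketch (an in-memory stand-in for the directory), the client-side `hadoop listgrids` lookup could look like this. Note that the attribute layout of the hadoopGlobalConfig entries is an assumption, since this page references the objectclass but does not show its entries; here each grid is represented by a `cluster=<name>` DN.

```python
# Sketch of "hadoop listgrids": fetch every entry matching
# (objectclass=hadoopGlobalConfig) and report the cluster names.
# Entry layout is assumed -- the page does not show hadoopGlobalConfig entries.

DIRECTORY = {
    "cluster=red": {"objectclass": ["hadoopGlobalConfig"]},
    "cluster=green": {"objectclass": ["hadoopGlobalConfig"]},
    # Non-grid entries are ignored by the filter; objectclass here is
    # hypothetical, for illustration only.
    "commonname=jt1,cluster=red": {"objectclass": ["hadoopJobTracker"]},
}

def list_grids():
    grids = []
    for dn, attrs in DIRECTORY.items():
        if "hadoopGlobalConfig" in attrs.get("objectclass", []):
            # Take the cluster name from the DN, e.g. "cluster=red" -> "red".
            grids.append(dn.split("cluster=")[-1])
    return sorted(grids)

print("\n".join(list_grids()))
```

The same filter-then-report pattern would back `hadoop job -grid ...` and the cross-grid `distcp`: the client resolves each grid name to its directory entries and pulls the namenode and jobtracker addresses from there, with no local config files involved.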
