Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change 
notification.

The "IdeasOnLdapConfiguration" page has been changed by SomeOtherAccount.
http://wiki.apache.org/hadoop/IdeasOnLdapConfiguration?action=diff&rev1=3&rev2=4

--------------------------------------------------

  
  and in our LDAP server, we have placed the following objects:
  
- { { {
+ {{{
  hostname=myhost1
  objectclass=node
  domain=example.com
@@ -24, +24 @@

  hostname=myhost2
  objectclass=node
  domain=example.com
- } } }
+ }}}
  
  We can now do an LDAP search with (&(objectclass=node)(hostname=myhost1)) to 
find the 'myhost1' object.  Similarly, we can search with 
(&(objectclass=node)(domain=example.com)) to find both the myhost1 and myhost2 objects.
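+ 
+ As a quick illustration, here is a minimal sketch of running the first filter 
above with Java's standard JNDI API; the server URL and base DN are made up for 
demonstration:
+ 
+ {{{
+ import java.util.Hashtable;
+ import javax.naming.Context;
+ import javax.naming.NamingEnumeration;
+ import javax.naming.directory.DirContext;
+ import javax.naming.directory.InitialDirContext;
+ import javax.naming.directory.SearchControls;
+ import javax.naming.directory.SearchResult;
+ 
+ public class FindHost {
+     public static void main(String[] args) throws Exception {
+         Hashtable<String, String> env = new Hashtable<String, String>();
+         env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
+         env.put(Context.PROVIDER_URL, "ldap://ldap.example.com:389"); // hypothetical server
+         DirContext ctx = new InitialDirContext(env);
+ 
+         SearchControls sc = new SearchControls();
+         sc.setSearchScope(SearchControls.SUBTREE_SCOPE);
+ 
+         // Find the 'myhost1' object with the filter from the text above.
+         NamingEnumeration<SearchResult> results = ctx.search(
+             "dc=example,dc=com",                        // hypothetical base DN
+             "(&(objectclass=node)(hostname=myhost1))",
+             sc);
+         while (results.hasMore()) {
+             System.out.println(results.next().getAttributes());
+         }
+         ctx.close();
+     }
+ }
+ }}}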
  
  Let's apply these ideas to Hadoop.  Here are some rough objectclasses that we 
can use for demonstration purposes:
  
+ {{{
  generic properties: hadoopGlobalConfig
  hadoop.tmp.dir: string
  fs.default.name: string
@@ -64, +65 @@

  dfs.http.address: string
  hostname: string
  dfs.name.dir: multi-string
+ }}}
+ 
  
  Let's define a simple grid:
  
+ {{{
  clusterName=red
  objectclass=hadoopGlobalConfig
  hadoop.tmp.dir=/tmp
@@ -94, +98 @@

  mapred.local.dir: /mr1,/mr2,/mr3
  mapred.tasktracker.map.tasks.maximum: 4
  mapred.tasktracker.reduce.tasks.maximum: 4
+ }}}
  
+ Let's say we fire up node1.  The local config would say which LDAP server to 
contact, the credentials needed to talk to it, etc.  It might also say that the 
node is part of the red cluster in order to speed up startup.  From there, the 
node would do the following:
+ 
+ Get all the global config for the red cluster with the search filter 
(&(objectclass=hadoopGlobalConfig)(clusterName=red)).  We now know 
hadoop.tmp.dir, fs.default.name, etc.
+ 
+ Are we a namenode?  (&(objectclass=hadoopNameNode)(hostname=node1)).  Empty.  
Drats!
+ 
+ Are we a datanode?  (&(objectclass=hadoopDataNode)(commonname=node1)).  We 
got an object back!  Grab that info and we can now start up the datanode process.
+ 
+ Are we a jobtracker?  (&(objectclass=hadoopJobTracker)(hostname=node1)).  
Empty.  Drats!
+ 
+ Are we a tasktracker? (&(objectclass=hadoopTaskTracker)(hostname=node1)):  We 
got an object back!  Fire up the task tracker with that object's info.
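+ 
+ Putting the whole startup sequence together, here is a rough sketch of what 
the node side could look like, again using Java's JNDI API.  The connection 
details, base DN, and the startDaemon() helper are hypothetical and would come 
from the small local config file mentioned above:
+ 
+ {{{
+ import java.util.Hashtable;
+ import javax.naming.Context;
+ import javax.naming.NamingEnumeration;
+ import javax.naming.directory.DirContext;
+ import javax.naming.directory.InitialDirContext;
+ import javax.naming.directory.SearchControls;
+ import javax.naming.directory.SearchResult;
+ 
+ public class LdapNodeStartup {
+     public static void main(String[] args) throws Exception {
+         // Local config: which LDAP server, credentials, which cluster (all hypothetical values).
+         Hashtable<String, String> env = new Hashtable<String, String>();
+         env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
+         env.put(Context.PROVIDER_URL, "ldap://ldap.example.com:389");
+         env.put(Context.SECURITY_PRINCIPAL, "cn=hadoop,dc=example,dc=com");
+         env.put(Context.SECURITY_CREDENTIALS, "secret");
+         DirContext ctx = new InitialDirContext(env);
+ 
+         SearchControls sc = new SearchControls();
+         sc.setSearchScope(SearchControls.SUBTREE_SCOPE);
+         String base = "dc=example,dc=com";   // hypothetical base DN
+         String host = "node1";
+ 
+         // 1. Pull the cluster-wide defaults (hadoop.tmp.dir, fs.default.name, ...).
+         NamingEnumeration<SearchResult> global = ctx.search(base,
+             "(&(objectclass=hadoopGlobalConfig)(clusterName=red))", sc);
+         while (global.hasMore()) {
+             System.out.println("global config: " + global.next().getAttributes());
+         }
+ 
+         // 2. Ask "what am I?" one role at a time and start whatever matches.
+         // (The datanode lookup in the text keys on commonname; hostname is used
+         //  here for uniformity, so adjust the attribute per role as needed.)
+         String[] roles = { "hadoopNameNode", "hadoopDataNode",
+                            "hadoopJobTracker", "hadoopTaskTracker" };
+         for (String role : roles) {
+             NamingEnumeration<SearchResult> hits = ctx.search(base,
+                 "(&(objectclass=" + role + ")(hostname=" + host + "))", sc);
+             if (hits.hasMore()) {
+                 SearchResult entry = hits.next();
+                 System.out.println(host + " is a " + role + ": " + entry.getAttributes());
+                 // startDaemon(role, entry.getAttributes());  // hypothetical helper
+             } else {
+                 System.out.println(host + " is not a " + role);
+             }
+         }
+         ctx.close();
+     }
+ }
+ }}}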
+ 
+ From these base definitions, we can do more complex things:
+ 
+ {{{
+ commonname=simplecomputenode1,cluster=red
+ objectclass=hadoopDataNode,hadoopTaskTracker
+ hostname:  node1,node2,node3
+ dfs.data.dir: /hdfs1, /hdfs2, /hdfs3
+ dfs.datanode.du.reserved: 10
+ mapred.job.tracker: commonname=jobtracker,cluster=red
+ mapred.local.dir: /mr1,/mr2,/mr3
+ mapred.tasktracker.map.tasks.maximum: 4
+ mapred.tasktracker.reduce.tasks.maximum: 4
+ 
+ commonname=simplecomputenode2,cluster=red
+ objectclass=hadoopDataNode,hadoopTaskTracker
+ hostname:  node4,node5,node6
+ dfs.data.dir: /hdfs1, /hdfs2, /hdfs3, /hdfs4
+ dfs.datanode.du.reserved: 10
+ mapred.job.tracker: commonname=jobtracker,cluster=red
+ mapred.local.dir: /mr1,/mr2,/mr3,/mr4
+ mapred.tasktracker.map.tasks.maximum: 8
+ mapred.tasktracker.reduce.tasks.maximum: 4
+ }}}
+ 
+ We can have multiple definitions for the same grid.  This is important when 
you consider that small- to medium-sized grids are likely to have a mix of 
hardware.  For example, some nodes may have 8 cores and four disks while others 
have 6 cores and eight disks.  If they are part of the same cluster, they will 
need different mapred-site.xml settings in order to get the most out of the 
hardware purchase.
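+ 
+ For example, with the entries above, when node4 comes up the lookup 
(&(objectclass=hadoopTaskTracker)(hostname=node4)) matches the 
simplecomputenode2 entry, so it fires up its tasktracker with 8 map slots and 
four mapred.local.dir directories, while node1 matches simplecomputenode1 and 
gets 4 map slots, all without any per-host config files.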
+ 
