[Lucene-hadoop Wiki] Trivial Update of "QuickStart" by masukomi

Apache Wiki Sun, 19 Aug 2007 01:37:19 -0700

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for 
change notification.


The following page has been changed by masukomi:
http://wiki.apache.org/lucene-hadoop/QuickStart

The comment on the change is:
Added the Fully-distributed operation section, but it's still untested

------------------------------------------------------------------------------
  
  '''Mac Users''' You'll probably need to install something like 
[http://www.sshkeychain.org/ SSHKeychain] or 
[http://www.mothersruin.com/software/SSHChain/ SSHChain] (no idea which is 
better) to be able to ssh to a computer without entering the password 
every time. This is because ssh-agent was designed for X11 systems 
and OS X isn't an X11 system.
  
- == FINISH ME ==
- Will do. Or, maybe you will...
+ === Bootstrapping ===
+ A new distributed filesystem must be formatted with the following command, 
run on the master node:
  
+ {{{bin/hadoop namenode -format}}}
+ 
+ You should see a quick series of `STARTUP_MSG`s and a `SHUTDOWN_MSG`.
+ 
+ 
+ Then start the Hadoop daemons with:
+ 
+ {{{bin/start-all.sh}}}
+ 
+ It should notify you that it's starting the `namenode`, `datanode`, 
`secondarynamenode`, and `jobtracker`. 
+ 
+ Input files are copied into the distributed filesystem as follows: 
+ {{{bin/hadoop dfs -put <localsrc> <dst>}}}
+ For more details, run `bin/hadoop dfs` with no options.
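
The bootstrapping steps above could be collected into one small script. This is only a sketch: the `run` and `bootstrap` helpers, the `DRY_RUN` variable, and the `input.txt` / `input` paths are hypothetical names introduced for illustration, and it assumes it is started from the Hadoop install directory on the master node. By default it only prints the commands, since formatting is meant for a new filesystem.

```shell
#!/bin/sh
# Sketch of the bootstrap sequence; prints commands unless DRY_RUN=0.
# DRY_RUN, run(), bootstrap(), and the example paths are illustrative
# assumptions, not part of Hadoop.
DRY_RUN="${DRY_RUN:-1}"

run() {
    # Print the command in dry-run mode, otherwise execute it.
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

bootstrap() {
    # $1 = local source file, $2 = destination path in the DFS
    run bin/hadoop namenode -format &&
    run bin/start-all.sh &&
    run bin/hadoop dfs -put "$1" "$2"
}

bootstrap input.txt input
```

Setting `DRY_RUN=0` would execute the three commands in order and stop at the first failure.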
+ 
+ == Stage 3: Fully-distributed operation ==
+ 
+ Distributed operation is just like the pseudo-distributed operation described 
above, except:
+ 
+  1. Specify the hostname or IP address of the master server in the values for 
`fs.default.name` and `mapred.job.tracker` in `conf/hadoop-site.xml`. These are 
specified as `host:port` pairs.
+  2. Specify directories for `dfs.name.dir` and `dfs.data.dir` in 
`conf/hadoop-site.xml`. These are used to hold distributed filesystem data on 
the master node and slave nodes respectively. Note that `dfs.data.dir` may 
contain a space- or comma-separated list of directory names, so that data may 
be stored on multiple devices.
+  3. Specify `mapred.local.dir` in `conf/hadoop-site.xml`. This determines 
where temporary MapReduce data is written. It also may be a list of directories.
+  4. Specify `mapred.map.tasks` and `mapred.reduce.tasks` in 
`conf/mapred-default.xml`. As a rule of thumb, use 10x the number of slave 
processors for `mapred.map.tasks`, and 2x the number of slave processors for 
`mapred.reduce.tasks`.
+  5. List all slave hostnames or IP addresses in your `conf/slaves` file, one 
per line.
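
Items 1 to 3 of the list above might translate into a `hadoop-site.xml` along the following lines. This is a sketch under stated assumptions: the host name `master.example.com`, the ports 9000 and 9001, and every directory path are placeholders rather than recommended values, and `prop` and `write_site_xml` are hypothetical helpers. The script writes to a scratch file instead of a live `conf/` directory.

```shell
#!/bin/sh
# Sketch: write a hadoop-site.xml covering items 1-3 of the list above.
# Host name, ports, and directory paths are placeholders; prop() and
# write_site_xml() are illustrative helpers, not Hadoop tools.

prop() {
    # Emit one <property> element: $1 = name, $2 = value
    printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' "$1" "$2"
}

write_site_xml() {
    # $1 = output file, $2 = master hostname or IP address
    {
        echo '<?xml version="1.0"?>'
        echo '<configuration>'
        prop fs.default.name    "$2:9000"
        prop mapred.job.tracker "$2:9001"
        prop dfs.name.dir       /hadoop/dfs/name
        prop dfs.data.dir       /disk1/dfs/data,/disk2/dfs/data
        prop mapred.local.dir   /disk1/mapred/local,/disk2/mapred/local
        echo '</configuration>'
    } > "$1"
}

# Write to a scratch location rather than overwriting a real config.
write_site_xml "${TMPDIR:-/tmp}/hadoop-site.xml.example" master.example.com
```

The comma-separated values for `dfs.data.dir` and `mapred.local.dir` illustrate spreading data over multiple devices, as the list describes.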
+
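
The rule of thumb for `mapred.map.tasks` and `mapred.reduce.tasks` is simple arithmetic, and can be worked out from the slaves file. In this sketch, "slave processors" is assumed to mean the number of slaves times the processors per slave; `CPUS_PER_SLAVE`, the function names, and the three host names are hypothetical.

```shell
#!/bin/sh
# Sketch: derive the rule-of-thumb task counts from a slaves file.
# CPUS_PER_SLAVE and the function names are illustrative assumptions.

count_slaves() {
    # Count non-empty lines in the given slaves file (one host per line).
    grep -c . "$1"
}

task_counts() {
    # $1 = number of slaves, $2 = processors per slave
    procs=$(( $1 * $2 ))
    echo "mapred.map.tasks=$(( procs * 10 ))"
    echo "mapred.reduce.tasks=$(( procs * 2 ))"
}

# Example with a throwaway slaves file listing three hypothetical hosts.
tmp=$(mktemp)
printf '%s\n' slave1 slave2 slave3 > "$tmp"
task_counts "$(count_slaves "$tmp")" "${CPUS_PER_SLAVE:-2}"
rm -f "$tmp"
```

With three slaves of two processors each, this gives 6 slave processors, so 60 map tasks and 12 reduce tasks.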
