Phantom wrote:
(1) Set my fs.default.name to hdfs://<host>:<port> and also specify it
in the JobConf configuration. Copy my sample input file into HDFS using
"bin/hadoop fs -put" from my local file system. I then need to specify
this file to my WordCount sample as input. Should I specify this file
with the hdfs:// prefix?
(2) Set my fs.default.name to file://<host>:<port> and also specify it
in the JobConf configuration. Just specify the input path to the
WordCount sample, and everything should work as long as the path is
available to all machines in the cluster?
Which way should I go?
Either should work. So should a third option, which is to keep your job
input in a non-default filesystem, but there's currently a bug that
prevents that from working. The second option assumes that the input is
available at the same path in the native filesystem on all nodes.
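A minimal sketch of option (2), assuming the classic
org.apache.hadoop.mapred (JobConf) API; the paths are placeholders on a
filesystem mounted identically on every node, and the mapper/reducer
setup is omitted since it's the same as in the stock WordCount example:

  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.mapred.FileInputFormat;
  import org.apache.hadoop.mapred.FileOutputFormat;
  import org.apache.hadoop.mapred.JobClient;
  import org.apache.hadoop.mapred.JobConf;

  public class LocalFsWordCount {
    public static void main(String[] args) throws Exception {
      JobConf conf = new JobConf(LocalFsWordCount.class);
      conf.setJobName("wordcount-local");

      // Option (2): make the local filesystem the default filesystem.
      // "file:///" is the usual URI form for the local filesystem.
      conf.set("fs.default.name", "file:///");

      // Mapper/reducer/output-type setup omitted (same as stock WordCount).

      // These paths must be visible at the same location on every node,
      // e.g. an NFS mount.  Placeholder paths.
      FileInputFormat.setInputPaths(conf, new Path("/shared/wordcount/input"));
      FileOutputFormat.setOutputPath(conf, new Path("/shared/wordcount/output"));

      JobClient.runJob(conf);
    }
  }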
When naming files in the default filesystem you do not need to specify
their filesystem, since it is the default, but it is not an error to
specify it.
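To illustrate, a small snippet (hypothetical namenode address
hdfs://host:9000, and it assumes a running HDFS to talk to): with HDFS
as the default filesystem, a path given with or without the hdfs://
prefix qualifies to the same location.

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class DefaultFsPaths {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      conf.set("fs.default.name", "hdfs://host:9000");  // hypothetical namenode

      FileSystem fs = FileSystem.get(conf);

      // Both names refer to the same file when HDFS is the default filesystem.
      Path withScheme    = new Path("hdfs://host:9000/user/me/input.txt");
      Path withoutScheme = new Path("/user/me/input.txt");

      // Both should print hdfs://host:9000/user/me/input.txt
      System.out.println(fs.makeQualified(withScheme));
      System.out.println(fs.makeQualified(withoutScheme));
    }
  }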
The most common mode of distributed operation is (1): use an HDFS
filesystem as your fs.default.name, copy your initial input into that
filesystem with 'bin/hadoop fs -put localPath hdfsPath', then specify
'hdfsPath' as your job's input. The "hdfs://host:port" is not required
at this point, since it is the default.
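Putting mode (1) together, a hedged sketch with the same JobConf-style
API: 'hdfsPath' and the output path are placeholders, the namenode
address is hypothetical, and fs.default.name would normally already be
set in your site configuration rather than in code.

  // From the shell, first copy the input in:
  //   bin/hadoop fs -put localPath hdfsPath

  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.mapred.FileInputFormat;
  import org.apache.hadoop.mapred.FileOutputFormat;
  import org.apache.hadoop.mapred.JobClient;
  import org.apache.hadoop.mapred.JobConf;

  public class HdfsWordCount {
    public static void main(String[] args) throws Exception {
      JobConf conf = new JobConf(HdfsWordCount.class);
      conf.setJobName("wordcount-hdfs");

      // Usually picked up from the site configuration; shown here for clarity.
      conf.set("fs.default.name", "hdfs://host:9000");  // hypothetical namenode

      // Mapper/reducer/output-type setup omitted (same as stock WordCount).

      // No "hdfs://host:port" prefix needed: HDFS is the default filesystem.
      FileInputFormat.setInputPaths(conf, new Path("hdfsPath"));
      FileOutputFormat.setOutputPath(conf, new Path("hdfsOutput"));

      JobClient.runJob(conf);
    }
  }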
Doug