I did run it the way you suggested, but I am running into a slew of ClassNotFoundExceptions for the MapClass. Exporting the CLASSPATH doesn't seem to fix it. How do I get around it?
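In case it helps, this is roughly the driver I am running -- essentially the stock WordCount example, written here against the old mapred API (exact method names may differ a little between Hadoop versions, so treat this as a paraphrase rather than a verbatim copy):

import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class WordCount {

  // Standard WordCount mapper: emits (word, 1) for every token in the line.
  public static class MapClass extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(LongWritable key, Text value,
                    OutputCollector<Text, IntWritable> output,
                    Reporter reporter) throws IOException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        output.collect(word, one);
      }
    }
  }

  // Standard WordCount reducer/combiner: sums the counts per word.
  public static class Reduce extends MapReduceBase
      implements Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterator<IntWritable> values,
                       OutputCollector<Text, IntWritable> output,
                       Reporter reporter) throws IOException {
      int sum = 0;
      while (values.hasNext()) {
        sum += values.next().get();
      }
      output.collect(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws IOException {
    // Passing the driver class tells Hadoop which jar to ship with the job --
    // the jar containing WordCount, MapClass and Reduce.
    JobConf conf = new JobConf(WordCount.class);
    conf.setJobName("wordcount");

    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(IntWritable.class);

    conf.setMapperClass(MapClass.class);
    conf.setCombinerClass(Reduce.class);
    conf.setReducerClass(Reduce.class);

    // With fs.default.name pointing at HDFS, plain paths like
    // /user/avinash/input resolve against HDFS.
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));

    JobClient.runJob(conf);
  }
}

My understanding is that MapClass has to end up inside the jar the JobConf points at (here via the JobConf(WordCount.class) constructor), since the task JVMs on the other nodes load it from that jar rather than from my local CLASSPATH -- but please correct me if that is not the issue.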
Thanks
Avinash


On 5/29/07 1:30 PM, "Doug Cutting" <[EMAIL PROTECTED]> wrote:

> Phantom wrote:
>> (1) Set my fs.default.name to hdfs://<host>:<port> and also specify it
>> in the JobConf configuration. Copy my sample input file into HDFS using
>> "bin/hadoop fs -put" from my local file system. I then need to specify
>> this file to my WordCount sample as input. Should I specify this file
>> with the hdfs:// directive?
>>
>> (2) Set my fs.default.name to file://<host>:<port> and also specify it
>> in the JobConf configuration. Just specify the input path to the
>> WordCount sample, and everything should work if the path is available
>> to all machines in the cluster?
>>
>> Which way should I go?
>
> Either should work. So should a third option, which is to have your job
> input in a non-default filesystem, but there's currently a bug that
> prevents that from working. The above two should work, though. The
> second assumes that the input is available at the same path in the
> native filesystem on all nodes.
>
> When naming files in the default filesystem you do not need to specify
> their filesystem, since it is the default, but it is not an error to
> specify it.
>
> The most common mode of distributed operation is (1): use an HDFS
> filesystem as your fs.default.name, copy your initial input into that
> filesystem with 'bin/hadoop fs -put localPath hdfsPath', then specify
> 'hdfsPath' as your job's input. The "hdfs://host:port" is not required
> at this point, since it is the default.
>
> Doug
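And just to check my reading of the naming point above: with HDFS as the default filesystem, a bare path and the fully qualified hdfs:// form should name the same file? Something like the following (the little test class is mine, and the host, port and paths are made up):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class NamingCheck {
  public static void main(String[] args) throws Exception {
    // hadoop-site.xml is assumed to set fs.default.name to
    // hdfs://namenode:9000 (placeholder host/port).
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    // File copied in beforehand with:
    //   bin/hadoop fs -put /tmp/sample.txt /user/avinash/input/sample.txt

    // Unqualified path: resolved against the default filesystem (HDFS here).
    Path bare = new Path("/user/avinash/input/sample.txt");
    // Fully qualified path: redundant but not an error, per the reply above.
    Path qualified = new Path("hdfs://namenode:9000/user/avinash/input/sample.txt");

    System.out.println(fs.exists(bare));       // true
    System.out.println(fs.exists(qualified));  // true -- same file
  }
}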
