Hi Amareshwari,

Thanks very much for the reply. It's working as you specified. It's unfortunate that the online documentation has to wait for the next release to be updated to follow the current release behavior.

-Michael

On Feb 21, 2010, at 7:43 PM, Amareshwari Sri Ramadasu wrote:

> Hi Michael,
>
> There is a bug with passing a symlink name for the -files and -archives
> options. See MAPREDUCE-787.
> If you don't pass any symlink name for the URI in -files and -archives, it
> creates a symlink with the actual name.
> So, if you pass -archives
> "hdfs://localhost:9000/user/me/samples/cachefile/cachedir.jar", a symlink
> with name cachedir.jar will be created.
>
> -files and -archives are generic options. For all commands, generic options
> must come first, followed by the command options.
> The above documentation is corrected in MAPREDUCE-813.
>
> Thanks
> Amareshwari
>
>
> On 2/20/10 9:57 AM, "Michael Kintzer" <[email protected]> wrote:
>
>>
>> Hi,
>>
>> Hadoop/HDFS newbie. Been struggling with getting the streaming example
>> working with -archives. c.f.
>> http://hadoop.apache.org/common/docs/r0.20.1/streaming.html#Large+files+and+archives+in+Hadoop+Streaming
>>
>> My environment is the Pseudo-distributed environment setup per:
>> http://hadoop.apache.org/common/docs/current/quickstart.html#PseudoDistributed
>>
>> I've run into a couple of issues. First issue is "FileNotFoundException" when
>> the #symlink suffix is specified with the -archives or -files options as per
>> the tutorial.
>>
>> hadoop jar $HADOOP_HOME/hadoop-0.20.1-streaming.jar -archives
>> "hdfs://localhost:9000/user/me/samples/cachefile/cachedir.jar#testlink"
>> -input "samples/cachefile/input.txt" -mapper "xargs cat" -reducer "cat"
>> -output "samples/cachefile/out"
>> java.io.FileNotFoundException: File
>> hdfs://localhost:9000/user/me/samples/cachefile/cachedir.jar#testlink does
>> not exist.
>>     at org.apache.hadoop.util.GenericOptionsParser.validateFiles(GenericOptionsParser.java:349)
>>     at org.apache.hadoop.util.GenericOptionsParser.processGeneralOptions(GenericOptionsParser.java:275)
>>     at org.apache.hadoop.util.GenericOptionsParser.parseGeneralOptions(GenericOptionsParser.java:375)
>>     at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:153)
>>     at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:138)
>>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:59)
>>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>>     at org.apache.hadoop.streaming.HadoopStreaming.main(HadoopStreaming.java:32)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>     at java.lang.reflect.Method.invoke(Method.java:597)
>>     at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>>
>> If I remove the "#testlink" from the archives definition, the error goes
>> away but the symlink is not created, as per the tutorial documentation.
>>
>> I've seen this JIRA issue http://issues.apache.org/jira/browse/HADOOP-6178,
>> shows no FIX version, but the Issue Links to others which are supposedly
>> fixed in 0.20.1 which I have.
>>
>> 2nd issue is "Unrecognized option -archives" when -archives is specified at
>> the end of the arg list.
>>
>> hadoop jar $HADOOP_HOME/hadoop/hadoop-0.20.1-streaming.jar -input
>> "samples/cachefile/input.txt" -mapper "xargs cat" -reducer "cat" -output
>> "samples/cachefile/out9" -archives
>> "hdfs://localhost:9000/user/me/samples/cachefile/cachedir.jar#testlink"
>> 10/02/19 14:29:11 ERROR streaming.StreamJob: Unrecognized option: -archives
>>
>> Any help getting past this appreciated. Am I missing a configuration
>> setting that allows symlinking? Really hoping to use the archives feature.
>>
>> -Michael
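
For anyone landing on this thread later, the two fixes from Amareshwari's reply can be sketched as a single invocation: put -archives ahead of the streaming options, and drop the "#testlink" suffix so that the link takes the archive's own name (cachedir.jar). This is only a sketch under the thread's assumptions (Hadoop 0.20.1, the paths quoted above). The run_streaming helper and the fallback for an unset HADOOP_HOME are hypothetical, added here so the argument order can be inspected without a cluster.

```shell
#!/bin/sh
# Paths and jar name are taken from the thread; adjust for your own setup.
ARCHIVE="hdfs://localhost:9000/user/me/samples/cachefile/cachedir.jar"
# Assumption: fall back to the current directory if HADOOP_HOME is unset.
STREAMING_JAR="${HADOOP_HOME:-.}/hadoop-0.20.1-streaming.jar"

# Hypothetical helper: "$@" is the launcher that receives the arguments.
run_streaming() {
  # Generic option (-archives) first, with no "#symlink" suffix,
  # then the streaming-specific options.
  "$@" jar "$STREAMING_JAR" \
    -archives "$ARCHIVE" \
    -input "samples/cachefile/input.txt" \
    -mapper "xargs cat" \
    -reducer "cat" \
    -output "samples/cachefile/out"
}

# Dry run: print one argument per line instead of launching a job.
run_streaming printf '%s\n'

# Real run (needs a working Hadoop installation):
# run_streaming hadoop
```

With no symlink name given, anything the job reads through the link would presumably be reached under cachedir.jar rather than testlink, so paths listed in the input file may need to change accordingly.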
