[ https://issues.apache.org/jira/browse/HADOOP-1853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Milind Bhandarkar updated HADOOP-1853:
--------------------------------------

    Description:

Specifying one -cacheFile option in hadoop streaming works. Specifying more
than one gives a parse error. A patch to fix this and a unit test for the fix
have been attached to this bug.

(was: Specifying one -cacheFile option in hadoop streaming works. Specifying
more than one gives a parse error. A patch to fix this and a unit test for
the fix have been attached to this bug. The following is an example of this bug:

This works:
-----------------------
[hod] (parthas) >> stream -input "/user/parthas/test/tmp.data" \
    -mapper "testcache.py abc" \
    -output "/user/parthas/qc/exp2/filterData/subLab/0" \
    -file "/home/parthas/proj/qc/bin/testcache.py" \
    -cacheFile 'hdfs://kry-nn1.inktomisearch.com:8020/user/parthas/test/101-0.qlab.head#abc' \
    -jobconf mapred.map.tasks=1 \
    -jobconf mapred.job.name="SubByLabel-101-0.ulab.aa" \
    -jobconf numReduceTasks=0

additionalConfSpec_:null
null=@@@userJobConfProps_.get(stream.shipped.hadoopstreaming
packageJobJar: [/home/parthas/proj/qc/bin/testcache.py, /export/crawlspace/kryptonite/hod/tmp/hod-1467-tmp/hadoop-unjar56313/] [] /tmp/streamjob56314.jar tmpDir=null
07/07/25 16:51:31 INFO mapred.FileInputFormat: Total input paths to process : 1
07/07/25 16:51:32 INFO streaming.StreamJob: getLocalDirs(): [/export/crawlspace/kryptonite/hod/tmp/hod-1467-tmp/mapred/local]
07/07/25 16:51:32 INFO streaming.StreamJob: Running job: job_0006
07/07/25 16:51:32 INFO streaming.StreamJob: To kill this job, run:
07/07/25 16:51:32 INFO streaming.StreamJob: /export/crawlspace/kryptonite/hadoop/mapred/current/bin/../bin/hadoop job -Dmapred.job.tracker=kry1590:50264 -kill job_0006
07/07/25 16:51:32 INFO streaming.StreamJob: Tracking URL: http://kry1590.inktomisearch.com:56285/jobdetails.jsp?jobid=job_0006
07/07/25 16:51:33 INFO streaming.StreamJob: map 0% reduce 0%
07/07/25 16:51:34 INFO streaming.StreamJob: map 100% reduce 0%
07/07/25 16:51:40 INFO streaming.StreamJob: map 100% reduce 100%
07/07/25 16:51:40 INFO streaming.StreamJob: Job complete: job_0006
07/07/25 16:51:40 INFO streaming.StreamJob: Output: /user/parthas/qc/exp2/filterData/subLab/0
---------------

This does not:
----------------------
[hod] (parthas) >> stream -input "/user/parthas/test/tmp.data" \
    -mapper "testcache.py abc def" \
    -output "/user/parthas/qc/exp2/filterData/subLab/0" \
    -file "/home/parthas/proj/qc/bin/testcache.py" \
    -cacheFile 'hdfs://kry-nn1.inktomisearch.com:8020/user/parthas/test/101-0.qlab.head#abc' \
    -cacheFile 'hdfs://kry-nn1.inktomisearch.com:8020/user/parthas/test/101-0.ulab.aa.head#def' \
    -jobconf mapred.map.tasks=1 \
    -jobconf mapred.job.name="SubByLabel-101-0.ulab.aa" \
    -jobconf numReduceTasks=0

07/07/25 16:52:17 ERROR streaming.StreamJob: Unexpected hdfs://kry-nn1.inktomisearch.com:8020/user/parthas/test/101-0.ulab.aa.head#def while processing -input|-output|-mapper|-combiner|-reducer|-file|-dfs|-jt|-additionalconfspec|-inputformat|-outputformat|-partitioner|-numReduceTasks|-inputreader|||-cacheFile|-cacheArchive|-verbose|-info|-debug|-inputtagged|-help

Usage: $HADOOP_HOME/bin/hadoop [--config dir] jar \
          $HADOOP_HOME/hadoop-streaming.jar [options]
Options:
  -input    <path>              DFS input file(s) for the Map step
  -output   <path>              DFS output directory for the Reduce step
  -mapper   <cmd|JavaClassName> The streaming command to run
  -combiner <JavaClassName>     Combiner has to be a Java class
  -reducer  <cmd|JavaClassName> The streaming command to run
  -file     <file>              File/dir to be shipped in the Job jar file
  -dfs      <h:p>|local         Optional. Override DFS configuration
  -jt       <h:p>|local         Optional. Override JobTracker configuration
  -additionalconfspec specfile  Optional.
  -inputformat TextInputFormat(default)|SequenceFileAsTextInputFormat|JavaClassName  Optional.
  -outputformat TextOutputFormat(default)|JavaClassName  Optional.
  -partitioner JavaClassName    Optional.
  -numReduceTasks <num>         Optional.
  -inputreader <spec>           Optional.
  -jobconf  <n>=<v>             Optional. Add or override a JobConf property
  -cmdenv   <n>=<v>             Optional. Pass env.var to streaming commands
  -cacheFile fileNameURI
  -cacheArchive fileNameURI
  -verbose

For more details about these options:
Use $HADOOP_HOME/bin/hadoop jar build/hadoop-streaming.jar -info
)
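The attached MultipleCachefilesPatch.patch is not reproduced in this message,
but the shape of the fix is visible from the error above: the option parser
stops at the second -cacheFile instead of accumulating repeated occurrences,
while the distributed cache itself already accepts multiple URIs as one
comma-separated list (the mapred.cache.files property). Below is a minimal,
hypothetical Java sketch of that accumulation step; the class and method
names are illustrative and are not taken from the patch.

    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical sketch, not the attached patch: collect every -cacheFile
    // argument and join the URIs into the comma-separated form that the
    // distributed cache reads from mapred.cache.files.
    public class CacheFileArgsSketch {

      public static String joinCacheFiles(String[] args) {
        List<String> uris = new ArrayList<String>();
        for (int i = 0; i < args.length - 1; i++) {
          if ("-cacheFile".equals(args[i])) {
            uris.add(args[++i]);        // consume the URI that follows the flag
          }
        }
        StringBuilder joined = new StringBuilder();
        for (String uri : uris) {
          if (joined.length() > 0) {
            joined.append(',');         // mapred.cache.files is comma-separated
          }
          joined.append(uri);
        }
        return joined.toString();
      }

      public static void main(String[] args) {
        // Example: the two -cacheFile options from the failing command above.
        String[] argv = {
          "-cacheFile", "hdfs://kry-nn1.inktomisearch.com:8020/user/parthas/test/101-0.qlab.head#abc",
          "-cacheFile", "hdfs://kry-nn1.inktomisearch.com:8020/user/parthas/test/101-0.ulab.aa.head#def"
        };
        System.out.println(joinCacheFiles(argv));
      }
    }

Run against the two URIs from the failing command, the sketch prints them
joined by a comma, which is the single-property form the cache layer expects.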
> multiple -cacheFile option in hadoop streaming does not seem to work
> ---------------------------------------------------------------------
>
>                 Key: HADOOP-1853
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1853
>             Project: Hadoop
>          Issue Type: Bug
>          Components: contrib/streaming
>            Reporter: Prachi Gupta
>         Attachments: MultipleCachefilesPatch.patch
>
>
> Specifying one -cacheFile option in hadoop streaming works. Specifying more
> than one gives a parse error. A patch to fix this and a unit test for the
> fix have been attached to this bug.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
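Until the patch is committed, one possible workaround, untested here and
assuming this build honors the mapred.cache.files and mapred.create.symlink
properties directly, is to bypass -cacheFile parsing entirely and pass both
URIs to the job as a single comma-separated -jobconf value:

[hod] (parthas) >> stream -input "/user/parthas/test/tmp.data" \
    -mapper "testcache.py abc def" \
    -output "/user/parthas/qc/exp2/filterData/subLab/0" \
    -file "/home/parthas/proj/qc/bin/testcache.py" \
    -jobconf mapred.cache.files='hdfs://kry-nn1.inktomisearch.com:8020/user/parthas/test/101-0.qlab.head#abc,hdfs://kry-nn1.inktomisearch.com:8020/user/parthas/test/101-0.ulab.aa.head#def' \
    -jobconf mapred.create.symlink=yes \
    -jobconf mapred.map.tasks=1 \
    -jobconf mapred.job.name="SubByLabel-101-0.ulab.aa" \
    -jobconf numReduceTasks=0

The mapred.create.symlink=yes setting is assumed to be needed so that the
#abc and #def fragments still produce symlinks in the task working directory,
as -cacheFile would normally arrange.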