You may specify multiple -input "<wildcard>" options. Take care to quote the wildcard so that your local shell does not expand it before Hadoop sees it.

You may set any property you like using -jobconf. Common uses are mapred.map.tasks and mapred.reduce.tasks, which override the default number of map and reduce tasks, but any property is allowed.

Another useful argument is -cmdenv <key>=<value>, which overrides environment variables for your program. A common use is to ship a dynamic library and set LD_LIBRARY_PATH to '.', but you can override any variable your program expects.
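
As a rough sketch of how these options combine, the Python snippet below builds the argument list for one streaming job and prints it in shell-quoted form. The launcher name hadoop-streaming, the script paths, and the job name are taken from the earlier messages in this thread. The second input pattern and the task counts are made-up values for illustration only. Passing the list to subprocess directly would bypass the local shell, so the wildcards would arrive unexpanded; the printed form shows the quoting you would need on a command line.

```python
import shlex


def build_streaming_command(inputs, output, mapper, reducer,
                            files=(), jobconf=None, cmdenv=None):
    """Return the argv list for a hadoop-streaming invocation."""
    argv = ["hadoop-streaming"]
    # One -input option per wildcard; repeat the option as often as needed.
    for pattern in inputs:
        argv += ["-input", pattern]
    argv += ["-output", output, "-mapper", mapper, "-reducer", reducer]
    # Files to ship to the cluster along with the job.
    for path in files:
        argv += ["-file", path]
    # Arbitrary job properties, passed as key=value.
    for key, value in (jobconf or {}).items():
        argv += ["-jobconf", f"{key}={value}"]
    # Environment variables for the mapper and reducer processes.
    for key, value in (cmdenv or {}).items():
        argv += ["-cmdenv", f"{key}={value}"]
    return argv


argv = build_streaming_command(
    inputs=["kjv/*", "extra/part-*"],  # second pattern is hypothetical
    output="kjvout",
    mapper="/usr/bin/python mapper.py",
    reducer="/usr/bin/python reducer.py",
    files=[
        "/home/amcnabb/svn/mrpso/python/mapper.py",
        "/home/amcnabb/svn/mrpso/python/reducer.py",
    ],
    jobconf={
        "mapred.job.name": "kjv",
        "mapred.map.tasks": 10,      # illustrative value
        "mapred.reduce.tasks": 2,    # illustrative value
    },
    cmdenv={"LD_LIBRARY_PATH": "."},
)

# shlex.join (Python 3.8+) quotes each argument for a POSIX shell,
# so the wildcards come out wrapped in single quotes.
print(shlex.join(argv))
```

The printed command can be pasted into a shell as is. If you type the command by hand instead, the quotes around each wildcard are the part that is easy to forget.
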
Yoram

> -----Original Message-----
> From: Hairong Kuang [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, October 17, 2006 7:11 PM
> To: [email protected]
> Subject: RE: HadoopStreaming
>
> Hadoop streaming assumes that inputs are files. If kjv is a directory, you
> may use the option "-input kjv/*".
>
> Hairong
>
> -----Original Message-----
> From: Andrew McNabb [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, October 17, 2006 6:22 PM
> To: [email protected]
> Subject: Re: HadoopStreaming
>
> On Tue, Oct 17, 2006 at 03:52:59PM -0700, Yoram Arnon wrote:
> > Try changing your command to read
> >
> > hadoop-streaming \
> >   -mapper "/usr/bin/python mapper.py" \
> >   -file "/home/amcnabb/svn/mrpso/python/mapper.py" \
> >   -reducer "/usr/bin/python reducer.py" \
> >   -file "/home/amcnabb/svn/mrpso/python/reducer.py" \
> >   -input kjv \
> >   -output kjvout
>
> I'll try this first thing in the morning.
>
> > I assume kjv is a file and kjvout is a directory - they should be.
>
> Actually, I was doing it the same way as other Hadoop stuff I've done:
> kjv is a directory in DFS. Does HadoopStreaming do it in a different way
> from most other Hadoop stuff?
>
> In any case, how do I make it take a directory as input if that's what I
> need?
>
> > I also assume /usr/bin/python is the path to python *on the cluster
> > machines*. Otherwise, you can do -mapper "python mapper.py" -file
> > /usr/bin/python -file /home/amcnabb/svn/mrpso/python/mapper.py
>
> > I recommend adding -jobconf mapred.job.name="kjv", to make the
> > jobtracker history more readable.
>
> I didn't know about that option. I'll do that.
>
> Thanks for all of the tips.
>
> --
> Andrew McNabb
> http://www.mcnabbs.org/andrew/
> PGP Fingerprint: 8A17 B57C 6879 1863 DE55 8012 AB4D 6098 8826 6868
