On Tue, Oct 17, 2006 at 03:52:59PM -0700, Yoram Arnon wrote:

> Try changing your command to read
>
> hadoop-streaming \
>     -mapper "/usr/bin/python mapper.py" \
>     -file "/home/amcnabb/svn/mrpso/python/mapper.py" \
>     -reducer "/usr/bin/python reducer.py" \
>     -file "/home/amcnabb/svn/mrpso/python/reducer.py" \
>     -input kjv \
>     -output kjvout
I'll try this first thing in the morning.

> I assume kjv is a file and kjvout is a directory - they should be.

Actually, I was doing it the same way as other Hadoop stuff I've done:
kjv is a directory in DFS. Does HadoopStreaming handle input differently
from most other Hadoop tools? In any case, how do I make it take a
directory as input if that's what I need?

> I also assume /usr/bin/python is the path to python *on the cluster
> machines*. Otherwise, you can do
> -mapper "python mapper.py" -file /usr/bin/python -file
> /home/amcnabb/svn/mrpso/python/mapper.py
>
> I recommend adding -jobconf mapred.job.name="kjv", to make the
> jobtracker history more readable.

I didn't know about that option. I'll do that. Thanks for all of the tips.

--
Andrew McNabb
http://www.mcnabbs.org/andrew/
PGP Fingerprint: 8A17 B57C 6879 1863 DE55 8012 AB4D 6098 8826 6868
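[Archive note: combining the suggestions in this thread, the full invocation might look like the sketch below. This is unverified; it assumes streaming's -input accepts a DFS directory path (as other Hadoop input formats do) and uses only the flags mentioned above.]

```shell
# Sketch of the consolidated hadoop-streaming command from this thread.
# Assumes /usr/bin/python exists on every cluster machine, and that
# "kjv" is an input directory in DFS (unconfirmed for streaming).
hadoop-streaming \
    -mapper "/usr/bin/python mapper.py" \
    -file "/home/amcnabb/svn/mrpso/python/mapper.py" \
    -reducer "/usr/bin/python reducer.py" \
    -file "/home/amcnabb/svn/mrpso/python/reducer.py" \
    -input kjv \
    -output kjvout \
    -jobconf mapred.job.name="kjv"   # readable name in jobtracker history
```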
