On Tue, Oct 17, 2006 at 03:52:59PM -0700, Yoram Arnon wrote:

> Try changing your command to read
>
> hadoop-streaming \
>     -mapper "/usr/bin/python mapper.py" \
>     -file "/home/amcnabb/svn/mrpso/python/mapper.py" \
>     -reducer "/usr/bin/python reducer.py" \
>     -file "/home/amcnabb/svn/mrpso/python/reducer.py" \
>     -input kjv \
>     -output kjvout
I'll try this first thing in the morning.

> I assume kjv is a file and kjvout is a directory - they should be.

Actually, I was doing it the same way as other Hadoop stuff I've done:
kjv is a directory in DFS. Does HadoopStreaming handle input differently
from most other Hadoop tools? In any case, how do I make it take a
directory as input if that's what I need?

> I also assume /usr/bin/python is the path to python *on the cluster
> machines*. Otherwise, you can do
> -mapper "python mapper.py" -file /usr/bin/python -file
> /home/amcnabb/svn/mrpso/python/mapper.py
>
> I recommend adding -jobconf mapred.job.name="kjv", to make the
> jobtracker history more readable.

I didn't know about that option. I'll do that. Thanks for all of the tips.

--
Andrew McNabb
http://www.mcnabbs.org/andrew/
PGP Fingerprint: 8A17 B57C 6879 1863 DE55 8012 AB4D 6098 8826 6868
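[Archive note: combining the suggestions in this thread, the full invocation might look like the sketch below. This is unverified; it assumes streaming's -input accepts a DFS directory path (as other Hadoop input formats do) and uses only the flags mentioned above.]

```shell
# Sketch of the consolidated hadoop-streaming command from this thread.
# Assumes /usr/bin/python exists on every cluster machine, and that
# "kjv" is an input directory in DFS (unconfirmed for streaming).
hadoop-streaming \
    -mapper "/usr/bin/python mapper.py" \
    -file "/home/amcnabb/svn/mrpso/python/mapper.py" \
    -reducer "/usr/bin/python reducer.py" \
    -file "/home/amcnabb/svn/mrpso/python/reducer.py" \
    -input kjv \
    -output kjvout \
    -jobconf mapred.job.name="kjv"   # readable name in jobtracker history
```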
