Hadoop Streaming Combiner Problem

Premal Shah Thu, 04 Aug 2011 13:35:18 -0700

According to the hadoop streaming
docs<http://hadoop.apache.org/common/docs/r0.20.0/streaming.html#Working+with+the+Hadoop+Aggregate+Package+%28the+-reduce+aggregate+option%29>,
there is an inbuilt Aggregate Java class which can work both as a mapper and
a reducer.


Here is the command:
*shell> hadoop jar hadoop-streaming.jar -file mapper.py -mapper mapper.py
-combiner aggregate -reducer NONE -input input_files -output output_path*

Executing this command fails the mapper with this error:
*java.io.IOException: Cannot run program "aggregate": java.io.IOException:
error=2, No such file or directory*

However, if you run this command using aggregate as the reducer and not the
combiner, the job works fine.
*shell> hadoop jar hadoop-streaming.jar -file mapper.py -mapper mapper.py
-reduce aggregate -input input_files -output output_path*

What am I doing wrong? Is aggregate treated as a command and not a
JavaClassName? If yes, how do I use the JavaClassName instead?

-- 
Regards,
Premal Shah.

Hadoop Streaming Combiner Problem

Reply via email to