Hello, I need to use Hadoop Streaming to run several instances of a single program on different files. Before doing that, I wrote a simple test application to use as the mapper; it just echoes its input without doing anything useful. It looks like the following:
---------------------------echo.sh--------------------------
echo "Running mapper, input is $1"
---------------------------echo.sh--------------------------

For the input, I created a single text file, input.txt, that has the numbers 1 to 10, one per line, so it goes like:

-----------input.txt---------------
1
2
..
10
-----------input.txt---------------

I uploaded input.txt to the hdfs://stream/ directory and then ran the Hadoop Streaming utility as follows:

bin/hadoop jar hadoop-0.18.0-streaming.jar \
    -input /stream \
    -output /trash \
    -mapper echo.sh \
    -file echo.sh \
    -jobconf mapred.reduce.tasks=0

From what I understood in the streaming tutorial, each mapper would run an instance of echo.sh with one of the lines in input.txt, so I expected to get output of the form

Running mapper, input is 2
Running mapper, input is 5
...

and so on, but I got only two output files, part-00000 and part-00001, which contain just the string "Running mapper, input is ". As far as I can see, the mappers ran the mapper script echo.sh without the standard input. I basically followed the tutorial and I'm confused now, so could you please tell me what I'm missing here?

Thanks in advance,
Jim
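P.S. Looking at my script again, echo.sh only ever prints its first command-line argument ($1), while the streaming tutorial says the mapper receives its records on standard input. Could that be what I'm missing? For reference, this is the stdin-reading version I have in mind (just a sketch on my part, not something taken from the tutorial):

---------------------------echo.sh--------------------------
#!/bin/sh
# Hadoop Streaming feeds the mapper one record per line on stdin;
# echo each line back so it shows up in the job output.
while read line; do
    echo "Running mapper, input is $line"
done
---------------------------echo.sh--------------------------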

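P.P.S. In case the surrounding commands matter, this is roughly how I uploaded the file and inspected the results (reconstructed from memory, so treat it as a sketch rather than an exact transcript):

# copy the local input file into the /stream directory on HDFS
bin/hadoop dfs -put input.txt /stream/input.txt

# dump one of the output files after the job finishes
bin/hadoop dfs -cat /trash/part-00000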