What?
I am trying to run a Hadoop streaming job. I wrote a simple Python
script called mapper.py and tested it locally with
'cat somefile.txt | python mapper.py'.
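For context, a streaming mapper is just a program that reads records from standard input and writes key/value lines to standard output. The sketch below is a hypothetical stand-in, not the actual mapper.py from this post: the names `map_lines` and `main`, and the choice of emitting each record with its length, are assumptions for illustration only. The `#!` line matters when the script is launched directly as `-mapper mapper.py` rather than through `python mapper.py`.

```python
#!/usr/bin/env python
"""Hypothetical stand-in for mapper.py (not the script from this post)."""
import sys


def map_lines(lines):
    """Yield one 'key<TAB>value' string per non-empty input line."""
    for line in lines:
        record = line.strip()
        if record:
            # Illustrative output only: the record and its length.
            yield "%s\t%d" % (record, len(record))


def main(stdin=sys.stdin, stdout=sys.stdout):
    """Read records from stdin and write mapped lines to stdout."""
    for out in map_lines(stdin):
        stdout.write(out + "\n")
```

Saved as a script, it would end with a call to `main()` under the usual `if __name__ == "__main__":` guard, which is what makes 'cat somefile.txt | python mapper.py' produce output.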
The command:
I tried every combination of paths in the following command:
$HSTREAMING -Dmapred.reduce.tasks=0 -Dstream.non.zero.exit.is.failure=true \
    -input /ixml \
    -output /oxml \
    -mapper mapper.py \
    -file scripts/mapper.py \
    -inputreader "StreamXmlRecordReader,begin=channel,end=/channel"
PS:
I made sure mapper.py has execute permissions.
I tried -mapper '/usr/bin/python mapper.py'.
I also tried giving the full path of mapper.py.
I tried without -file.
I tried a couple of other streaming jars:
hadoop-0.20.1+169.89-streaming.jar
hadoop-0.20.2+228-streaming.jar
Nothing seems to work!
The Error:
java.io.IOException: Cannot run program "mapper.py": error=2, No such file or directory
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:460)
    at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:214)
    at org.apache.hadoop.streaming.PipeMapper.configure(PipeMapper.java:66)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)........
    ......
ERROR org.apache.hadoop.streaming.PipeMapRed: configuration exception
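One known cause of `error=2, No such file or directory` for a script that does exist is an interpreter line the kernel cannot resolve, for example a `#!` path that is missing on the machine running the task, or a script saved with Windows (CRLF) line endings. Nothing in this post establishes that this is the cause here. The sketch below is only a local pre-flight check, and the helper name `check_script` is hypothetical.

```python
import os


def check_script(path):
    """Return a list of problems that could stop `path` being executed directly."""
    if not os.path.isfile(path):
        return ["file not found: %s" % path]
    problems = []
    if not os.access(path, os.X_OK):
        problems.append("execute bit is not set")
    with open(path, "rb") as handle:
        first_line = handle.readline()
    if not first_line.startswith(b"#!"):
        problems.append("no #! interpreter line")
    else:
        if first_line.endswith(b"\r\n"):
            problems.append("CRLF line ending on the #! line")
        # The first token after '#!' is the interpreter path (e.g. /usr/bin/env).
        parts = first_line[2:].split()
        interpreter = parts[0].decode("utf-8", "replace") if parts else ""
        if not interpreter or not os.path.exists(interpreter):
            problems.append("interpreter not found locally: %r" % interpreter)
    return problems
```

For example, `check_script("scripts/mapper.py")` returns an empty list when none of these problems is detected. The check only sees the submitting machine; the `#!` path would also have to exist on every task node.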
To verify that streaming works, I tried giving the bin/wc program as
the mapper, and it works!
I understand that the -file option includes the file in the jar, but how
do I refer to it on the command line? As far as I can tell from
http://wiki.apache.org/hadoop/HadoopStreaming , I am following the
instructions correctly.
Any help would be really appreciated.
Regards,
~Viv