I'm unable to ship a file with a .zip suffix to the mapper using the
-file argument for hadoop streaming. I am able to ship it if I change
the suffix to .zipp. Is this a bug, or does it perhaps have something to
do with the jar file format which is used to send files to the instance?
For example, with this hadoop invocation, and local files
"/tmp/boto.zip" and "/tmp/boto.zipp" which are copies of each other:
    $HADOOP_HOME/bin/hadoop jar \
        $HADOOP_HOME/contrib/streaming/hadoop-0.17.0-streaming.jar \
        -mapper $KCLUSTER_SRC/testmapper.py \
        -reducer $KCLUSTER_SRC/testreducer.py \
        -input input/foo -output output \
        -file /tmp/foo.txt -file /tmp/boto.zip -file /tmp/boto.zipp
I see this line in the invocation standard output:
    packageJobJar: [/tmp/foo.txt, /tmp/boto.zip, /tmp/boto.zipp, /tmp/hadoop-karl/hadoop-unjar6899/] [] /tmp/streamjob6900.jar tmpDir=null
But in the current directory of the mapper process, "boto.zip" does
not exist, while "boto.zipp" does.
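
One way to double-check what the mapper process actually sees is a small
diagnostic mapper along the lines of the sketch below. It is not the
testmapper.py from the job above; the function name check_shipped and the
EXPECTED list are made up for illustration, and the file names are simply
the basenames of the -file arguments. It assumes only that shipped files,
when they arrive, show up in the mapper's current directory.

```python
#!/usr/bin/env python
"""Diagnostic streaming mapper: report which shipped files are visible."""
import os
import sys

# Hypothetical list of names to look for; these are the basenames of the
# -file arguments in the invocation above.
EXPECTED = ["foo.txt", "boto.zip", "boto.zipp"]


def check_shipped(names, directory="."):
    """Return {name: True/False} for whether each name exists in directory."""
    present = set(os.listdir(directory))
    return {name: name in present for name in names}


def main():
    # Report on stderr so the listing does not mix with the map output.
    sys.stderr.write("cwd: %s\n" % os.getcwd())
    for entry in sorted(os.listdir(".")):
        sys.stderr.write("  entry: %s\n" % entry)
    for name, found in sorted(check_shipped(EXPECTED).items()):
        status = "present" if found else "MISSING"
        sys.stderr.write("  %s: %s\n" % (name, status))
    # Pass the input through unchanged so the job still produces output.
    for line in sys.stdin:
        sys.stdout.write(line)


if __name__ == "__main__":
    main()
```

Passing this script with -mapper in place of testmapper.py should put the
directory listing in the task's stderr log, which would show whether
boto.zip is missing entirely or has arrived in some other form.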