[ https://issues.apache.org/jira/browse/MAPREDUCE-596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12857411#action_12857411 ]
Trevor Rundell commented on MAPREDUCE-596:
------------------------------------------

Apparently this issue is still around. When trying to distribute a .zip file with -file, I end up with a job jar structured something like this:

Archive:  job_201004151121_0002.jar
  inflating: load_diff.py
  inflating: getmaps.py
  inflating: lib/warehouse.zip
  inflating: envs.cfg
  ...

For some reason, the zip file ends up in the lib/ directory. When I change the extension to .zipp, the file ends up at the top level, as I'd expect:

Archive:  job_201004151121_0004.jar
  inflating: load_diff.py
  inflating: getmaps.py
  inflating: warehouse.zipp
  inflating: envs.cfg
  ...

Any particular reason for this?

> can't package zip file with hadoop streaming -file argument
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-596
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-596
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: contrib/streaming
>            Reporter: Karl Anderson
>
> I'm unable to ship a file with a .zip suffix to the mapper using the -file
> argument for hadoop streaming. I am able to ship it if I change the suffix
> to .zipp. Is this a bug, or perhaps has something to do with the jar file
> format which is used to send files to the instance?
>
> For example, with this hadoop invocation, and local files "/tmp/boto.zip" and
> "/tmp/boto.zipp" which are copies of each other:
>
> $HADOOP_HOME/bin/hadoop jar \
>   $HADOOP_HOME/contrib/streaming/hadoop-0.17.0-streaming.jar \
>   -mapper $KCLUSTER_SRC/testmapper.py -reducer $KCLUSTER_SRC/testreducer.py \
>   -input input/foo -output output \
>   -file /tmp/foo.txt -file /tmp/boto.zip -file /tmp/boto.zipp
>
> I see this line in the invocation standard output:
>
> packageJobJar: [/tmp/foo.txt, /tmp/boto.zip, /tmp/boto.zipp,
> /tmp/hadoop-karl/hadoop-unjar6899/] [] /tmp/streamjob6900.jar tmpDir=null
>
> But in the current directory of the mapper process, "boto.zip" does not
> exist, while "boto.zipp" does.
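One plausible reading of the two archive listings is that the job jar builder treats a .zip suffix specially and files it under lib/ (a directory conventionally used for classpath entries), while any other suffix, including .zipp, is stored at the top level where the mapper's working directory sees it. The Python sketch below only models that observed rule; job_jar_entry and LIB_SUFFIXES are hypothetical names, and routing .jar files the same way is an assumption not confirmed by this report.

```python
import os

# Hypothetical model of the observed packaging rule.
# ".zip" matches the behavior reported in this issue; ".jar" is an
# unverified assumption based on lib/ being a classpath directory.
LIB_SUFFIXES = (".jar", ".zip")


def job_jar_entry(path: str) -> str:
    """Return the entry name a -file argument appears to get in the job jar.

    Archives whose suffix is in LIB_SUFFIXES are placed under lib/;
    everything else is stored at the top level of the jar.
    """
    name = os.path.basename(path)
    _, ext = os.path.splitext(name)  # "boto.zipp" -> ("boto", ".zipp")
    if ext in LIB_SUFFIXES:
        return "lib/" + name
    return name


if __name__ == "__main__":
    for f in ["/tmp/foo.txt", "/tmp/boto.zip", "/tmp/boto.zipp"]:
        print(f, "->", job_jar_entry(f))
```

Under this model, /tmp/boto.zip maps to lib/boto.zip while foo.txt and boto.zipp stay at the top level, which matches the packaging both reporters saw. Renaming the archive to a suffix outside LIB_SUFFIXES, as both reporters did, sidesteps the rule.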