[ 
https://issues.apache.org/jira/browse/MAPREDUCE-596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12857411#action_12857411
 ] 

Trevor Rundell commented on MAPREDUCE-596:
------------------------------------------

Apparently this issue is still around.  When I distribute a .zip file with
-file, I end up with a job jar structured something like this:

Archive:  job_201004151121_0002.jar
  inflating: load_diff.py            
  inflating: getmaps.py              
  inflating: lib/warehouse.zip       
  inflating: envs.cfg
  ...

For some reason, the zip file ends up in the lib/ directory.  When I change the
extension to .zipp, the file ends up at the top level, as I'd expect:

Archive:  job_201004151121_0004.jar
  inflating: load_diff.py            
  inflating: getmaps.py              
  inflating: warehouse.zipp          
  inflating: envs.cfg 
  ...

Any particular reason for this?
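
For anyone who wants to reproduce the check without unzipping by hand, here is a
minimal sketch that reports where a shipped file landed inside a job jar.  It
relies only on Python's standard zipfile module (a jar is a zip archive); the
function name locate_entry is made up for illustration and is not part of
Hadoop.

```python
import zipfile


def locate_entry(jar_path, basename):
    """Return every entry path in the jar whose final component is basename.

    Hypothetical helper for illustration; not part of Hadoop streaming.
    """
    with zipfile.ZipFile(jar_path) as jar:
        return [
            name
            for name in jar.namelist()
            # Compare only the last path component, so both "warehouse.zip"
            # and "lib/warehouse.zip" match the basename "warehouse.zip".
            if name.rsplit("/", 1)[-1] == basename
        ]
```

Going by the listings above, locate_entry("job_201004151121_0002.jar",
"warehouse.zip") should report the entry under lib/, while the .zipp copy in
the second jar should show up at the top level.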

> can't package zip file with hadoop streaming -file argument
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-596
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-596
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: contrib/streaming
>            Reporter: Karl Anderson
>
> I'm unable to ship a file with a .zip suffix to the mapper using the -file
> argument for hadoop streaming.  I am able to ship it if I change the suffix
> to .zipp.  Is this a bug, or does it have something to do with the jar file
> format that is used to send files to the instance?
>
> For example, with this hadoop invocation, and local files "/tmp/boto.zip" and
> "/tmp/boto.zipp", which are copies of each other:
>
> $HADOOP_HOME/bin/hadoop jar \
>     $HADOOP_HOME/contrib/streaming/hadoop-0.17.0-streaming.jar \
>     -mapper $KCLUSTER_SRC/testmapper.py \
>     -reducer $KCLUSTER_SRC/testreducer.py \
>     -input input/foo -output output \
>     -file /tmp/foo.txt -file /tmp/boto.zip -file /tmp/boto.zipp
>
> I see this line in the invocation's standard output:
>
> packageJobJar: [/tmp/foo.txt, /tmp/boto.zip, /tmp/boto.zipp, /tmp/hadoop-karl/hadoop-unjar6899/] [] /tmp/streamjob6900.jar tmpDir=null
>
> But in the current directory of the mapper process, "boto.zip" does not
> exist, while "boto.zipp" does.
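
One way to confirm the observation in the quoted report from inside the mapper
is to have the script report which of the shipped files it can see in its
working directory.  This is only a sketch: the helper names are hypothetical,
and it assumes the mapper is a Python script, as in the invocation above.

```python
import os
import sys


def check_shipped_files(names, directory="."):
    """Map each expected file name to True if it is a regular file in directory."""
    return {
        name: os.path.isfile(os.path.join(directory, name))
        for name in names
    }


def report(names, stream=sys.stderr, directory="."):
    """Write one 'name: present/MISSING' line per expected file to stream."""
    for name, present in sorted(check_shipped_files(names, directory).items()):
        stream.write("%s: %s\n" % (name, "present" if present else "MISSING"))
```

Calling report(["foo.txt", "boto.zip", "boto.zipp"]) at the top of the mapper
writes the result to standard error rather than standard output, so it does not
get mixed into the mapper's key/value output.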


        
