I'd go with this:

Put all your files in a zip file, upload it with -file, and have your Python script
unpack the zip file:
 
import subprocess

# Unpack the uploaded archive in the task's working directory
cmd = 'unzip bla.zip'
retcode = subprocess.call(cmd, shell=True)

right before you import modules that are contained in that zip file.
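As a variation on the above, the same unpack-then-import step can be done without relying on the unzip binary being installed on the task nodes, using Python's standard-library zipfile module. This is only a sketch: the archive name and the commented-out module names are placeholders, not anything from your job.

```python
import sys
import zipfile

def unpack_and_enable_imports(archive):
    # Extract the zip shipped via -file into the task's current
    # working directory (each streaming task runs in its own dir).
    with zipfile.ZipFile(archive) as z:
        z.extractall('.')
    # Make the extracted modules importable.
    if '.' not in sys.path:
        sys.path.insert(0, '.')

# Call this right before importing anything that lives in the zip:
# unpack_and_enable_imports('bla.zip')
# import pkg1Cls
```

The zipfile approach also avoids spawning a shell, which makes failures easier to handle in Python (a bad archive raises zipfile.BadZipfile instead of a non-zero exit code you have to check).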


Cheers,

  Erez


--- On Wed, 3/17/10, venkata subbarayudu <[email protected]> wrote:

From: venkata subbarayudu <[email protected]>
Subject: Hadoop streaming command : -file option to pass a directory to  
jobcache
To: [email protected], [email protected]
Date: Wednesday, March 17, 2010, 10:53 PM

Hi All,
       I am new to Hadoop and am using Python to write MapReduce tasks. To
run them I am using the following streaming command:

bin/hadoop jar hadoop-0.20.0-streaming.jar -mapper pkg2Cls.py -jobconf 
mapred.map.tasks=5 -jobconf mapred.reduce.tasks=0 -input /usr/test/linecount  
-output linecountresults -file pkg2Cls.py -file pkg1Cls.py


which is working fine. But now I want to pass the entire directory of my
Python files to the -file option, instead of passing each file with a
separate -file option.

How can I do this?


Thanks for your help in advance.

Subbarayudu Amanchi.

