I'd go with this: put all your files in a zip file, upload it with -file, and have your Python script unpack the zip file:

    import subprocess
    cmd = 'unzip bla.zip'
    retcode = subprocess.call(cmd, shell=True)

right before you import any modules contained in that zip file.

Cheers,
Erez

--- On Wed, 3/17/10, venkata subbarayudu <[email protected]> wrote:

From: venkata subbarayudu <[email protected]>
Subject: Hadoop streaming command : -file option to pass a directory to jobcache
To: [email protected], [email protected]
Date: Wednesday, March 17, 2010, 10:53 PM

Hi All,

I am new to Hadoop and am using Python to write MapReduce tasks. To execute the streaming job I am using the following command:

    bin/hadoop jar hadoop-0.20.0-streaming.jar -mapper pkg2Cls.py \
        -jobconf mapred.map.tasks=5 -jobconf mapred.reduce.tasks=0 \
        -input /usr/test/linecount -output linecountresults \
        -file pkg2Cls.py -file pkg1Cls.py

which is working fine. But now I want to pass the entire directory of my Python files to the -file option, instead of passing each file individually. How can I do this?

Thanks for your help in advance.

Subbarayudu Amanchi.
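For reference, here is a minimal self-contained sketch of the unpack-before-import pattern Erez describes. In a real streaming job the zip would be shipped with `-file pkg.zip` and would already sit in the task's working directory; this sketch creates one to simulate that, and uses Python's zipfile module instead of the `unzip` binary for portability (the names pkg.zip and mymod are hypothetical):

```python
import os
import sys
import tempfile
import zipfile

# Work in a scratch directory standing in for the task's working directory.
workdir = tempfile.mkdtemp()
os.chdir(workdir)

# Simulate the -file upload: a zip containing one Python module.
with zipfile.ZipFile('pkg.zip', 'w') as zf:
    zf.writestr('mymod.py', 'def greet():\n    return "hello from zip"\n')

# The step the mapper performs before its imports: unpack the archive into
# the current directory. subprocess.call('unzip pkg.zip', shell=True) is
# equivalent where the unzip command is installed.
with zipfile.ZipFile('pkg.zip') as zf:
    zf.extractall()

# The bundled module is now importable from the working directory.
sys.path.insert(0, workdir)
import mymod
print(mymod.greet())
```

With this pattern, the streaming invocation only needs one extra `-file pkg.zip` argument instead of one `-file` per Python source file.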
