Hi Amareshwari,
Thanks for your quick reply. I am not sure whether the "-archives"
option can be used with the streaming command when it is run through
the jar option. Could you please give the command showing how it can
be used? Is it something like this:
bin/hadoop jar hadoop-0.20.0-streaming.jar -archives /user/test/src.har
-mapper pkg2Cls.py -jobconf mapred.map.tasks=5 -jobconf
mapred.reduce.tasks=0 -input /usr/test/linecount -output linecountresults
-file pkg2Cls.py -file pkg1Cls.py
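
A minimal sketch of the approach suggested in the quoted reply below (zip the
directory, then hand the archive to -archives), assuming a plain zip file is an
acceptable archive format. The helper names zip_directory and
streaming_command are hypothetical, and the jar name, job settings and paths
are copied from the command above. The sketch only builds the archive and the
argument list; it does not run Hadoop.

```python
import os
import zipfile


def zip_directory(src_dir, zip_path):
    """Pack every file under src_dir into zip_path.

    Entries are stored with paths relative to src_dir. Write the zip
    somewhere outside src_dir so that it does not try to include itself.
    """
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(src_dir):
            for name in sorted(files):
                full = os.path.join(root, name)
                zf.write(full, os.path.relpath(full, src_dir))
    return zip_path


def streaming_command(archive, mapper, input_path, output_path):
    """Build the streaming argument list (hypothetical helper).

    -archives is placed first, as in the command above. The job settings
    are the ones used in this thread.
    """
    return [
        "bin/hadoop", "jar", "hadoop-0.20.0-streaming.jar",
        "-archives", archive,
        "-mapper", mapper,
        "-jobconf", "mapred.map.tasks=5",
        "-jobconf", "mapred.reduce.tasks=0",
        "-input", input_path,
        "-output", output_path,
    ]
```

The returned list could be passed to subprocess.call. Under what path the
unpacked files become visible to the mapper is not confirmed anywhere in this
thread, so the mapper argument is left to the caller.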
On Thu, Mar 18, 2010 at 1:54 PM, Amareshwari Sri Ramadasu <
[email protected]> wrote:
> You can archive/zip the directory and pass it.
> You might have to unarchive it yourself if you use -file option. You can
> use -archives option which will unarchive it for you.
> Please see
> http://hadoop.apache.org/common/docs/r0.20.0/commands_manual.html#Generic+Options
> for more details.
>
> -Amareshwari
>
>
> On 3/18/10 11:23 AM, "venkata subbarayudu" <[email protected]> wrote:
>
> Hi All,
> I am new to Hadoop and am using Python to write MapReduce tasks. To
> run a streaming job I am using the following command:
>
> bin/hadoop jar hadoop-0.20.0-streaming.jar -mapper pkg2Cls.py -jobconf
> mapred.map.tasks=5 -jobconf mapred.reduce.tasks=0 -input
> /usr/test/linecount -output linecountresults -file pkg2Cls.py -file
> pkg1Cls.py
>
> This works fine. But now I want to pass the entire directory of
> my Python files to the -file option, instead of listing each file with
> a separate -file option.
>
> How can I do this?
>
>
> Thanks for your help in advance.
> Subbarayudu Amanchi.
>
>