On Friday 08 August 2008 11:43:50 Rong-en Fan wrote:
> After looking into streaming source, the answer is via environment
> variables. For example, mapred.task.timeout is in
> the mapred_task_timeout environment variable.
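To illustrate the environment-variable mapping Rong-en describes — a minimal sketch, run outside hadoop; the framework, not you, normally exports the variable, and the 600000 ms fallback is just Hadoop's stock mapred.task.timeout default:

```shell
#!/bin/sh
# Streaming exports each jobconf entry as an environment variable,
# replacing the dots with underscores:
#   mapred.task.timeout -> mapred_task_timeout
# Here we set it by hand only to simulate what the framework does.
mapred_task_timeout=120000
export mapred_task_timeout

# The mapper/reducer script can then read it like any other env variable:
TIMEOUT="${mapred_task_timeout:-600000}"   # fall back to the stock default
echo "task timeout: ${TIMEOUT} ms"
```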

Well, another typical way to deal with that is to pass the parameters on the
command line.

I personally ended up stuffing all our configuration that is related to the 
environment into a config.ini file, which gets served via HTTP, and I pass 
a -c http://host:port/config.ini parameter to all the jobs.
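Roughly what that looks like at job startup — a sketch only: the ini keys below are made up for illustration, and the real job would wget the file from the URL given with -c instead of writing it locally:

```shell
#!/bin/sh
# The -c URL is handed to the job on its command line.
CONFIGURL="${1:-http://host:port/config.ini}"

# In the real job this line would be:  wget -q -O config.ini "$CONFIGURL"
# For illustration, fake the fetch with the kind of keys we keep in there
# (service URLs, SQL connection info, and so on):
cat > config.ini <<'EOF'
db_host = sql01.internal
api_url = http://api.internal:8080/
EOF

# Pull a single value out of the ini file:
DB_HOST=$(sed -n 's/^db_host *= *//p' config.ini)
echo "connecting to ${DB_HOST}"
```

The nice part is that every task attempt fetches the same file, so changing the environment means editing one file behind the HTTP server rather than touching each job's invocation.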

Configuration related to what I expect the job to do I still keep on the 
command line; e.g. the hadoop call looks something like this:

time $HADOOP_HOME/bin/hadoop jar \
    $HADOOP_HOME/contrib/streaming/hadoop-*-streaming.jar \
    -mapper "/home/hadoop/bin/llfp --s3fetch -K $AWS_ACCESS_KEY_ID \
        -S $AWS_SECRET_ACCESS_KEY --stderr -d vmregistry -d frontpage \
        -d papi2 -d gen_dailysites -d fb_memberfind -c $CONFIGURL" \
    -reducer "/home/hadoop/bin/lrp --stderr -c $CONFIGURL" \
    -jobconf mapred.reduce.tasks=22 \
    -input /user/hadoop/run-$JOBNAME-input \
    -output /user/hadoop/run-$JOBNAME-output || exit 1

In our case the separate .ini file makes sense because it describes the 
environment (e.g. HTTP service URLs, SQL database connections, and so on) and 
is also used by other scripts that are not run inside hadoop.

Andreas

>
> On Fri, Aug 8, 2008 at 4:26 PM, Rong-en Fan <[EMAIL PROTECTED]> wrote:
> > I'm using streaming with a mapper written in perl. However, an
> > issue is that I want to pass some arguments via command line.
> > In regular Java mapper, I can access JobConf in Mapper.
> > Is there a way to do this?
> >
> > Thanks,
> > Rong-En Fan

