Thank you very much, Paco and Jason. It works!

For any users who may be curious what this looks like in code, here is a
small snippet of mine:

file: myLittleMRProgram.java
package org.apache.hadoop.examples;

  public static class Reduce extends MapReduceBase
      implements Reducer<Text, LongWritable, Text, LongWritable> {
        private int nTax = 0;

        public void configure(JobConf job) {
            super.configure(job);
            nTax = Integer.parseInt(job.get("nTax"));
        }

        public void reduce(Text key, Iterator<LongWritable> values,
                OutputCollector<Text, LongWritable> output, Reporter reporter)
                throws IOException {
            ....
            System.out.println("nTax is: " + nTax);
        }
    ....
    main() {
        ....
        conf.set("nTax", other_args.get(2));
        JobClient.runJob(conf);
        ....
        return 0;
    }
--------
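
Since I elided most of my driver and the body of reduce() above, here is a
minimal, self-contained sketch of the whole round trip Jason described, in
case it helps anyone else. The class name MyJob, the dummy map/reduce logic,
the job name, and the argument positions are placeholders, not my actual
program:

file: MyJob.java
package org.apache.hadoop.examples;

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class MyJob {

    // Placeholder mapper: emits each input line with a count of 1, just so
    // there is something for the reducer to sum.
    public static class Map extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, LongWritable> {
        public void map(LongWritable key, Text value,
                OutputCollector<Text, LongWritable> output, Reporter reporter)
                throws IOException {
            output.collect(value, new LongWritable(1));
        }
    }

    public static class Reduce extends MapReduceBase
            implements Reducer<Text, LongWritable, Text, LongWritable> {
        private int nTax = 0;

        // Runs once per task, in the task's own JVM, before any reduce()
        // calls; this is where the value set in the driver becomes visible.
        public void configure(JobConf job) {
            super.configure(job);
            nTax = Integer.parseInt(job.get("nTax", "0"));  // "0" is a default
        }

        public void reduce(Text key, Iterator<LongWritable> values,
                OutputCollector<Text, LongWritable> output, Reporter reporter)
                throws IOException {
            long sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            // nTax is available here even though this task runs remotely.
            output.collect(key, new LongWritable(sum * nTax));
        }
    }

    public static void main(String[] args) throws IOException {
        JobConf conf = new JobConf(MyJob.class);
        conf.setJobName("ntax-example");

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(LongWritable.class);
        conf.setMapperClass(Map.class);
        conf.setReducerClass(Reduce.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        // The value only needs to be a String here; configure() parses it.
        conf.set("nTax", args[2]);

        JobClient.runJob(conf);
    }
}
--------

The only lines that matter for the question are conf.set("nTax", ...) in the
driver and job.get("nTax") in configure(); the rest is ordinary job
boilerplate.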


-SM

On Tue, Aug 19, 2008 at 5:02 PM, Jason Venner <[EMAIL PROTECTED]> wrote:

> Since the map & reduce tasks generally run in a separate Java virtual
> machine, and on distinct machines from your main task's Java virtual
> machine, there is no sharing of variables between the main task and the map
> or reduce tasks.
>
> The standard way is to store the variable in the Configuration (or JobConf)
> object in your main task. Then, in the configure method of your map and
> reduce task classes, extract the variable value from the JobConf object.
>
> You will need to override the configure method in your map and reduce
> classes.
>
> This will also require that the variable value be serializable.
>
> For lots of large variables this can be expensive.
>
>
> Sandy wrote:
>
>> Hello,
>>
>>
>> My M/R program is going smoothly, except for one small problem. I have a
>> "global" variable that is set by the user (and thus in the main function)
>> that I want one of my reduce functions to access. This is a read-only
>> variable. After some reading in the forums, I tried something like this:
>>
>> file: MyGlobalVars.java
>> package org.apache.hadoop.examples;
>> public class MyGlobalVars {
>>    static public int nTax;
>> }
>> ------
>>
>> file: myLittleMRProgram.java
>> package org.apache.hadoop.examples;
>> map function() {
>>   System.out.println("in map function, nTax is: " + MyGlobalVars.nTax);
>> }
>> ....
>> main() {
>> MyGlobalVars.nTax = Integer.parseInt(other_args.get(2));
>> System.out.println("in main function, nTax is: " + MyGlobalVars.nTax);
>> ....
>> JobClient.runJob(conf);
>> ....
>> return 0;
>> }
>> --------
>>
>> When I run it, I get:
>> in main function, nTax is 20 (which is what I want)
>> in map function, nTax is 0 (<--- this is not right).
>>
>>
>> I am a little confused on how to resolve this. I apologize in advance if
>> this is a blatant Java error; I only began programming in the language a
>> few weeks ago.
>>
>> Since Map Reduce tries to avoid the whole shared-memory scene, I am more
>> than willing to have each reduce function receive a local copy of this
>> user-defined value. However, I am a little confused on what the best way
>> to do this would be. As I see it, my options are:
>>
>> 1.) Write the user-defined value to HDFS in the main function, and have
>> it read from HDFS in the reduce function. I can't quite figure out the
>> code for this, though. I know how to specify -an- input file for the map
>> reduce task, but if I did it this way, won't I need to specify two
>> separate input files?
>>
>> 2.) Put it in the construction of the reduce object (I saw this mentioned
>> in the archives). How would I accomplish this exactly when the value is
>> user defined? Parameter passing? If so, won't this require me changing the
>> underlying map reduce base (which makes me a touch nervous, since I'm
>> still very new to Hadoop)?
>>
>> What would be the easiest way to do this?
>>
>> Thanks in advance for the help. I appreciate your time.
>>
>> -SM
>>
>>
>>
> --
> Jason Venner
> Attributor - Program the Web <http://www.attributor.com/>
> Attributor is hiring Hadoop Wranglers and coding wizards, contact if
> interested
>
