Hello,
My M/R program is going smoothly, except for one small problem. I have a
"global" variable that is set by the user (and thus in the main function),
that I want one of my reduce functions to access. This is a read-only
variable. After some reading in the forums, I tried something like this:
file: MyGlobalVars.java
package org.apache.hadoop.examples;
public class MyGlobalVars {
public static int nTax;
}
------
file: myLittleMRProgram.java
package org.apache.hadoop.examples;
map function() {
System.out.println("in map function, nTax is: " + MyGlobalVars.nTax);
}
....
main() {
MyGlobalVars.nTax = Integer.parseInt(other_args.get(2));
System.out.println("in main function, nTax is: " + MyGlobalVars.nTax);
....
JobClient.runJob(conf);
....
return 0;
}
--------
When I run it, I get:
in main function, nTax is 20 (which is what I want)
in map function, nTax is 0 (<--- this is not right).
I am a little confused about how to resolve this. I apologize in advance if
this is a blatant Java error; I only began programming in the language a
few weeks ago.
Since MapReduce tries to avoid the whole shared-memory scene, I am more
than willing to have each reduce function receive a local copy of this
user-defined value. However, I am a little confused about what the best way
to do this would be. As I see it, my options are:
1.) Write the user-defined value to HDFS in the main function, and have it
read from HDFS in the reduce function. I can't quite figure out the code
for this, though. I know how to specify -an- input file for the MapReduce
task, but if I did it this way, won't I need to specify two separate input
files?
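For what it's worth, here is roughly what I imagined option 1 looking like: the main function writes the value to a side file on HDFS (not a job input), and each reduce task re-reads it once in its configure() method. The path "/params/nTax.txt" and the use of configure() are just my guesses, not code I have working:

```java
// In main(), before runJob(): write the parameter to a side file on HDFS.
// (The path "/params/nTax.txt" is a made-up example.)
FileSystem fs = FileSystem.get(conf);
FSDataOutputStream out = fs.create(new Path("/params/nTax.txt"));
out.writeBytes(other_args.get(2));
out.close();

// In the reduce class (extends MapReduceBase): read it back, once per task.
private int nTax;

public void configure(JobConf job) {
    try {
        FileSystem fs = FileSystem.get(job);
        BufferedReader in = new BufferedReader(
            new InputStreamReader(fs.open(new Path("/params/nTax.txt"))));
        nTax = Integer.parseInt(in.readLine().trim());
        in.close();
    } catch (IOException e) {
        throw new RuntimeException("could not read nTax from HDFS", e);
    }
}
```

Since the side file is opened directly through the FileSystem API rather than being a job input, I think that would sidestep my two-input-files worry -- but I am not sure this is the idiomatic way.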
2.) Put it in the construction of the reduce object (I saw this mentioned
in the archives). How would I accomplish this exactly when the value is
user defined? Parameter passing? If so, won't this require me to change the
underlying MapReduce base classes (which makes me a touch nervous, since
I'm still very new to Hadoop)?
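Re-reading the archives, maybe what was meant by option 2 is not a real constructor at all but the JobConf itself: the main function stores the value in the job configuration, and the framework hands each map/reduce task a copy, which it can pick up in configure(). This is only my guess at the intended pattern (the property name "my.ntax" is made up):

```java
// In main(), before JobClient.runJob(conf):
conf.set("my.ntax", other_args.get(2));  // user-supplied value into the job conf

// In the map or reduce class (extends MapReduceBase):
private int nTax;

public void configure(JobConf job) {
    // getInt() parses the stored string; 0 here is just a fallback default.
    nTax = job.getInt("my.ntax", 0);
}
```

If that works, it would avoid both the shared static variable and any change to the underlying MapReduce base classes.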
What would be the easiest way to do this?
Thanks in advance for the help. I appreciate your time.
-SM