[chaining] manipulate job conf in reducer

Erik Test Mon, 23 Aug 2010 13:10:52 -0700

Hello,

I'm fairly new to Hadoop and I'm writing a k-means clustering algorithm using
only Hadoop. What I would like to do is determine the new centroids in the
reducer, pass the new centroid values back to main, and then run another
map/reduce job.
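
As a rough illustration of that control flow, here is a minimal in-memory sketch in plain Java with no Hadoop classes. Everything in it is an assumption made for illustration: the class KMeansLoopSketch, the methods onePass and run, the nearest-centroid assignment standing in for the mapper, and the stopping rule (stop when the centroids no longer change, or after maxPasses). It only shows how each pass's centroids feed the next pass.

```java
import java.util.Arrays;

// Hypothetical in-memory stand-in for the job chain; no Hadoop classes are used.
class KMeansLoopSketch {

    // Plays the role of one map/reduce pass: assign each point to its nearest
    // centroid ("map"), then average each group ("reduce").
    static double[][] onePass(double[][] points, double[][] centroids) {
        double[][] sums = new double[centroids.length][2];
        int[] counts = new int[centroids.length];
        for (double[] p : points) {
            int nearest = 0;
            double best = Double.MAX_VALUE;
            for (int k = 0; k < centroids.length; k++) {
                double dx = p[0] - centroids[k][0];
                double dy = p[1] - centroids[k][1];
                double dist = dx * dx + dy * dy; // squared Euclidean distance
                if (dist < best) {
                    best = dist;
                    nearest = k;
                }
            }
            sums[nearest][0] += p[0];
            sums[nearest][1] += p[1];
            counts[nearest]++;
        }
        double[][] next = new double[centroids.length][];
        for (int k = 0; k < centroids.length; k++) {
            // Assumption: an empty cluster keeps its previous centroid.
            next[k] = counts[k] == 0
                ? centroids[k].clone()
                : new double[] {sums[k][0] / counts[k], sums[k][1] / counts[k]};
        }
        return next;
    }

    // Plays the role of main: feed each pass's centroids into the next pass
    // until they stop changing or maxPasses is reached.
    static double[][] run(double[][] points, double[][] initial, int maxPasses) {
        double[][] current = initial;
        for (int pass = 0; pass < maxPasses; pass++) {
            double[][] next = onePass(points, current);
            if (Arrays.deepEquals(next, current)) {
                break;
            }
            current = next;
        }
        return current;
    }

    public static void main(String[] args) {
        double[][] points = {{0, 0}, {0, 2}, {10, 10}, {10, 12}};
        double[][] initial = {{0, 0}, {10, 10}};
        System.out.println(Arrays.deepToString(run(points, initial, 20)));
    }
}
```

In a real job chain, onePass would be replaced by submitting a map/reduce job, which is the part this question is about.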


Here's the code I have for the reducer.

public void reduce(Text key, Iterator<Text> values,
                   OutputCollector<Text, Text> output, Reporter reporter)
    throws IOException
{
    String[] myStringValue;
    String temp = "";
    double xTotal = 0.0;
    double yTotal = 0.0;

    ArrayList<String> myValues = new ArrayList<String>();
    while (values.hasNext())
    {
        temp = values.next().toString();
        myValues.add(temp);
        myStringValue = temp.split("[,]");
        xTotal += Double.parseDouble(myStringValue[1]);
        yTotal += Double.parseDouble(myStringValue[2]);
    }

    double newCentroidX = xTotal / myValues.size();
    double newCentroidY = yTotal / myValues.size();

    String newCentroid = newCentroidX + "," + newCentroidY;
    String[] Klabel = key.toString().split("[.]");

    if (myConf.get("NewCentroid") != null)
    {
        myConf.set("NewCentroid", myConf.get("NewCentroid") + ":" + newCentroid);
    }
    else
    {
        myConf.set("NewCentroid", newCentroid);
    }
}
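
The averaging and string-building steps in the reducer above can be tried outside Hadoop. The following is a minimal sketch in plain Java, not the poster's code: the class CentroidSketch and its methods meanOf, append and parse are hypothetical names, and the record layout "label,x,y" is an assumption inferred from the indices [1] and [2] used above. It builds the same ':'-separated list of "x,y" pairs that the reducer stores under "NewCentroid" and shows how such a string could be split back into numbers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class; not part of the original post or of Hadoop.
class CentroidSketch {

    // Averages the x and y fields of records shaped like "label,x,y"
    // (assumed layout, matching indices [1] and [2] in the reducer).
    static String meanOf(List<String> records) {
        double xTotal = 0.0;
        double yTotal = 0.0;
        for (String record : records) {
            String[] fields = record.split(",");
            xTotal += Double.parseDouble(fields[1]);
            yTotal += Double.parseDouble(fields[2]);
        }
        return (xTotal / records.size()) + "," + (yTotal / records.size());
    }

    // Appends one "x,y" centroid to a ':'-separated list, as the reducer does
    // with the "NewCentroid" property. A null list means "no centroids yet".
    static String append(String existing, String centroid) {
        return existing == null ? centroid : existing + ":" + centroid;
    }

    // Splits the ':'-separated list back into {x, y} pairs.
    static List<double[]> parse(String encoded) {
        List<double[]> centroids = new ArrayList<>();
        for (String pair : encoded.split(":")) {
            String[] xy = pair.split(",");
            centroids.add(new double[] {
                Double.parseDouble(xy[0]), Double.parseDouble(xy[1])
            });
        }
        return centroids;
    }

    public static void main(String[] args) {
        String first = meanOf(Arrays.asList("a,1.0,2.0", "b,3.0,4.0"));
        String second = meanOf(Arrays.asList("c,10.0,20.0"));
        String encoded = append(append(null, first), second);
        System.out.println(encoded);
        for (double[] c : parse(encoded)) {
            System.out.println(c[0] + " " + c[1]);
        }
    }
}
```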

I'm not collecting anything for the output file yet. I plan on collecting
the final centroid values in the reducer.

Is it possible to pass configuration values from the reducer to the
driver/main program?

Thanks
Erik
