Hello,
I'm fairly new to Hadoop and I'm writing a k-means clustering algorithm using
only Hadoop. What I would like to do is determine the new centroids in the
reducer, pass the new centroid values back to main, and then run another
map/reduce job.
Here's the code I have for the reducer:
public void reduce(Text key, Iterator<Text> values,
                   OutputCollector<Text, Text> output, Reporter reporter)
        throws IOException
{
    String[] myStringValue;
    String temp = "";
    double xTotal = 0.0;
    double yTotal = 0.0;
    ArrayList<String> myValues = new ArrayList<String>();

    // Each value is a comma-separated record; fields 1 and 2 hold x and y.
    while (values.hasNext())
    {
        temp = values.next().toString();
        myValues.add(temp);
        myStringValue = temp.split("[,]");
        xTotal += Double.parseDouble(myStringValue[1]);
        yTotal += Double.parseDouble(myStringValue[2]);
    }

    // The new centroid is the mean of all points assigned to this key.
    double newCentroidX = xTotal / myValues.size();
    double newCentroidY = yTotal / myValues.size();
    String newCentroid = newCentroidX + "," + newCentroidY;

    // Not used yet.
    String[] Klabel = key.toString().split("[.]");

    // Append this centroid to whatever is already stored under
    // "NewCentroid", using ':' as the separator between centroids.
    if (myConf.get("NewCentroid") != null)
    {
        myConf.set("NewCentroid", myConf.get("NewCentroid") +
                   ":" + newCentroid);
    }
    else
    {
        myConf.set("NewCentroid", newCentroid);
    }
}
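To make the arithmetic easy to check outside Hadoop, the averaging and
appending steps above can be sketched as a small standalone class. This is only
an illustration: the class name CentroidSketch and the method names centroid
and append are made up, the point labels "p1" and "p2" are sample data, and it
assumes every value is a comma-separated record with x and y in fields 1 and 2,
as in the reducer.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Hadoop: mirrors the reducer's arithmetic.
public class CentroidSketch {

    // Average fields 1 and 2 of each comma-separated record into "x,y".
    public static String centroid(List<String> records) {
        if (records.isEmpty()) {
            throw new IllegalArgumentException("no records to average");
        }
        double xTotal = 0.0;
        double yTotal = 0.0;
        for (String record : records) {
            String[] fields = record.split(",");
            xTotal += Double.parseDouble(fields[1]);
            yTotal += Double.parseDouble(fields[2]);
        }
        return (xTotal / records.size()) + "," + (yTotal / records.size());
    }

    // Join centroids with ':' the same way the reducer builds "NewCentroid".
    public static String append(String existing, String newCentroid) {
        return existing == null ? newCentroid : existing + ":" + newCentroid;
    }

    public static void main(String[] args) {
        // Sample points (assumed data): (1,2) and (3,4).
        String c = centroid(Arrays.asList("p1,1.0,2.0", "p2,3.0,4.0"));
        System.out.println(append(null, c));      // prints 2.0,3.0
        System.out.println(append("0.5,0.5", c)); // prints 0.5,0.5:2.0,3.0
    }
}
```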
I'm not collecting anything for the output file yet. I plan on collecting
the final centroid values in the reducer.
Is it possible to pass configuration values from the reducer to the
driver/main program?
Thanks
Erik