Sure, Ted. I will do that.

Thanks
Pallavi
Ted Dunning wrote:
Pallavi,

This is very useful feedback.

What you have done is very similar to the k-means++ algorithm and it is
clearly a very good thing.
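
For readers unfamiliar with it, k-means++ seeding picks the first seed uniformly at random and each later seed with probability proportional to its squared distance from the nearest seed already chosen. The following is only a minimal single-machine sketch of that idea, not Mahout code and not the patch discussed here; the names kmeanspp_seeds and squared_distance are hypothetical, and points are assumed to be tuples of floats.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeanspp_seeds(points, k, rng=None):
    """Pick k seeds by D^2-weighted sampling (hypothetical helper)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    rng = rng or random.Random()

    # First seed: uniform over the input points.
    seeds = [rng.choice(points)]
    # nearest[i] is the squared distance from points[i] to its closest seed.
    nearest = [squared_distance(p, seeds[0]) for p in points]

    while len(seeds) < k:
        if sum(nearest) == 0:
            # Every point coincides with a seed; fall back to a uniform pick.
            chosen = rng.choice(points)
        else:
            # Points far from all current seeds are more likely to be drawn.
            chosen = rng.choices(points, weights=nearest, k=1)[0]
        seeds.append(chosen)
        nearest = [min(d, squared_distance(p, chosen))
                   for d, p in zip(nearest, points)]
    return seeds
```

Because an already chosen point has weight zero, it is not drawn again as long as some other point still has a positive distance.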

There is already an issue for tracking a k-means++ implementation:
http://issues.apache.org/jira/browse/MAHOUT-153

Could you post your patch there?

On Mon, Jan 4, 2010 at 4:03 AM, Palleti, Pallavi <
[email protected]> wrote:

Initially, I used the canopy clustering output as initial seeds, but the
results weren't good, and the number of clusters depends on the distance
thresholds we give as input. Next, I tried randomly selecting some points
from the input dataset and using them as initial seeds. Again, the results
were not good. Now, I choose the initial seeds from the input set in such a
way that the points are far from each other, and I have observed better
clustering using Fuzzy Kmeans. I have not yet implemented a MapReduce
version of this seed selection. I will implement one soon and submit a
patch.
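
One way to choose seeds that are "far from each other" is a greedy farthest-first pass: start from one point, then repeatedly add the point whose distance to its nearest chosen seed is largest. The message does not say which rule was actually used, so the sketch below is an assumption for illustration only; the function names are hypothetical, it runs on a single machine, and it is not the MapReduce version mentioned above.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_seeds(points, k, rng=None):
    """Greedily pick k mutually distant seeds (hypothetical helper)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    rng = rng or random.Random()

    # Start from an arbitrary point.
    seeds = [rng.choice(points)]
    # nearest[i] is the squared distance from points[i] to its closest seed.
    nearest = [squared_distance(p, seeds[0]) for p in points]

    while len(seeds) < k:
        # Deterministic step: take the point farthest from all current seeds.
        idx = max(range(len(points)), key=nearest.__getitem__)
        seeds.append(points[idx])
        nearest = [min(d, squared_distance(p, points[idx]))
                   for d, p in zip(nearest, points)]
    return seeds
```

This deterministic rule tends to pick outliers as seeds; the D^2-weighted random draw used by k-means++ is less sensitive to them.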




