Sorry for the response latency... See comments interspersed.

On 8/15/05, John Gant <[EMAIL PROTECTED]> wrote:
> IP stuff:
> I will send out a link to the pdf that describes KMotif,

Thanks.

> and the cross
> correlation comes from
> http://mathworld.wolfram.com/CorrelationCoefficient.html with an
> implementation that correlates column-wise.

What exactly does "column-wise" mean? This just looks like Pearson's R,
which is already available in the SimpleRegression class. Do you mean
generation of correlation matrices?

Saw you just posted some code. Thanks! I will have a look :-)

> Both euclidean and
> city-block distance measures come from basic data mining textbooks (my
> textbook is Data Mining by Mehmed Kantardzic) or
> http://www.statsoft.com/textbook/stcluan.html. Please let me know if
> this is sufficient, or if I need more references.

This is sufficient. Online references are good if you can find stable
links (like above).

>
> Distance measures are basically a numeric way of classifying a
> relationship between two numerical or categorical datasets. Usually
> distance measures are used in conjunction with k-means, or
> hierarchical clustering (or some type of clustering algorithm).

Are these essentially metrics on R^n (the "numerical" case) or
homogeneity measures (e.g. chi-square, for the categorical case)?

>
> I think the architecture question applies to K-means and
> difference/similarity algorithms. I am not sure of the best
> architecture for these algorithms. Should each distance/similarity
> measure be its own class, allowing these to be passed into an engine
> that is the clustering algorithm?

If the algorithms can make use of "pluggable" distance measures, then
yes, this would make sense.

> For instance have a k-means class
> that has a private variable of type ClusteringMeasurementAlgorithm,
> where:
>
> EuclideanDistance which implements,
> DistanceMeasure which implements,
> ClusteringMeasurementAlgorithm
>
> Does this sound somewhat logical?
> If we had an engine that took an
> instance of ClusteringMeasurementAlgorithm as a constructor parameter,
> it could handle all operations on the data using the specific
> measurement algorithm.

I am confused about what is being abstracted here. If it is the
distance measure, the interface should be called something that ends in
"Measure" or "Metric".

> The reason I am trying to abstract the
> clustering algorithm more than a difference measure is due to the fact
> that clustering may be done on similarity and difference measures.
> Please tell me if this sounds outrageous, because I do not have a lot
> of architecture experience.

If a clustering algorithm can use multiple different distance measures,
then it does make sense to encapsulate the distance measure. Defining a
distance measure or metric interface, writing implementation classes
for it, and having the clustering algorithms hold instances of these as
members is a reasonable way to do this, IMHO.

Phil
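
For illustration, here is a minimal sketch of the arrangement discussed
above, in Java. All names are assumptions made up for this example:
DistanceMeasure, EuclideanDistance, CityBlockDistance and
NearestCentroidAssigner are hypothetical and not existing library
classes. The assigner is only a stand-in for the assignment step of a
clustering engine, not a full k-means implementation.

```java
/** Hypothetical interface: a distance between two points in R^n. */
interface DistanceMeasure {
    double distance(double[] a, double[] b);
}

/** Euclidean (L2) distance: square root of the summed squared differences. */
class EuclideanDistance implements DistanceMeasure {
    @Override
    public double distance(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }
}

/** City-block (Manhattan, L1) distance: sum of absolute differences. */
class CityBlockDistance implements DistanceMeasure {
    @Override
    public double distance(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += Math.abs(a[i] - b[i]);
        }
        return sum;
    }
}

/**
 * Stand-in for a clustering engine. It holds the measure as a member,
 * supplied through the constructor, and never depends on a concrete
 * implementation.
 */
class NearestCentroidAssigner {
    private final DistanceMeasure measure;

    NearestCentroidAssigner(DistanceMeasure measure) {
        this.measure = measure;
    }

    /** Returns the index of the centroid closest to the point, or -1 if there are none. */
    int nearest(double[] point, double[][] centroids) {
        int best = -1;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (int k = 0; k < centroids.length; k++) {
            double d = measure.distance(point, centroids[k]);
            if (d < bestDistance) {
                bestDistance = d;
                best = k;
            }
        }
        return best;
    }
}
```

Swapping the measure changes the result without touching the engine.
For the point (0, 0) and centroids (3, 3) and (5, 0), the Euclidean
measure picks the first centroid (about 4.24 versus 5), while the
city-block measure picks the second (6 versus 5).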
