IP stuff: I will send out a link to the PDF that describes KMotif. The cross-correlation comes from http://mathworld.wolfram.com/CorrelationCoefficient.html, with an implementation that correlates column-wise. Both the Euclidean and city-block distance measures come from basic data mining textbooks (my textbook is Data Mining by Mehmed Kantardzic) or http://www.statsoft.com/textbook/stcluan.html. Please let me know if this is sufficient or if I need more references.

Distance measures are basically a numeric way of characterizing the relationship between two numerical or categorical datasets. They are usually used in conjunction with k-means, hierarchical clustering, or some other type of clustering algorithm.

I think the architecture question applies to k-means and to the difference/similarity algorithms, and I am not sure of the best architecture for them. Should each distance/similarity measure be its own class, so that it can be passed into an engine that is the clustering algorithm? For instance, have a k-means class with a private variable of type ClusteringMeasurementAlgorithm, where:

    EuclideanDistance implements DistanceMeasure,
    DistanceMeasure implements ClusteringMeasurementAlgorithm

Does this sound somewhat logical? If we had an engine that took an instance of ClusteringMeasurementAlgorithm as a constructor parameter, it could handle all operations on the data using that specific measurement algorithm. The reason I am trying to abstract the clustering measurement further than a difference measure is that clustering may be done on both similarity and difference measures.

Please tell me if this sounds outrageous, because I do not have a lot of architecture experience.

Thanks,
John
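
To make the proposal above concrete, here is a minimal Java sketch of the layering, not a definitive design. Only the names ClusteringMeasurementAlgorithm, DistanceMeasure, and EuclideanDistance come from the message. Everything else (SimilarityMeasure, CityBlockDistance, CorrelationSimilarity, KMeans, and the methods measure, higherIsCloser, and nearestCentroid) is hypothetical and chosen for illustration. In Java an interface extends another interface, so "DistanceMeasure implements ClusteringMeasurementAlgorithm" is written with extends here. Only the assignment step of k-means is shown.

```java
/** Root abstraction from the message: any numeric measure relating two points. */
interface ClusteringMeasurementAlgorithm {
    double measure(double[] a, double[] b);

    /** Assumed helper: true if a larger value means "closer" (a similarity). */
    boolean higherIsCloser();
}

/** Difference measures: a smaller value means the points are closer. */
interface DistanceMeasure extends ClusteringMeasurementAlgorithm {
    @Override
    default boolean higherIsCloser() { return false; }
}

/** Hypothetical counterpart for similarity measures: larger means closer. */
interface SimilarityMeasure extends ClusteringMeasurementAlgorithm {
    @Override
    default boolean higherIsCloser() { return true; }
}

final class EuclideanDistance implements DistanceMeasure {
    @Override
    public double measure(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }
}

final class CityBlockDistance implements DistanceMeasure {
    @Override
    public double measure(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += Math.abs(a[i] - b[i]);
        }
        return sum;
    }
}

/** Pearson correlation coefficient; yields NaN if either vector is constant. */
final class CorrelationSimilarity implements SimilarityMeasure {
    @Override
    public double measure(double[] a, double[] b) {
        int n = a.length;
        double meanA = 0.0, meanB = 0.0;
        for (int i = 0; i < n; i++) {
            meanA += a[i];
            meanB += b[i];
        }
        meanA /= n;
        meanB /= n;
        double cov = 0.0, varA = 0.0, varB = 0.0;
        for (int i = 0; i < n; i++) {
            double da = a[i] - meanA;
            double db = b[i] - meanB;
            cov += da * db;
            varA += da * da;
            varB += db * db;
        }
        return cov / Math.sqrt(varA * varB);
    }
}

/** The "engine": receives its measure through the constructor. */
final class KMeans {
    private final ClusteringMeasurementAlgorithm measure;

    KMeans(ClusteringMeasurementAlgorithm measure) {
        this.measure = measure;
    }

    /** Assignment step only: index of the centroid closest to the point. */
    int nearestCentroid(double[] point, double[][] centroids) {
        int best = 0;
        double bestScore = measure.measure(point, centroids[0]);
        for (int i = 1; i < centroids.length; i++) {
            double score = measure.measure(point, centroids[i]);
            boolean closer = measure.higherIsCloser()
                    ? score > bestScore
                    : score < bestScore;
            if (closer) {
                best = i;
                bestScore = score;
            }
        }
        return best;
    }
}
```

With the point {1, 2, 3} and centroids {3, 2, 1} and {2, 4, 6}, new KMeans(new EuclideanDistance()) assigns the point to the first centroid, while new KMeans(new CorrelationSimilarity()) assigns it to the second. The engine code is the same in both cases, which is the reason for placing the abstraction above DistanceMeasure.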
