Sorry for the response latency... See comments interspersed.

On 8/15/05, John Gant <[EMAIL PROTECTED]> wrote:
> IP stuff:
> I will send out a link to the pdf that describes KMotif, 

Thanks.

> and the cross
> correlation comes from
> http://mathworld.wolfram.com/CorrelationCoefficient.html with an
> implementation that correlates column-wise.

What exactly does "column-wise" mean?  This just looks like Pearson's
R, which is already available in the SimpleRegression class.  Do you
mean generation of correlation matrices?
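A minimal sketch of one plausible reading of "column-wise": treat each column of a data matrix (rows are observations) as a variable and compute Pearson's r for every pair of columns, which yields a correlation matrix. The class and method names (ColumnCorrelation, pearson, correlationMatrix) are hypothetical and not part of any existing API; the code also assumes no column is constant, since r is undefined in that case.

```java
import java.util.Arrays;

/** Hypothetical helper: Pearson's r for every pair of columns of a data matrix. */
class ColumnCorrelation {

    /** Pearson's r between columns i and j (rows are observations). */
    static double pearson(double[][] data, int i, int j) {
        int n = data.length;
        double meanI = 0.0, meanJ = 0.0;
        for (double[] row : data) {
            meanI += row[i];
            meanJ += row[j];
        }
        meanI /= n;
        meanJ /= n;
        // Accumulate the cross-product and the two sums of squared deviations.
        double sxy = 0.0, sxx = 0.0, syy = 0.0;
        for (double[] row : data) {
            double dx = row[i] - meanI;
            double dy = row[j] - meanJ;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    /** Symmetric matrix whose (i, j) entry is r for columns i and j; diagonal set to 1 (assumes no constant columns). */
    static double[][] correlationMatrix(double[][] data) {
        int p = data[0].length;
        double[][] r = new double[p][p];
        for (int i = 0; i < p; i++) {
            r[i][i] = 1.0;
            for (int j = i + 1; j < p; j++) {
                r[i][j] = r[j][i] = pearson(data, i, j);
            }
        }
        return r;
    }

    public static void main(String[] args) {
        double[][] data = {
            {1.0, 2.0, 3.0},
            {2.0, 4.0, 1.0},
            {3.0, 6.0, 2.0}
        };
        System.out.println(Arrays.deepToString(correlationMatrix(data)));
    }
}
```

Each off-diagonal entry is just a pairwise Pearson's r, so a single pair could equally be computed by an existing regression class; the matrix form is only a convenience for many variables at once.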

Saw you just posted some code.  Thanks!  I will have a look :-)

> Both Euclidean and
> city-block distance measures come from basic data mining textbooks (my
> textbook is Data Mining by Mehmed Kantardzic) or
> http://www.statsoft.com/textbook/stcluan.html. Please let me know if
> this is sufficient, or if I need more references.

This is sufficient.  Online references are good if you can find stable
links (like above).
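For concreteness, a minimal sketch of the two distance measures mentioned above, on points in R^n represented as double arrays. The class name Distances is hypothetical; the formulas are the standard ones (square root of the sum of squared differences, and sum of absolute differences).

```java
/** Hypothetical helper: Euclidean and city-block distance between two points in R^n. */
final class Distances {

    private Distances() {}

    /** Euclidean distance: square root of the sum of squared coordinate differences. */
    static double euclidean(double[] a, double[] b) {
        checkSameLength(a, b);
        double sum = 0.0;
        for (int k = 0; k < a.length; k++) {
            double d = a[k] - b[k];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** City-block (Manhattan) distance: sum of absolute coordinate differences. */
    static double cityBlock(double[] a, double[] b) {
        checkSameLength(a, b);
        double sum = 0.0;
        for (int k = 0; k < a.length; k++) {
            sum += Math.abs(a[k] - b[k]);
        }
        return sum;
    }

    private static void checkSameLength(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("points must have the same dimension");
        }
    }

    public static void main(String[] args) {
        double[] p = {0.0, 0.0};
        double[] q = {3.0, 4.0};
        System.out.println(euclidean(p, q)); // 5.0
        System.out.println(cityBlock(p, q)); // 7.0
    }
}
```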
> 
> Distance measures are basically a numeric way of classifying a
> relationship between two numerical or categorical datasets. Usually
> distance measures are used in conjunction with k-means or
> hierarchical clustering (or some other type of clustering algorithm).

Are these essentially metrics on R^n (the "numerical" case) or
homogeneity measures (e.g. chi-square, for the categorical case)?
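For reference, the standard definitions behind the two cases in the question. A metric d on R^n satisfies the four conditions on the first line (Euclidean and city-block distance both do). For categorical data, Pearson's chi-square statistic compares the observed counts O_ij in an r x c contingency table with the counts E_ij expected under homogeneity, where R_i and C_j are the row and column totals and N is the grand total:

```latex
d(x,y) \ge 0, \quad d(x,y) = 0 \iff x = y, \quad d(x,y) = d(y,x), \quad d(x,z) \le d(x,y) + d(y,z)

\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad E_{ij} = \frac{R_i \, C_j}{N}
```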

> 
> I think the architecture question applies to K-means and
> difference/similarity algorithms. I am not sure of the best
> architecture for these algorithms. Should each distance/similarity
> measure be its own class, allowing these to be passed into an engine
> that is the clustering algorithm?

If the algorithms can make use of "pluggable" distance measures, then
yes, this would make sense.

> For instance, have a k-means class
> that has a private variable of type ClusteringMeasurementAlgorithm,
> where:
> 
> EuclideanDistance, which implements
> DistanceMeasure, which implements
> ClusteringMeasurementAlgorithm
> 
> Does this sound somewhat logical? If we had an engine that took an
> instance of ClusteringMeasurementAlgorithm as a constructor parameter,
> it could handle all operations on the data using the specific
> measurement algorithm.
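The layering proposed above can be sketched roughly as follows, assuming all three abstractions except EuclideanDistance are Java interfaces (so DistanceMeasure "extends" rather than "implements" ClusteringMeasurementAlgorithm). The smallerIsCloser() method is a hypothetical addition, not part of the proposal, that lets one engine accept both distance measures and similarity measures; ClusteringEngine and nearest() are likewise illustrative names.

```java
import java.util.List;

/** Hypothetical top-level abstraction: anything that scores a pair of points. */
interface ClusteringMeasurementAlgorithm {
    double measure(double[] a, double[] b);

    /** True if smaller scores mean "closer" (distances); false for similarities. */
    boolean smallerIsCloser();
}

/** Distance measures: smaller means closer. */
interface DistanceMeasure extends ClusteringMeasurementAlgorithm {
    @Override
    default boolean smallerIsCloser() {
        return true;
    }
}

/** Concrete distance measure. */
class EuclideanDistance implements DistanceMeasure {
    @Override
    public double measure(double[] a, double[] b) {
        double sum = 0.0;
        for (int k = 0; k < a.length; k++) {
            double d = a[k] - b[k];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }
}

/** Hypothetical engine that receives its measure through the constructor. */
class ClusteringEngine {
    private final ClusteringMeasurementAlgorithm measure;

    ClusteringEngine(ClusteringMeasurementAlgorithm measure) {
        this.measure = measure;
    }

    /** Index of the center that is closest to the point under the configured measure. */
    int nearest(double[] point, List<double[]> centers) {
        int best = -1;
        double bestScore = 0.0;
        for (int c = 0; c < centers.size(); c++) {
            double score = measure.measure(point, centers.get(c));
            boolean better = measure.smallerIsCloser() ? score < bestScore : score > bestScore;
            if (best < 0 || better) {
                best = c;
                bestScore = score;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        ClusteringEngine engine = new ClusteringEngine(new EuclideanDistance());
        List<double[]> centers = java.util.Arrays.asList(new double[] {0.0, 0.0}, new double[] {10.0, 10.0});
        System.out.println(engine.nearest(new double[] {9.0, 9.0}, centers)); // 1
    }
}
```

A similarity measure would implement ClusteringMeasurementAlgorithm directly and return false from smallerIsCloser(), so the engine code does not change.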

I am confused about what is being abstracted here.  If it is the
distance measure, the interface should be called something that ends
in "Measure" or "Metric".

> The reason I am trying to abstract the
> clustering algorithm more than a difference measure is due to the fact
> that clustering may be done on similarity and difference measures.
> Please tell me if this sounds outrageous, because I do not have a lot
> of architecture experience.

If a clustering algorithm can use multiple different distance
measures, then it does make sense to encapsulate the distance measure.
Defining a distance measure or metric interface, defining classes that
implement that interface, and having the clustering algorithms hold
instances of these as members is a reasonable way to do this, IMHO.
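A minimal sketch of that arrangement, assuming a single distance interface (called DistanceMetric here, following the naming suggestion above) and a k-means class that holds one as a final member supplied through its constructor. All names are hypothetical. Only the assignment step uses the pluggable measure; the update step recomputes centroids as coordinate-wise means, which is the standard k-means update but is strictly optimal only for squared Euclidean distance (with city-block distance, a k-medians variant would use coordinate-wise medians instead).

```java
import java.util.Arrays;

/** Hypothetical distance abstraction, named per the "...Metric" suggestion. */
interface DistanceMetric {
    double distance(double[] a, double[] b);
}

/** City-block (Manhattan) distance. */
class CityBlockMetric implements DistanceMetric {
    @Override
    public double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int k = 0; k < a.length; k++) {
            sum += Math.abs(a[k] - b[k]);
        }
        return sum;
    }
}

/** Hypothetical k-means sketch that delegates every distance computation to its member. */
class KMeans {
    private final DistanceMetric metric;

    KMeans(DistanceMetric metric) {
        this.metric = metric;
    }

    /** Assigns each point the index of its closest centroid under the pluggable metric. */
    int[] assign(double[][] points, double[][] centroids) {
        int[] labels = new int[points.length];
        for (int i = 0; i < points.length; i++) {
            double best = Double.POSITIVE_INFINITY;
            for (int c = 0; c < centroids.length; c++) {
                double d = metric.distance(points[i], centroids[c]);
                if (d < best) {
                    best = d;
                    labels[i] = c;
                }
            }
        }
        return labels;
    }

    /** Recomputes each centroid as the mean of its points; an empty cluster keeps its old centroid. */
    double[][] update(double[][] points, int[] labels, double[][] centroids) {
        int dim = centroids[0].length;
        double[][] sums = new double[centroids.length][dim];
        int[] counts = new int[centroids.length];
        for (int i = 0; i < points.length; i++) {
            counts[labels[i]]++;
            for (int k = 0; k < dim; k++) {
                sums[labels[i]][k] += points[i][k];
            }
        }
        double[][] next = new double[centroids.length][];
        for (int c = 0; c < centroids.length; c++) {
            if (counts[c] == 0) {
                next[c] = centroids[c].clone();
            } else {
                next[c] = new double[dim];
                for (int k = 0; k < dim; k++) {
                    next[c][k] = sums[c][k] / counts[c];
                }
            }
        }
        return next;
    }

    public static void main(String[] args) {
        double[][] points = {{0.0, 0.0}, {1.0, 0.0}, {9.0, 9.0}, {10.0, 10.0}};
        double[][] centroids = {{0.0, 0.0}, {10.0, 10.0}};
        KMeans kMeans = new KMeans(new CityBlockMetric());
        int[] labels = kMeans.assign(points, centroids);
        System.out.println(Arrays.toString(labels)); // [0, 0, 1, 1]
        System.out.println(Arrays.deepToString(kMeans.update(points, labels, centroids)));
    }
}
```

Because DistanceMetric has a single method, a different measure can also be plugged in as a lambda, e.g. new KMeans((a, b) -> ...), without touching the KMeans class.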

Phil
