Abram (and Vladimir),

Thanks for the article hyperlink. I am also interested in clustering for my own efforts. Here is my problem, which I suspect is yours as well:
Much real-world clustering seems to take place in an n-dimensional space, where n is the number of variables among which you are looking for clusters. When those dimensions have unknown and mutually incomparable scales, computing the distance between "points" as the usual square root of the sum of the squared differences doesn't make much sense at all. Instead, it seems that a "cluster" is simply a set of nearly identical "binary" variables (binary, but with probabilities attached).

Some of the SPI paper seems rather arbitrary and unexplained, e.g., the assumption that a point can occur in only one cluster (3rd page, 2nd column, about 2/3 of the way down). This seems to presume that the method is only finding the LAST REMAINING relationship, and it ignores the possibility of noisy data...

The SPI paper is somewhat opaque in its terminology and heavy on external references, so it is rather hard to determine exactly how a high-dimensional space whose dimensions have unknown scales would fit into its discussion. Perhaps I have simply missed the point and/or am looking at this thing all wrong. Can anyone here help?

Steve Richfield
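A minimal Python sketch of the distance problem raised above, for illustration only. The helper names (`euclidean`, `nearest`, `rescale`, `standardize`) and the toy data are hypothetical, and per-dimension z-score standardization is just one common workaround, not something taken from the SPI paper. Changing the units of a single dimension flips which point is nearest, while standardizing each dimension by its own spread makes the result independent of the units chosen.

```python
import math


def euclidean(a, b):
    # Square root of the sum of the squared per-dimension differences.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(query, candidates):
    # Index of the candidate closest to query under Euclidean distance.
    return min(range(len(candidates)), key=lambda i: euclidean(query, candidates[i]))


def rescale(points, d, factor):
    # Multiply dimension d by factor, e.g. a (hypothetical) change of units.
    return [[x * factor if i == d else x for i, x in enumerate(p)] for p in points]


def standardize(points):
    # Shift each dimension to zero mean and divide by its population standard
    # deviation; a constant dimension (std 0) is left unscaled to avoid dividing by 0.
    n, dims = len(points), len(points[0])
    means = [sum(p[d] for p in points) / n for d in range(dims)]
    stds = [
        math.sqrt(sum((p[d] - means[d]) ** 2 for p in points) / n) or 1.0
        for d in range(dims)
    ]
    return [[(p[d] - means[d]) / stds[d] for d in range(dims)] for p in points]


if __name__ == "__main__":
    data = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
    print(nearest(data[0], data[1:]))  # → 0
    scaled = rescale(data, 0, 10.0)  # same data, dimension 0 in "smaller units"
    print(nearest(scaled[0], scaled[1:]))  # → 1
```

Standardization only replaces an unknown scale with the sample's own spread; it does not say whether that spread is the right yardstick, which is the underlying question here.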
