Abram,

On 7/6/08, Abram Demski <[EMAIL PROTECTED]> wrote:
> The SPI paper does make that constraint, but it also allows for
> multiple clusterings; so within one clustering clusters are mutually
> exclusive, but this does not really restrict things. Perhaps it would
> be simpler to get rid of the constraint, making the multiple
> clusterings unnecessary. In fact it would be "simpler" just to get rid
> of all constraints on the hidden entities, and perform a general
> search for the best hidden structure to explain the data... but, like
> the constraint you mentioned, some things that seem like restrictions
> are not really. For example, we could restrict the hidden entities to
> form some particular Turing-complete language; examining the
> constraints, it would at first look like a very harsh restriction, but
> of course once one realized that it could express any computable
> pattern it would be no more harsh than the original restriction to
> 1st-order logic.
I see your point. It just seems simpler to work with as few constraints as possible, because constraints usually make SUCH a mess of the math. Who says that our population can't be negative?! How many people do I owe?

> Anyway, I do not have a clear picture of your dimensionality concern.
> There are ways of clustering in domains where euclidean distance is
> not relevant (particularly binary domains), but I do not understand
> what you mean when you say that dimensions have unknown sizes.

In our 3-D world, a meter in one dimension makes the same separation as a meter in another dimension (given the same geometry between points). However, when different dimensions are in different and incomparable units (e.g. sensory fusion, or time vs. distance), the difference between True and False in one input (dimension) may make a much greater or lesser difference than the difference between True and False in another input (dimension). This disparity becomes huge, effectively infinite, when one of the inputs is completely random.

So what? Clusters should likely be separated by differences in a more significant input (larger-dimensioned, higher-efficacy), but not by differences in a less significant input (smaller-dimensioned, lower-efficacy, perhaps completely insignificant). As I (perhaps wrongly) read the article, its version of clustering short-circuits when presented with a number of random inputs mixed in with some important inputs, and its reported clustering then reflects the randomness of its inputs more than the real information in the important inputs.

Of course, a "feature" of high-bandwidth information flow is that it appears to be random. Hence, attempts to identify and discard random input may well throw the baby out with the bathwater. Does this make meaningful clustering impossible?
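To make the swamping concern concrete, here is a toy sketch (entirely hypothetical data, nothing from the article): one informative input that cleanly separates two groups, plus N uniformly random inputs. As N grows, the fraction of squared Euclidean distance between cross-group points that comes from the informative input collapses toward zero:

```python
import random

random.seed(0)

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def informative_share(n_noise, n_points=200):
    """Average fraction of cross-group squared distance contributed
    by the single informative input, given n_noise random inputs.
    (Toy data: group A sits at 0 and group B at 1 on the informative
    input; every noise input is uniform on [0, 1] for both groups.)"""
    def point(group):
        return [float(group)] + [random.random() for _ in range(n_noise)]
    a_pts = [point(0) for _ in range(n_points)]
    b_pts = [point(1) for _ in range(n_points)]
    shares = []
    for a, b in zip(a_pts, b_pts):
        total = sq_dist(a, b)               # informative + noise terms
        informative = (a[0] - b[0]) ** 2    # always 1.0 across groups
        shares.append(informative / total)
    return sum(shares) / len(shares)

for n_noise in (0, 1, 10, 100):
    print(n_noise, round(informative_share(n_noise), 3))
```

With a hundred such random inputs, the informative input contributes only a few percent of each pairwise distance, so any purely distance-based clusterer is effectively steered by the noise rather than by the one input that matters.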
I don't think so, but it may become a multiple-pass proposition, where "good" inputs are identified one-by-one, using previously identified good inputs as a guide. Perhaps neurons, with their very serial sort of evolving functionality, are really doing EXACTLY what is needed?!

In short, doesn't the article make the unstated assumptions that all inputs are of comparable importance, and that a useful clustering exists? When either of these criteria is not met, the process nonetheless produces results, but they are worthless.

Hopefully I am missing something here. Can you see it?

Steve Richfield

-------------------------------------------
agi
Archives: http://www.listbox.com/member/archive/303/=now
RSS Feed: http://www.listbox.com/member/archive/rss/303/
Modify Your Subscription: http://www.listbox.com/member/?member_id=8660244&id_secret=106510220-47b225
Powered by Listbox: http://www.listbox.com
