Abram,

On 7/6/08, Abram Demski <[EMAIL PROTECTED]> wrote:
>
> The SPI paper does make that constraint, but it also allows for
> multiple clusterings; so within one clustering clusters are mutually
> exclusive, but this does not really restrict things. Perhaps it would
> be simpler to get rid of the constraint, making the multiple
> clusterings unnecessary. In fact it would be "simpler" just to get rid
> of all constraints on the hidden entities, and perform a general
> search for the best hidden structure to explain the data... but, like
> the constraint you mentioned, some things that seem like restrictions
> are not really. For example, we could restrict the hidden entities to
> form some particular Turing-complete language; examining the
> constraints, it would at first look like a very harsh restriction, but
> of course once one realized that it could express any computable
> pattern it would be no more harsh than the original restriction to
> 1st-order logic.


I see your point. It just seems simpler to work with as few constraints as
possible, because constraints usually make SUCH a mess of the math. Who says
that our population can't be negative?! How many people do I owe?

> Anyway, I do not have a clear picture of your dimensionality concern.
> There are ways of clustering in domains where euclidean distance is
> not relevant (particularly binary domains), but I do not understand
> what you mean when you say that dimensions have unknown sizes.


In our 3-D world, a meter in one dimension produces the same separation as a
meter in another dimension (given the same geometry between points). However,
when different dimensions are in different and incomparable units (e.g.
sensory fusion, or time vs. distance), the difference between True and False
in one input (dimension) may make a much greater or lesser difference than
the difference between True and False in another input (dimension). This
disparity becomes huge, effectively unbounded, when one of the inputs is
essentially random.
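To make the units problem concrete, here is a tiny illustration (my own
numbers, not from the paper): mix a time-in-seconds dimension with a 0/1
boolean dimension and the boolean all but vanishes from the Euclidean
distance.

```python
import math

def euclidean(a, b):
    """Plain Euclidean distance, no per-dimension normalization."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Dimension 0: time in seconds. Dimension 1: a True/False input coded 0/1.
p = (0.0, 0.0)     # t = 0 s, input False
q = (3600.0, 1.0)  # t = 1 hour, input True

d = euclidean(p, q)
# The 0-to-1 flip in the boolean dimension contributes a vanishingly small
# fraction of the total distance; a clusterer using this metric effectively
# never "sees" the boolean input at all.
print(d)  # ~3600.0
```

Rescale the time axis to hours and the boolean suddenly matters as much as a one-hour shift, which is exactly the arbitrariness being complained about.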

So what? Clusters should likely be separated by differences in one
(larger-scaled, higher-efficacy, more significant) input, but not by
differences in another (smaller-scaled, lower-efficacy, perhaps completely
insignificant) input. As I (perhaps wrongly) read the article, its version of
clustering breaks down when presented with a number of random inputs mixed in
with some important inputs, and its reported clustering will reflect the
randomness of its inputs more than the real information in the important ones.
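This failure mode is easy to reproduce. Below is a minimal pure-Python
k-means sketch (my own toy construction, not the SPI paper's algorithm): one
dimension carries two clean groups, the other is pure noise on a larger
scale, and the recovered cluster centers split along the noise axis instead
of the informative one.

```python
import random

random.seed(0)  # deterministic, for the sake of the example

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20):
    """Plain k-means with squared Euclidean distance, no normalization."""
    centers = random.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centers[c]))
            groups[nearest].append(p)
        centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return centers

# Dimension 0 is informative: two clean groups at 0.0 and 1.0.
# Dimension 1 is pure noise, on a 100x larger scale.
points = [(float(i % 2), random.uniform(0.0, 100.0)) for i in range(200)]

centers = kmeans(points, 2)
informative_gap = abs(centers[0][0] - centers[1][0])
noise_gap = abs(centers[0][1] - centers[1][1])
# The two centers are far apart along the noise axis but nearly identical
# along the informative axis: the clustering reflects the randomness of its
# inputs, not the real structure.
```

Normalizing each dimension would not rescue this example, since the noise dimension is genuinely random rather than merely mis-scaled.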

Of course, a "feature" of high-bandwidth information flow is that it appears
to be random. Hence, attempts to identify and discard random input may well
throw the baby out with the bathwater. Does this make meaningful clustering
impossible? I don't think so, but it may become a multiple-pass proposition,
where "good" inputs are identified one by one, using previously identified
good inputs as a guide. Perhaps neurons, with their very serial sort of
evolving functionality, are really doing EXACTLY what is needed?!

In short, doesn't the article make the unstated assumption that all inputs
are of comparable importance, and that a useful clustering exists? When
either assumption fails, the process nonetheless produces results, but they
are worthless.

Hopefully I am missing something here. Can you see it?
Steve Richfield



-------------------------------------------
agi
Archives: http://www.listbox.com/member/archive/303/=now
RSS Feed: http://www.listbox.com/member/archive/rss/303/
Modify Your Subscription: 
http://www.listbox.com/member/?member_id=8660244&id_secret=106510220-47b225
Powered by Listbox: http://www.listbox.com
