You finally got to the right starting point. This is covered in part 2 of my
intro: http://www.cognitivealgorithm.info/

 *2. Comparison: quantifying match & miss per input.*

The purpose of cognition is to predict, & prediction must be quantified.
Algorithmic information theory defines predictability as compressibility of
representations, which is perfectly fine. However, current implementations
of AIT quantify compression only for whole sequences of inputs.
To enable far more incremental selection (& correspondingly scalable
search), I start by quantifying match between individual inputs. Partial
match is a new dimension of analysis, additive to binary same | different
distinction of probabilistic inference. This is analogous to the way
probabilistic inference improved on classical logic by quantifying partial
probability of statements, vs binary true | false values.

Individual partial match is compression of magnitude, achieved by replacing
the larger comparand with its difference relative to the smaller comparand.
In other words, match is the complement of miss, initially equal to the
smaller comparand. The ultimate criterion is recorded magnitude rather than
record space (the bits of memory a representation occupies after
compression), because magnitude represents the physical impact that we want
to predict.
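A minimal sketch of this definition in Python (the function name and the restriction to non-negative integers are my own, not from the text):

```python
def compare(a, b):
    """Compare two non-negative integer inputs.

    miss:  the difference, recorded when the larger comparand
           is replaced by its offset from the smaller one.
    match: the smaller comparand, the magnitude preserved by
           that replacement, i.e. the complement of miss.
    """
    miss = abs(a - b)
    match = min(a, b)
    return match, miss

# compare(7, 5) -> (5, 2): 7 is compressed to "+2 relative to 5"
```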

This definition is tautological: the smaller comparand = the sum of Boolean
AND between uncompressed (unary-code) representations of both comparands, =
partial identity of these comparands. Some may object that identity also
includes the case where both comparands, or bits thereof, equal zero, but
that identity also equals zero. Again, the purpose here is prediction, which
is
a representational equivalent of conservation in physics. We're predicting
some potential impact on the observer, represented by an input. Zero input
ultimately means zero impact, which has no conservable physical value
(inertia), thus no intrinsic predictive value.
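The tautology can be checked directly: the smaller comparand equals the sum of Boolean AND over unary codes of both comparands, and all-zero positions contribute nothing. A sketch, with hypothetical helper names:

```python
def unary(n, width):
    # uncompressed unary code: n ones, zero-padded to a fixed width
    return [1] * n + [0] * (width - n)

def unary_AND_match(a, b):
    # partial identity: sum of bitwise AND between unary codes
    width = max(a, b)
    return sum(x & y for x, y in zip(unary(a, width), unary(b, width)))

# unary_AND_match(7, 5) == min(7, 5) == 5; zero bits add no match
```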

Given incremental complexity of representation, initial inputs should have
binary resolution. However, average binary match won't justify the cost of
comparison: syntactic overhead of representing new match & miss between
positionally distinct inputs. Rather, these binary inputs are compressed by
digitization within a position (coordinate): substitution of every two
lower-order bits with one higher-order bit within an integer. Resolution of
that coordinate (input aggregation span) is adjusted to form integers
sufficiently large to produce (when compared) average match that exceeds
above-mentioned costs of comparison. These are "opportunity costs": a
longer-range average match discovered by equivalent computational resources.
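The digitization step might be sketched as follows; the function name and span parameter are my own illustration:

```python
def digitize(bits, span):
    """Aggregate binary inputs over a coordinate span into integers.

    Summing the bits within each span is equivalent to repeatedly
    substituting every two lower-order bits with one higher-order
    bit: the count is stored in positional (binary) code. A wider
    span yields larger integers, whose average match is more likely
    to exceed the overhead of comparison.
    """
    return [sum(bits[i:i + span]) for i in range(0, len(bits), span)]

# digitize([1, 0, 1, 1,  0, 1, 1, 1], span=4) -> [3, 3]
```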

So, the next order of compression is comparison across coordinates,
initially defined with binary resolution as before | after input. Any
comparison is an inverse arithmetic operation of incremental power: Boolean
AND, subtraction, division, logarithm, & so on. Actually, since
digitization already compressed inputs by AND, comparison of that power
won't further compress resulting integers. In general, match is *additive*
compression, achieved only by comparison of a higher power than that which
produced the comparands. Thus, initial comparison between integers is by
subtraction, which compresses miss from !AND to difference by cancelling
opposite-sign bits, & increases match because it is the complement of that
reduced difference.
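The gain from raising the power of comparison from Boolean AND to subtraction can be illustrated on binary codes (a sketch; reading match as shared bits & miss as unmatched bits at the AND power is my interpretation):

```python
def compare_by_AND(a, b):
    # AND-power comparison over binary codes:
    # match = bits shared by both, miss = bits not matched (XOR)
    return a & b, a ^ b

def compare_by_subtraction(a, b):
    # next-power comparison: opposite-sign bits cancel in the
    # difference, so miss shrinks & complementary match grows
    return min(a, b), abs(a - b)

# a=6 (110), b=5 (101):
#   AND power:    match=4, miss=3
#   subtraction:  match=5, miss=1  (additive compression)
```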

Division will further reduce magnitude of miss by converting it from
difference to ratio, which can then be reduced again by converting it to
logarithm, & so on. By reducing miss, a higher power of comparison will
also increase the complementary match. But the costs may grow even faster,
for both
operations & incremental syntax to record incidental sign, fraction, &
irrational fraction. The power of comparison is increased if current-power
match plus miss predict an improvement, as indicated by higher-order
comparison between results from different powers of comparison. Such
"meta-comparison" can discover algorithms, or meta-patterns.
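The escalating powers of comparison could be sketched as a ladder where each rung further reduces miss; the cost terms (sign, fraction, irrational-fraction syntax) are left out, and the names are hypothetical:

```python
import math

def miss_at_power(a, b, power):
    """Miss under increasing power of comparison (positive integers)."""
    if power == 1:                  # subtraction -> difference
        return abs(a - b)
    ratio = max(a, b) / min(a, b)
    if power == 2:                  # division -> deviation of ratio from 1
        return ratio - 1
    if power == 3:                  # logarithm -> log of the ratio
        return math.log(ratio)
    raise ValueError("unsupported power")

# a=100, b=99: miss falls from 1 to ~0.0101 to ~0.01005, but each
# step also adds incidental syntax to record
```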



On Thu, Feb 20, 2014 at 12:01 AM, Piaget Modeler
<[email protected]> wrote:

> Hi all,
>
> For all you statisticians out there...
>
> I'm working on an algorithm for numeric similarity and would like to
> crowdsource the solution.
>
> Given two numbers, i.e., two observations, how can I get a score between
> -1 and 1 indicating their proximity?
>
> I think I need to compute a few things,
>
> 1. Compute the *mean* of the observations.
> 2. Compute the standard deviation *sigma* of the observations.
> 3. Compute the *z-score* of each number.
>
> Once I know the z-score for each number, I know where each number lies
> along the normal distribution.
>
> After that I'm a little lost.
>
> Is there a notion of difference or sameness after that?
>
> This might help..
>
>
> http://www.dkv.columbia.edu/demo/medical_errors_reporting/site010708/module3/0510-similar-numeric.html
>
> Your thoughts are appreciated.
>
> Michael Miller.



-------------------------------------------
AGI
Archives: https://www.listbox.com/member/archive/303/=now