I've got to play with this now. I'm getting divide by zero errors so I might
have to shift the range up by five. We'll see.
; file: num-sim.prm - Copyright (c) 2014 Michael S P Miller
module test-it

function mean {?sample} (local ?sum ?len)
    (put 0 ?sum (length ?sample) ?len)
    (for ?n in ?sample (put (+ ?n ?sum) ?sum))
    (if (= ?len 0)
        (fail 'no data points' calculation)
        else (/ ?sum ?len))
end

function sigma {?sample ?mean}
    (sqrt (mean (collect ?n in ?sample (exp (- ?n ?mean) 2))))
end

function z-score {?point ?mean ?sigma}
    (/ (- ?point ?mean) ?sigma)
end

function similarity {?a ?b} (local ?s ?m ?za ?zb ?zm)
    (put (mean {?a ?b}) ?m
         (sigma {?a ?b} ?m) ?s
         (z-score ?a ?m ?s) ?za
         (z-score ?b ?m ?s) ?zb
         (mean {?za ?zb}) ?zm)
    (* 0.5 (- (+ (/ ?zm ?za) (/ ?zm ?zb))
              (+ (/ (- ?za ?zb) ?za) (/ (- ?zb ?za) ?zb))))
end

end ; module test-it
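For readers without a PRM interpreter, here is a rough Python transliteration of the module above. It assumes the two-argument (exp x 2) means "raise to the power 2" and that sigma is the population standard deviation; the names are mine.

```python
import math

def mean(sample):
    # Fail on an empty sample instead of dividing by zero.
    if len(sample) == 0:
        raise ValueError("no data points")
    return sum(sample) / len(sample)

def sigma(sample, m):
    # Population standard deviation about a given mean m.
    return math.sqrt(mean([(n - m) ** 2 for n in sample]))

def z_score(point, m, s):
    return (point - m) / s

def similarity(a, b):
    m = mean([a, b])
    s = sigma([a, b], m)    # 0 whenever a == b
    za = z_score(a, m, s)   # divide-by-zero when s == 0
    zb = z_score(b, m, s)
    zm = mean([za, zb])
    return 0.5 * ((zm / za + zm / zb)
                  - ((za - zb) / za + (zb - za) / zb))
```

Note that for a two-element sample the z-scores always come out as +1 and -1, so this returns -2 (up to rounding) for any distinct a and b, and divides by zero when a == b, which is probably the error mentioned at the top.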
Your comments are appreciated.
~PM
From: [email protected]
To: [email protected]
Subject: RE: [agi] Numeric Similarity
Date: Thu, 20 Feb 2014 12:02:57 -0800
I think we're all making this a lot harder than it should be.
I don't accept the answer "it can't be done." Computers can do a lot, even
this.
Yes, my similarity measure is arbitrary, -1 to +1. That's correct.
The z-score tells me where a number lies in the normal distribution, and it almost always falls between -4 and +4. So if I divide the z-score by 4, I get a number in the range I'm looking for: -1 to +1. That's a step closer.
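A sketch of that scaling (function name mine): z-scores of normally distributed data almost always fall in [-4, +4], so dividing by 4 maps them into [-1, +1], with a clamp for the rare outlier.

```python
def scaled_z(x, mean, sigma):
    # Map a z-score into [-1, +1] by dividing by 4 and clamping.
    z = (x - mean) / sigma
    return max(-1.0, min(1.0, z / 4.0))
```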
I can also say that the mean represents the intersection of two numbers.
What would represent the "set difference" of two numbers? Well, the delta, (a - b) or (b - a), could yield a set difference. It's just a matter of scale. I think if we work with the z-scores of each number then we'd be much closer.
Let's see...
My general similarity algorithm for two items a and b is
(* 0.5 (- intersections differences))
where intersections := (+ (/ intersection a) (/ intersection b))
and differences := (+ (/ differenceAtoB a) (/ differenceBtoA b))
For lists, intersection is easy; for numbers we can assume it is the mean.
For lists, differenceAtoB is simply set difference, likewise differenceBtoA; for
numbers we can assume this is the simple delta, a - b, or b - a.
Does this get us closer? Possibly; I think we have to use the z-scores, though:
intersection is the mean of the z-scores and difference is one z-score minus the
other.
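Here is how the list case of that scheme might look in Python, assuming "(/ intersection a)" means normalising by the size of operand a; the function name is mine.

```python
def list_similarity(a, b):
    # 0.5 * (intersections - differences), each term normalised
    # by the size of the operand it is measured against.
    sa, sb = set(a), set(b)
    inter = len(sa & sb)
    intersections = inter / len(sa) + inter / len(sb)
    differences = len(sa - sb) / len(sa) + len(sb - sa) / len(sb)
    return 0.5 * (intersections - differences)
```

Identical lists score 1.0 and disjoint lists score -1.0, so the result already lands in the arbitrary [-1, +1] range.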
Let me try this out . . . I'll be back.
~PM
Date: Thu, 20 Feb 2014 14:35:40 -0500
Subject: Re: [agi] Numeric Similarity
From: [email protected]
To: [email protected]
As I tried to explain, PM, there's no final measure of similarity, even
between two integers. Integers can be interrelated | compared via a potentially
infinite number of inverse arithmetic operations, each of which gives you a
match (similarity) & a miss (difference).
I can only give you a starting point, & a way to proceed from there: The
simplest comparison is subtraction, which gives you the simplest absolute
match: the smaller comparand... Your request to quantify similarity between -1 &
+1 is arbitrary, - the resolution of a match is a subset of the resolution of the
comparands, which you didn't define.
But what really matters is the relative match: the absolute match compared to a
higher-level average match. That average is fed back down the hierarchy of
search (I know you're having difficulty with that concept). And for lists it's
even more complex, - they *consist* of numbers.
So, the problem of quantifying similarity is ultimately *the* problem of GI, &
you shouldn't expect a simple answer to that.
On Thu, Feb 20, 2014 at 11:57 AM, Piaget Modeler <[email protected]>
wrote:
Thanks for your response Boris.
My aim at the moment is to define a function for any two numbers a b.
Similarity(a, b) ::= c | c in [-1 .. +1].
Examples:
Similarity(0, 0) = 1.0
Similarity(239420, 239420) = 1.0
Similarity(3.1415926, 3.14) = 0.9995 /* or something close to but less
than one */
Similarity(-7123456789098765, -7123456789098765) = 1.0
And so forth.
From it I gather that your suggestion (not an algorithm) is:
"initial comparison between integers is by subtraction, which compresses miss
from !AND to difference by cancelling opposite-sign bits, & increases match
because it’s the complement of that reduced difference.
Division will further reduce magnitude of miss by converting it from difference
to ratio, which can then be reduced again by converting it to logarithm, & so
on. By reducing miss, higher power of comparison will also increase
complementary match. But the costs may grow even faster, for both operations &
incremental syntax to record incidental sign, fraction, & irrational fraction.
The power of comparison is increased if current-power match plus miss predict
an improvement, as indicated by higher-order comparison between results from
different powers of comparison. Such “meta-comparison” can discover algorithms,
or meta-patterns."
Similarity(number a, number b) ::= log( (a-b) / ????)
This seems a bit confusing to me.
Your thoughts?
~PM.
Date: Thu, 20 Feb 2014 09:23:47 -0500
Subject: Re: [agi] Numeric Similarity
From: [email protected]
To: [email protected]
You finally got to a right starting point. This is covered in part 2 of my
intro: http://www.cognitivealgorithm.info/
2. Comparison: quantifying match & miss per input.
The purpose of cognition is to predict, & prediction must be quantified.
Algorithmic information theory defines predictability as compressibility of
representations, which is perfectly fine. However, current implementations of
AIT quantify compression only for whole sequences of inputs.
To enable far more incremental selection (& correspondingly scalable search), I
start by quantifying match between individual inputs. Partial match is a new
dimension of analysis, additive to binary same | different distinction of
probabilistic inference. This is analogous to the way probabilistic inference
improved on classical logic by quantifying partial probability of statements,
vs binary true | false values.
Individual partial match is compression of magnitude, by replacing larger
comparand with its difference relative to smaller comparand. In other words,
match is a complementary of miss, initially equal to the smaller comparand.
Ultimate criterion is recorded magnitude, rather than record space: bits of
memory it occupies after compression, because the former represents physical
impact that we want to predict.
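A minimal sketch of that definition for non-negative integers (function name mine):

```python
def compare(a, b):
    # First-power comparison: miss is the difference, match is the
    # smaller comparand -- the part of the larger magnitude that the
    # smaller one "predicts".
    smaller, larger = min(a, b), max(a, b)
    match = smaller
    miss = larger - smaller
    # match is the complement of miss: together they reconstruct
    # the larger comparand.
    return match, miss
```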
This definition is tautological: smaller comparand = sum of Boolean AND between
uncompressed (unary code) representations of both comparands, = partial
identity of these comparands. Some may object that identity also includes the
case when both comparands or bits thereof equal zero, but that identity also
equals zero. Again, the purpose here is prediction, which is a representational
equivalent of conservation in physics. We’re predicting some potential impact
on the observer, represented by an input. Zero input ultimately means zero
impact, which has no conservable physical value (inertia), thus no intrinsic
predictive value.
Given incremental complexity of representation, initial inputs should have
binary resolution. However, average binary match won’t justify the cost of
comparison: syntactic overhead of representing new match & miss between
positionally distinct inputs. Rather, these binary inputs are compressed by
digitization within a position (coordinate): substitution of every two
lower-order bits with one higher-order bit within an integer. Resolution of
that coordinate (input aggregation span) is adjusted to form integers
sufficiently large to produce (when compared) average match that exceeds
above-mentioned costs of comparison. These are “opportunity costs“: a
longer-range average match discovered by equivalent computational resources.
So, the next order of compression is comparison across coordinates, initially
defined with binary resolution as before | after input. Any comparison is an
inverse arithmetic operation of incremental power: Boolean AND, subtraction,
division, logarithm, & so on. Actually, since digitization already compressed
inputs by AND, comparison of that power won’t further compress resulting
integers. In general, match is *additive* compression, achieved only by
comparison of a higher power than that which produced the comparands. Thus,
initial comparison between integers is by subtraction, which compresses miss
from !AND to difference by cancelling opposite-sign bits, & increases match
because it’s the complement of that reduced difference.
Division will further reduce magnitude of miss by converting it from difference
to ratio, which can then be reduced again by converting it to logarithm, & so
on. By reducing miss, higher power of comparison will also increase
complementary match. But the costs may grow even faster, for both operations &
incremental syntax to record incidental sign, fraction, & irrational fraction.
The power of comparison is increased if current-power match plus miss predict
an improvement, as indicated by higher-order comparison between results from
different powers of comparison. Such “meta-comparison” can discover algorithms,
or meta-patterns.
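One way to line up the three powers of comparison named above, sketched for positive comparands only (the sign and fraction bookkeeping, the "incremental syntax" cost, is omitted; the function name is mine):

```python
import math

def compare_powers(a, b):
    # Each higher power of comparison shrinks the recorded miss:
    # subtraction leaves a difference, division a ratio,
    # logarithm a log-ratio.
    difference = abs(a - b)
    ratio = max(a, b) / min(a, b)
    log_ratio = math.log(ratio)
    return difference, ratio, log_ratio
```

For a = 8, b = 2 the recorded miss shrinks from 6 to 4.0 to about 1.39 as the power of comparison rises.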
On Thu, Feb 20, 2014 at 12:01 AM, Piaget Modeler <[email protected]>
wrote:
Hi all,
For all you statisticians out there...
I'm working on an algorithm for numeric similarity and would like to
crowdsource the solution.
Given two numbers, i.e., two observations, how can I get a score between -1 and
1 indicating their proximity?
I think I need to compute a few things:
1. Compute the mean of the observations.
2. Compute the standard deviation sigma of the observations.
3. Compute the z-score of each number.
Once I know the z-score for each number, I know where each number lies along the
normal distribution.
After that I'm a little lost.
Is there a notion of difference or sameness after that?
This might help:
http://www.dkv.columbia.edu/demo/medical_errors_reporting/site010708/module3/0510-similar-numeric.html
Your thoughts are appreciated.
Michael Miller.
AGI | Archives
| Modify
Your Subscription
-------------------------------------------
AGI
Archives: https://www.listbox.com/member/archive/303/=now
RSS Feed: https://www.listbox.com/member/archive/rss/303/21088071-f452e424
Modify Your Subscription:
https://www.listbox.com/member/?member_id=21088071&id_secret=21088071-58d57657
Powered by Listbox: http://www.listbox.com