Well, all of your examples have a positive value.
Can you describe examples of where the result is 0, and where the result
is negative?
Thanks,
Dimitry
On 2/20/2014 10:57 AM, Piaget Modeler wrote:
Thanks for your response Boris.
My aim at the moment is to define a function for any two numbers a b.
Similarity(a, b) ::= c | c in [-1 .. +1].
Examples:
Similarity(0, 0) = 1.0
Similarity(239420, 239420) = 1.0
Similarity(3.1415926, 3.14) = 0.9995 /* or something close to but
less than one */
Similarity(-7123456789098765, -7123456789098765) = 1.0
And so forth.
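For illustration, here is a minimal sketch of one function meeting these constraints. The relative-difference form is my own assumption, not something proposed in the thread:

```python
def similarity(a, b):
    """One possible score in [-1, 1]: 1 minus twice the relative difference.
    Identical numbers score 1.0; exact opposites score -1.0."""
    if a == b:                               # covers Similarity(0, 0) = 1.0
        return 1.0
    rel = abs(a - b) / (abs(a) + abs(b))     # relative difference, in [0, 1]
    return 1.0 - 2.0 * rel                   # map [0, 1] onto [1, -1]
```

With this particular choice, similarity(3.1415926, 3.14) comes out near 0.9995, the score is 0 when one comparand is three times the other (e.g. similarity(3, 1)), and -1 for exact opposites such as similarity(5, -5).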
From it, I gather your suggestion (not an algorithm) is:
/"initial comparison between integers is by subtraction, which
compresses miss from !AND to difference by cancelling opposite-sign
bits, & increases match because it’s the complement of that reduced
difference./
/
Division will further reduce magnitude of miss by converting it from
difference to ratio, which can then be reduced again by converting it
to logarithm, & so on. By reducing miss, higher power of comparison
will also increase complementary match. But the costs may grow even
faster, for both operations & incremental syntax to record incidental
sign, fraction, & irrational fraction. The power of comparison is
increased if current-power match plus miss predict an improvement, as
indicated by higher-order comparison between results from different
powers of comparison. Such “meta-comparison” can discover algorithms,
or meta-patterns."/
Similarity(number a, number b) ::= log( (a-b) / ????)
This is a bit confusing to me.
Your thoughts?
~PM.
------------------------------------------------------------------------
Date: Thu, 20 Feb 2014 09:23:47 -0500
Subject: Re: [agi] Numeric Similarity
From: [email protected]
To: [email protected]
You finally got to the right starting point. This is covered in part 2
of my intro: http://www.cognitivealgorithm.info/
/2. Comparison: quantifying match & miss per input./
The purpose of cognition is to predict, & prediction must be
quantified. Algorithmic information theory defines predictability as
compressibility of representations, which is perfectly fine. However,
current implementations of AIT quantify compression only for whole
sequences of inputs.
To enable far more incremental selection (& correspondingly scalable
search), I start by quantifying match between individual inputs.
Partial match is a new dimension of analysis, additive to binary same
| different distinction of probabilistic inference. This is analogous
to the way probabilistic inference improved on classical logic by
quantifying partial probability of statements, vs binary true | false
values.
Individual partial match is compression of magnitude, by replacing
larger comparand with its difference relative to smaller comparand. In
other words, match is the complement of miss, initially equal to the
smaller comparand. Ultimate criterion is recorded magnitude, rather
than record space: bits of memory it occupies after compression,
because the former represents physical impact that we want to predict.
This definition is tautological: smaller comparand = sum of Boolean
AND between uncompressed (unary code) representations of both
comparands, = partial identity of these comparands. Some may object
that identity also includes the case when both comparands or bits
thereof equal zero, but that identity also equals zero. Again, the
purpose here is prediction, which is a representational equivalent of
conservation in physics. We’re predicting some potential impact on the
observer, represented by an input. Zero input ultimately means zero
impact, which has no conservable physical value (inertia), thus no
intrinsic predictive value.
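The tautology above can be checked directly. A small sketch (the helper names are mine, assuming non-negative integers):

```python
def unary(n):
    """Uncompressed (unary) code of a non-negative integer: n one-bits."""
    return (1 << n) - 1

def match(a, b):
    """Partial match per the definition above: popcount of the Boolean AND
    between unary codes, which equals the smaller comparand."""
    return bin(unary(a) & unary(b)).count("1")
```

For example, match(5, 9) counts the overlap of five one-bits with nine one-bits, giving 5 = min(5, 9).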
Given incremental complexity of representation, initial inputs should
have binary resolution. However, average binary match won’t justify
the cost of comparison: syntactic overhead of representing new match &
miss between positionally distinct inputs. Rather, these binary inputs
are compressed by digitization within a position (coordinate):
substitution of every two lower-order bits with one higher-order bit
within an integer. Resolution of that coordinate (input aggregation
span) is adjusted to form integers sufficiently large to produce (when
compared) average match that exceeds above-mentioned costs of
comparison. These are “opportunity costs”: a longer-range average
match discovered by equivalent computational resources.
So, the next order of compression is comparison across coordinates,
initially defined with binary resolution as before | after input. Any
comparison is an inverse arithmetic operation of incremental power:
Boolean AND, subtraction, division, logarithm, & so on. Actually,
since digitization already compressed inputs by AND, comparison of
that power won’t further compress resulting integers. In general,
match is *additive* compression, achieved only by comparison of a
higher power than that which produced the comparands. Thus, initial
comparison between integers is by subtraction, which compresses miss
from !AND to difference by cancelling opposite-sign bits, & increases
match because it’s the complement of that reduced difference.
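As a sketch of this first-power comparison (illustrative function, assuming integer comparands):

```python
def compare(a, b):
    """Comparison by subtraction, as described above:
    miss  = signed difference (replacing the !AND residue),
    match = the smaller comparand, i.e. the complement of that miss."""
    miss = a - b
    match = min(a, b)
    return match, miss

# the larger comparand is recoverable as match + abs(miss),
# so replacing it with the difference is lossless compression
```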
Division will further reduce magnitude of miss by converting it from
difference to ratio, which can then be reduced again by converting it
to logarithm, & so on. By reducing miss, higher power of comparison
will also increase complementary match. But the costs may grow even
faster, for both operations & incremental syntax to record incidental
sign, fraction, & irrational fraction. The power of comparison is
increased if current-power match plus miss predict an improvement, as
indicated by higher-order comparison between results from different
powers of comparison. Such “meta-comparison” can discover algorithms,
or meta-patterns.
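A rough sketch of these increasing powers of comparison, assuming positive comparands (the cost accounting for sign, fraction, etc. is left out):

```python
import math

def misses(a, b):
    """Miss at successive powers of comparison, as sketched above:
    subtraction -> difference, division -> ratio (deviation from 1),
    logarithm -> log-ratio. Each step reduces the magnitude of miss."""
    diff = abs(a - b)
    ratio = max(a, b) / min(a, b)
    log_ratio = math.log(ratio)
    return diff, ratio - 1.0, log_ratio
```

For a = 1000, b = 992, the miss shrinks from 8 (difference) to roughly 0.008 (ratio and log-ratio), illustrating how higher-power comparison compresses miss and so increases complementary match.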
On Thu, Feb 20, 2014 at 12:01 AM, Piaget Modeler
<[email protected]> wrote:
Hi all,
For all you statisticians out there...
I'm working on an algorithm for numeric similarity and would like
to crowdsource the solution.
Given two numbers, i.e., two observations, how can I get a score
between -1 and 1 indicating their proximity?
I think I need to compute a few things:
1. Compute the *mean* of the observations.
2. Compute the standard deviation *sigma* of the observations.
3. Compute the *z-score* of each number.
Once I know the z-score for each number, I know where each number
lies along the normal distribution.
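Steps 1-3 can be sketched as follows (using Python's statistics module; note the caveat in the comment):

```python
import statistics

def z_scores(observations):
    """Steps 1-3 above: mean, standard deviation, z-score per observation."""
    mean = statistics.mean(observations)
    sigma = statistics.pstdev(observations)  # population standard deviation
    return [(x - mean) / sigma for x in observations]

# caveat: if the sample is just the two numbers themselves, the z-scores
# are always +1 and -1 (and sigma is 0 when they are equal), so proximity
# needs a wider reference set of observations to be meaningful
```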
After that I'm a little lost.
Is there a notion of difference or sameness after that?
This might help..
http://www.dkv.columbia.edu/demo/medical_errors_reporting/site010708/module3/0510-similar-numeric.html
Your thoughts are appreciated.
Michael Miller.
*AGI* | Archives <https://www.listbox.com/member/archive/303/=now>
<https://www.listbox.com/member/archive/rss/303/18407320-d9907b69>
| Modify <https://www.listbox.com/member/?&> Your Subscription
[Powered by Listbox] <http://www.listbox.com>