Thanks for your response, Boris.
My aim at the moment is to define a function for any two numbers a, b:
Similarity(a, b) ::= c | c in [-1 .. +1].
Examples:
Similarity(0, 0) = 1.0
Similarity(239420,  239420) = 1.0
Similarity(3.1415926, 3.14) = 0.9995      /* or something close to but less 
than one */ 
Similarity(-7123456789098765, -7123456789098765) = 1.0
And so forth. 
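For concreteness, here is one naive sketch that reproduces the examples above. It is purely my own illustration (a hypothetical ratio-of-magnitudes definition), not anything proposed in Boris's reply:

```python
def similarity(a, b):
    """Hypothetical similarity in [-1, 1]: ratio of smaller to larger
    magnitude, negated when the signs differ."""
    if a == b:                      # covers Similarity(0, 0) = 1.0
        return 1.0
    lo, hi = sorted((abs(a), abs(b)))
    r = lo / hi                     # hi > 0 here, since a != b
    # negate when signs differ (hypothetical extension into [-1, 0))
    return r if (a >= 0) == (b >= 0) else -r
```

With this sketch, similarity(3.1415926, 3.14) = 3.14 / 3.1415926 ≈ 0.9995, matching the third example.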

From it I gather your suggestion (not an algorithm) is:
"initial comparison between integers is by subtraction, which compresses miss 
from !AND to difference by cancelling opposite-sign bits, & increases match 
because it’s the complement of that reduced difference. Division will further 
reduce magnitude of miss by converting it from difference to ratio, which can 
then be reduced again by converting it to logarithm, & so on. By reducing miss, 
higher power of comparison will also increase complementary match. But the 
costs may grow even faster, for both operations & incremental syntax to record 
incidental sign, fraction, & irrational fraction. The power of comparison is 
increased if current-power match plus miss predict an improvement, as indicated 
by higher-order comparison between results from different powers of comparison. 
Such “meta-comparison” can discover algorithms, or meta-patterns."
Similarity(number a, number b) ::= log( (a-b) / ????) 
This is a bit confusing to me.
Your thoughts? 
~PM. 

Date: Thu, 20 Feb 2014 09:23:47 -0500
Subject: Re: [agi] Numeric Similarity
From: [email protected]
To: [email protected]

You finally got to the right starting point. This is covered in part 2 of my 
intro: http://www.cognitivealgorithm.info/

 2. Comparison: quantifying match & miss per input.

The purpose of cognition is to predict, & prediction must be quantified. 
Algorithmic information theory defines predictability as compressibility of 
representations, which is perfectly fine. However, current implementations of 
AIT quantify compression only for whole sequences of inputs.

To enable far more incremental selection (& correspondingly scalable search), I 
start by quantifying match between individual inputs. Partial match is a new 
dimension of analysis, additive to binary same | different distinction of 
probabilistic inference. This is analogous to the way probabilistic inference 
improved on classical logic by quantifying partial probability of statements, 
vs binary true | false values.


Individual partial match is compression of magnitude, achieved by replacing the 
larger comparand with its difference relative to the smaller comparand. In 
other words, match is the complement of miss, initially equal to the smaller 
comparand. The ultimate criterion is recorded magnitude rather than record 
space (the bits of memory a representation occupies after compression), because 
the former represents the physical impact that we want to predict.


This definition is tautological: smaller comparand = sum of Boolean AND between 
uncompressed (unary code) representations of both comparands, = partial 
identity of these comparands. Some may object that identity also includes the 
case when both comparands or bits thereof equal zero, but that identity also 
equals zero. Again, the purpose here is prediction, which is a representational 
equivalent of conservation in physics. We’re predicting some potential impact 
on the observer, represented by an input. Zero input ultimately means zero 
impact, which has no conservable physical value (inertia), thus no intrinsic 
predictive value.
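That "smaller comparand = sum of Boolean AND between unary representations" identity can be checked directly. A minimal sketch (my illustration, padding the shorter unary run with zeros so the bitwise AND is defined):

```python
def unary_match(a, b):
    """Match between two non-negative integers as the sum of Boolean AND
    over their uncompressed (unary-code) representations."""
    width = max(a, b)
    ua = [1] * a + [0] * (width - a)   # n in unary: a run of n ones
    ub = [1] * b + [0] * (width - b)
    return sum(x & y for x, y in zip(ua, ub))  # always == min(a, b)
```

The aligned zero positions contribute nothing to the sum, consistent with the point above that zero input has no intrinsic predictive value.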


Given incremental complexity of representation, initial inputs should have 
binary resolution. However, average binary match won’t justify the cost of 
comparison: syntactic overhead of representing new match & miss between 
positionally distinct inputs. Rather, these binary inputs are compressed by 
digitization within a position (coordinate): substitution of every two 
lower-order bits with one higher-order bit within an integer. Resolution of 
that coordinate (input aggregation span) is adjusted to form integers 
sufficiently large to produce (when compared) average match that exceeds 
above-mentioned costs of comparison. These are “opportunity costs”: a 
longer-range average match discoverable by equivalent computational resources.
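One possible reading of that aggregation step (my interpretation, not code from the post): binary inputs within each coordinate span are summed into one integer, and the span is the adjustable resolution of the coordinate.

```python
def aggregate(bits, span):
    """Hypothetical digitization: sum binary inputs over each coordinate
    span, forming one integer per span. Summing implicitly replaces every
    two lower-order units with one higher-order unit (positional coding)."""
    return [sum(bits[i:i + span]) for i in range(0, len(bits), span)]
```

Widening `span` yields fewer, larger integers, whose average match is more likely to exceed the fixed syntactic cost of a comparison.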


So, the next order of compression is comparison across coordinates, initially 
defined with binary resolution as before | after input. Any comparison is an 
inverse arithmetic operation of incremental power: Boolean AND, subtraction, 
division, logarithm, & so on. Actually, since digitization already compressed 
inputs by AND, comparison of that power won’t further compress resulting 
integers. In general, match is *additive* compression, achieved only by 
comparison of a higher power than that which produced the comparands. Thus, 
initial comparison between integers is by subtraction, which compresses miss 
from !AND to difference by cancelling opposite-sign bits, & increases match 
because it’s the complement of that reduced difference.
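Subtraction as the first-power comparison can be sketched in a few lines (my illustration of the definitions above, for non-negative integers):

```python
def compare(a, b):
    """First-power comparison of two non-negative integers."""
    miss = a - b        # difference: compressed miss, replaces !AND
    match = min(a, b)   # smaller comparand: the complement of miss
    return match, miss
```

Note that match plus the magnitude of miss reconstructs the larger comparand, so nothing is lost: the pair is a compressed re-coding of the two inputs.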


Division will further reduce magnitude of miss by converting it from difference 
to ratio, which can then be reduced again by converting it to logarithm, & so 
on. By reducing miss, higher power of comparison will also increase 
complementary match. But the costs may grow even faster, for both operations & 
incremental syntax to record incidental sign, fraction, & irrational fraction. 
The power of comparison is increased if current-power match plus miss predict 
an improvement, as indicated by higher-order comparison between results from 
different powers of comparison. Such “meta-comparison” can discover algorithms, 
or meta-patterns.
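To illustrate the ladder of powers on one pair of comparands (my numeric example; each step shrinks the deviation of miss from its identity value):

```python
import math

a, b = 1000, 999
diff = a - b                 # power 1, subtraction: difference (identity 0)
ratio = a / b                # power 2, division:    ratio      (identity 1)
log_ratio = math.log(ratio)  # power 3, logarithm:   log-ratio  (identity 0)
```

Here the miss shrinks from 1 to ~0.001 to ~0.001 minus a further correction, while the recorded syntax grows: the ratio needs a fraction, the logarithm an irrational fraction, matching the cost caveat above.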




On Thu, Feb 20, 2014 at 12:01 AM, Piaget Modeler <[email protected]> 
wrote:




Hi all, 
For all you statisticians out there...
I'm working on an algorithm for numeric similarity and would like to 
crowdsource the solution.

Given two numbers, i.e., two observations, how can I get a score between -1 and 
1 indicating their proximity?
I think I need to compute a few things:

1. Compute the mean of the observations.
2. Compute the standard deviation sigma of the observations.
3. Compute the z-score of each number.

Once I know the z-score for each number, I know where each number lies along 
the normal distribution.
After that I'm a little lost.
Is there a notion of difference or sameness after that?
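The three steps listed can be sketched as follows (my illustration, using the population standard deviation and made-up observations):

```python
xs = [3.0, 5.0, 10.0]                                     # observations
mean = sum(xs) / len(xs)                                  # step 1: mean
sigma = (sum((x - mean) ** 2 for x in xs) / len(xs)) ** 0.5  # step 2: sigma
z_scores = [(x - mean) / sigma for x in xs]               # step 3: z-scores
```

Note that with only two observations the z-scores are always +1 and -1, so by themselves they cannot distinguish a close pair from a distant one.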

This might help:
http://www.dkv.columbia.edu/demo/medical_errors_reporting/site010708/module3/0510-similar-numeric.html

Your thoughts are appreciated.
Michael Miller.


-------------------------------------------
AGI
Archives: https://www.listbox.com/member/archive/303/=now
RSS Feed: https://www.listbox.com/member/archive/rss/303/21088071-f452e424
Modify Your Subscription: 
https://www.listbox.com/member/?member_id=21088071&id_secret=21088071-58d57657
Powered by Listbox: http://www.listbox.com