I've got to play with this now. I'm getting divide by zero errors so I might
have to shift the range up by five. We'll see.
; file: num-sim.prm - Copyright (c) 2014 Michael S P Miller
module test-it

function mean {?sample} (local ?sum ?len)
    (put 0 ?sum (length ?sample) ?len)
    (for ?n in ?sample (put (+ ?n ?sum) ?sum))
    (if (= ?len 0)
        (fail 'no data points' calculation)
        else (/ ?sum ?len))
end

function sigma {?sample ?mean}
    (sqrt (mean (collect ?n in ?sample (exp (- ?n ?mean) 2))))
end

function z-score {?point ?mean ?sigma}
    (/ (- ?point ?mean) ?sigma)
end

function similarity {?a ?b} (local ?s ?m ?za ?zb ?zm)
    (put (mean {?a ?b}) ?m
         (sigma {?a ?b} ?m) ?s
         (z-score ?a ?m ?s) ?za
         (z-score ?b ?m ?s) ?zb
         (mean {?za ?zb}) ?zm)
    (* 0.5 (- (+ (/ ?zm ?za) (/ ?zm ?zb))
              (+ (/ (- ?za ?zb) ?za) (/ (- ?zb ?za) ?zb))))
end

end ; module test-it
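For readers without a PRM interpreter, here is a rough Python transliteration of the module above. It assumes the two-argument (exp x 2) means "raise to the power 2" and that sigma is the population standard deviation; the names are mine.

```python
import math

def mean(sample):
    # Fail on an empty sample instead of dividing by zero.
    if len(sample) == 0:
        raise ValueError("no data points")
    return sum(sample) / len(sample)

def sigma(sample, m):
    # Population standard deviation about a given mean m.
    return math.sqrt(mean([(n - m) ** 2 for n in sample]))

def z_score(point, m, s):
    return (point - m) / s

def similarity(a, b):
    m = mean([a, b])
    s = sigma([a, b], m)    # 0 whenever a == b
    za = z_score(a, m, s)   # divide-by-zero when s == 0
    zb = z_score(b, m, s)
    zm = mean([za, zb])
    return 0.5 * ((zm / za + zm / zb)
                  - ((za - zb) / za + (zb - za) / zb))
```

Note that for a two-element sample the z-scores always come out as +1 and -1, so this returns -2 (up to rounding) for any distinct a and b, and divides by zero when a == b, which is probably the error mentioned at the top.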
Your comments are appreciated.
~PM
From: [email protected]
To: [email protected]
Subject: RE: [agi] Numeric Similarity
Date: Thu, 20 Feb 2014 12:02:57 -0800
I think we're all making this a lot harder than it should be.
I don't accept the answer "it can't be done." Computers can do a lot, even
this.
Yes, my similarity measure is arbitrary, -1 to +1. That's correct.
The z-score tells me where a number lies in the normal distribution, and it almost always falls between -4 and +4. So if I divide the z-score by 4, I get a number in the range I'm looking for: -1 to +1. That's a step closer.
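A sketch of that scaling (function name mine): z-scores of normally distributed data almost always fall in [-4, +4], so dividing by 4 maps them into [-1, +1], with a clamp for the rare outlier.

```python
def scaled_z(x, mean, sigma):
    # Map a z-score into [-1, +1] by dividing by 4 and clamping.
    z = (x - mean) / sigma
    return max(-1.0, min(1.0, z / 4.0))
```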
I can also say that the mean represents the intersection of two numbers.
What would represent the "set difference" of two numbers? Well, the delta, (a - b) or (b - a), could yield a set difference. It's just a matter of scale. I think if we work with the z-scores of each number then we'd be much closer.
Let's see...
My general similarity algorithm for two items a and b is
(* 0.5 (- intersections differences))
where intersections := (+ (/ intersection a) (/ intersection b))
and differences := (+ (/ differenceAtoB a) (/ differenceBtoA b))
For lists, intersection is easy; for numbers we can assume it is the mean.
For lists, differenceAtoB is simply set difference, likewise differenceBtoA; for
numbers we can assume this is the simple delta, a - b, or b - a.
Does this get us closer? Possibly; I think we have to use the z-scores, though:
intersection is the mean of the z-scores and difference is one z-score minus the
other.
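Here is how the list case of that scheme might look in Python, assuming "(/ intersection a)" means normalising by the size of operand a; the function name is mine.

```python
def list_similarity(a, b):
    # 0.5 * (intersections - differences), each term normalised
    # by the size of the operand it is measured against.
    sa, sb = set(a), set(b)
    inter = len(sa & sb)
    intersections = inter / len(sa) + inter / len(sb)
    differences = len(sa - sb) / len(sa) + len(sb - sa) / len(sb)
    return 0.5 * (intersections - differences)
```

Identical lists score 1.0 and disjoint lists score -1.0, so the result already lands in the arbitrary [-1, +1] range.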
Let me try this out . . . I'll be back.
~PM
Date: Thu, 20 Feb 2014 14:35:40 -0500
Subject: Re: [agi] Numeric Similarity
From: [email protected]
To: [email protected]
As I tried to explain, PM, there's no final measure of similarity, even
between two integers. Integers can be interrelated | compared via a potentially
infinite number of inverse arithmetic operations, each of which gives you a
match (similarity) & a miss (difference).
I can only give you a starting point, & a way to proceed from there: The
simplest comparison is subtraction, which gives you the simplest absolute
match: the smaller comparand... Your request to quantify similarity between -1 &
+1 is arbitrary, - the resolution of a match is a subset of the resolution of the
comparands, which you didn't define.
But what really matters is the relative match: the absolute match compared to a
higher-level average match. That average is fed back down the hierarchy of
search (I know you're having difficulty with that concept). And for lists it's
even more complex, - they *consist* of numbers.
So, the problem of quantifying similarity is ultimately *the* problem of GI, &
you shouldn't expect a simple answer to that.
On Thu, Feb 20, 2014 at 11:57 AM, Piaget Modeler <[email protected]>
wrote:
Thanks for your response Boris.
My aim at the moment is to define a function for any two numbers a b.
Similarity(a, b) ::= c | c in [-1 .. +1].
Examples:
Similarity(0, 0) = 1.0
Similarity(239420, 239420) = 1.0
Similarity(3.1415926, 3.14) = 0.9995 /* or something close to but less
than one */
Similarity(-7123456789098765, -7123456789098765) = 1.0
And so forth.
From it I gather that your suggestion (not an algorithm) is:
"initial comparison between integers is by subtraction, which compresses miss
from !AND to difference by cancelling opposite-sign bits, & increases match
because it’s the complement of that reduced difference.
Division will further reduce magnitude of miss by converting it from difference
to ratio, which can then be reduced again by converting it to logarithm, & so
on. By reducing miss, higher power of comparison will also increase
complementary match. But the costs may grow even faster, for both operations &
incremental syntax to record incidental sign, fraction, & irrational fraction.
The power of comparison is increased if current-power match plus miss predict
an improvement, as indicated by higher-order comparison between results from
different powers of comparison. Such “meta-comparison” can discover algorithms,
or meta-patterns."
Similarity(number a, number b) ::= log( (a-b) / ????)
This seems a bit confusing to me.
Your thoughts?
~PM.
Date: Thu, 20 Feb 2014 09:23:47 -0500
Subject: Re: [agi] Numeric Similarity
From: [email protected]
To: [email protected]
You finally got to a right starting point. This is covered in part 2 of my
intro: http://www.cognitivealgorithm.info/
2. Comparison: quantifying match & miss per input.
The purpose of cognition is to predict, & prediction must be quantified.
Algorithmic information theory defines predictability as compressibility of
representations, which is perfectly fine. However, current implementations of
AIT quantify compression only for whole sequences of inputs.
To enable far more incremental selection (& correspondingly scalable search), I
start by quantifying match between individual inputs. Partial match is a new
dimension of analysis, additive to binary same | different distinction of
probabilistic inference. This is analogous to the way probabilistic inference
improved on classical logic by quantifying partial probability of statements,
vs binary true | false values.
Individual partial match is compression of magnitude, by replacing larger
comparand with its difference relative to smaller comparand. In other words,
match is a complementary of miss, initially equal to the smaller comparand.
Ultimate criterion is recorded magnitude, rather than record space: bits of
memory it occupies after compression, because the former represents physical
impact that we want to predict.
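A minimal sketch of that definition for non-negative integers (function name mine):

```python
def compare(a, b):
    # First-power comparison: miss is the difference, match is the
    # smaller comparand -- the part of the larger magnitude that the
    # smaller one "predicts".
    smaller, larger = min(a, b), max(a, b)
    match = smaller
    miss = larger - smaller
    # match is the complement of miss: together they reconstruct
    # the larger comparand.
    return match, miss
```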
This definition is tautological: smaller comparand = sum of Boolean AND between
uncompressed (unary code) representations of both comparands, = partial
identity of these comparands. Some may object that identity also includes the
case when both comparands or bits thereof equal zero, but that identity also
equals zero. Again, the purpose here is prediction, which is a representational
equivalent of conservation in physics. We’re predicting some potential impact
on the observer, represented by an input. Zero input ultimately means zero
impact, which has no conservable physical value (inertia), thus no intrinsic
predictive value.
Given incremental complexity of representation, initial inputs should have
binary resolution. However, average binary match won’t justify the cost of
comparison: syntactic overhead of representing new match & miss between
positionally distinct inputs. Rather, these binary inputs are compressed by
digitization within a position (coordinate): substitution of every two
lower-order bits with one higher-order bit within an integer. Resolution of
that coordinate (input aggregation span) is adjusted to form integers
sufficiently large to produce (when compared) average match that exceeds
above-mentioned costs of comparison. These are “opportunity costs“: a
longer-range average match discovered by equivalent computational resources.
So, the next order of compression is comparison across coordinates, initially
defined with binary resolution as before | after input. Any comparison is an
inverse arithmetic operation of incremental power: Boolean AND, subtraction,
division, logarithm, & so on. Actually, since digitization already compressed
inputs by AND, comparison of that power won’t further compress resulting
integers. In general, match is *additive* compression, achieved only by
comparison of a higher power than that which produced the comparands. Thus,
initial comparison between integers is by subtraction, which compresses miss
from !AND to difference by cancelling opposite-sign bits, & increases match
because it’s the complement of that reduced difference.
Division will further reduce magnitude of miss by converting it from difference
to ratio, which can then be reduced again by converting it to logarithm, & so
on. By reducing miss, higher power of comparison will also increase
complementary match. But the costs may grow even faster, for both operations &
incremental syntax to record incidental sign, fraction, & irrational fraction.
The power of comparison is increased if current-power match plus miss predict
an improvement, as indicated by higher-order comparison between results from
different powers of comparison. Such “meta-comparison” can discover algorithms,
or meta-patterns.
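One way to line up the three powers of comparison named above, sketched for positive comparands only (the sign and fraction bookkeeping, the "incremental syntax" cost, is omitted; the function name is mine):

```python
import math

def compare_powers(a, b):
    # Each higher power of comparison shrinks the recorded miss:
    # subtraction leaves a difference, division a ratio,
    # logarithm a log-ratio.
    difference = abs(a - b)
    ratio = max(a, b) / min(a, b)
    log_ratio = math.log(ratio)
    return difference, ratio, log_ratio
```

For a = 8, b = 2 the recorded miss shrinks from 6 to 4.0 to about 1.39 as the power of comparison rises.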
On Thu, Feb 20, 2014 at 12:01 AM, Piaget Modeler <[email protected]>
wrote:
Hi all,
For all you statisticians out there...
I'm working on an algorithm for numeric similarity and would like to
crowdsource the solution.
Given two numbers, i.e., two observations, how can I get a score between -1 and
1 indicating their proximity?
I think I need to compute a few things:
1. Compute the mean of the observations.
2. Compute the standard deviation sigma of the observations.
3. Compute the z-score of each number.
Once I know the z-score for each number, I know where each number lies along the
normal distribution.
After that I'm a little lost.
Is there a notion of difference or sameness after that?
This might help:
http://www.dkv.columbia.edu/demo/medical_errors_reporting/site010708/module3/0510-similar-numeric.html
Your thoughts are appreciated.
Michael Miller.
AGI | Archives
| Modify
Your Subscription
-------------------------------------------
AGI
Archives: https://www.listbox.com/member/archive/303/=now
RSS Feed: https://www.listbox.com/member/archive/rss/303/21088071-f452e424
Modify Your Subscription:
https://www.listbox.com/member/?member_id=21088071&id_secret=21088071-58d57657
Powered by Listbox: http://www.listbox.com