Ben G.

What ratio of the numbers in the sample S are greater than max(x,y) or less than min(x,y), as opposed to lying between x and y?

Thanks Ben, these are good suggestions. I will consider them.

~PM / Michael

Date: Fri, 21 Feb 2014 07:46:17 +0800
Subject: Re: [agi] Numeric Similarity
From: [email protected]
To: [email protected]

Do you mean: Given two numbers x and y drawn from a specific sample S of numbers (or a specific probability distribution D over the set of numbers)?

Without this background S or D, the question is meaningless...

Given a distribution D, one can draw a sample S, of course; so the case where one has a sample S is sufficient to deal with.

One sensible measure would be: What ratio of the numbers in the sample S are greater than max(x,y) or less than min(x,y), as opposed to lying between x and y?

This gives you 1 if x and y are identical or have no members of S between them; and 0 if x and y are the opposite endpoints of the sample.

If you want a scaling between -1 and 1 instead of 0 and 1, just linearly normalize...

In OpenCog, this (but without the normalization into [-1,1]) is how we would measure similarity between two NumberNodes relative to a given QuantitativeSchemaNode, consistent with our approach of quantile normalization for predicatizing quantitative characters:

http://wiki.opencog.org/w/QuantitativePredicate

An advantage of ranking-based approaches like these is that they are robust with respect to the wide variety of different probability distributions one encounters in the real world...

-- Ben G

On Thu, Feb 20, 2014 at 1:01 PM, Piaget Modeler <[email protected]> wrote:

Hi all,

For all you statisticians out there...

I'm working on an algorithm for numeric similarity and would like to crowdsource the solution. Given two numbers, i.e., two observations, how can I get a score between -1 and 1 indicating their proximity?

I think I need to compute a few things:

1. Compute the mean of the observations.
2. Compute the standard deviation sigma of the observations.
3. Compute the z-score of each number.

Once I know the z-score for each number, I know where each number lies along the normal distribution. After that I'm a little lost.
Is there a notion of difference or sameness after that?

This might help:

http://www.dkv.columbia.edu/demo/medical_errors_reporting/site010708/module3/0510-similar-numeric.html

Your thoughts are appreciated.

Michael Miller

--
Ben Goertzel, PhD
http://goertzel.org

"In an insane world, the sane man must appear to be insane". -- Capt. James T. Kirk
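
For anyone who wants to try the ratio Ben describes, here is a minimal Python sketch. It is only one reading of the description, not OpenCog's code. The function names rank_similarity and rescale are made up for illustration. Two details are assumptions the thread does not settle: sample members equal to x or y are counted as neither "between" nor "outside", and the score is taken to be 1 when no other sample members are left to count.

```python
def rank_similarity(x, y, sample):
    """Share of sample members lying outside [min(x, y), max(x, y)]
    rather than strictly between x and y. Result is in [0, 1]."""
    lo, hi = min(x, y), max(x, y)
    # Assumption: members equal to x or y fall in neither group.
    outside = sum(1 for s in sample if s < lo or s > hi)
    between = sum(1 for s in sample if lo < s < hi)
    total = outside + between
    if total == 0:
        # Assumption: nothing separates x and y, so call them maximally similar.
        return 1.0
    return outside / total


def rescale(score):
    """Linearly map a score from [0, 1] onto [-1, 1]."""
    return 2.0 * score - 1.0


if __name__ == "__main__":
    sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    print(rank_similarity(1, 10, sample))          # opposite endpoints of the sample
    print(rank_similarity(4, 5, sample))           # no members between them
    print(rescale(rank_similarity(3, 6, sample)))  # same measure on the [-1, 1] scale
```

Because only the ordering of the sample matters, the score does not change under any monotone rescaling of the data, which is the robustness Ben points out for ranking-based approaches.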
