You finally got to the right starting point. This is covered in part 2 of my intro: http://www.cognitivealgorithm.info/
*2. Comparison: quantifying match & miss per input.* The purpose of cognition is to predict, & prediction must be quantified. Algorithmic information theory defines predictability as compressibility of representations, which is perfectly fine. However, current implementations of AIT quantify compression only for whole sequences of inputs. To enable far more incremental selection (& correspondingly scalable search), I start by quantifying match between individual inputs. Partial match is a new dimension of analysis, additive to the binary same | different distinction of probabilistic inference. This is analogous to the way probabilistic inference improved on classical logic: by quantifying partial probability of statements vs. binary true | false values.

Individual partial match is compression of magnitude: the larger comparand is replaced with its difference relative to the smaller comparand. In other words, match is the complement of miss, initially equal to the smaller comparand. The ultimate criterion is recorded magnitude, rather than record space (the bits of memory a representation occupies after compression), because the former represents the physical impact that we want to predict. This definition is tautological: the smaller comparand = the sum of Boolean AND between uncompressed (unary-code) representations of both comparands = partial identity of these comparands.

Some may object that identity also includes the case when both comparands, or bits thereof, equal zero, but that identity also equals zero. Again, the purpose here is prediction, which is a representational equivalent of conservation in physics. We're predicting some potential impact on the observer, represented by an input. Zero input ultimately means zero impact, which has no conservable physical value (inertia), thus no intrinsic predictive value. Given incremental complexity of representation, initial inputs should have binary resolution.
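The match | miss decomposition above can be sketched in a few lines of Python. `unary` and `compare` are hypothetical helpers introduced only for illustration; the sketch just demonstrates that the smaller comparand equals the summed Boolean AND of unary codes, as claimed:

```python
def unary(n, width):
    """Unary code: `width` bits, of which the first n are set to 1."""
    return [1] * n + [0] * (width - n)

def compare(a, b):
    """Match = smaller comparand (partial identity of the two magnitudes);
    miss = the remaining difference that replaces the larger comparand."""
    match = min(a, b)
    miss = abs(a - b)
    return match, miss

# The tautology: min(a, b) equals the sum of bitwise AND of unary codes.
a, b = 5, 3
width = max(a, b)
and_sum = sum(x & y for x, y in zip(unary(a, width), unary(b, width)))
match, miss = compare(a, b)
assert and_sum == match == 3 and miss == 2
```

Note that match here quantifies shared magnitude, not equality: 5 and 3 are "different" in binary terms, yet still share a partial identity of 3.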
However, average binary match won't justify the cost of comparison: the syntactic overhead of representing new match & miss between positionally distinct inputs. Rather, these binary inputs are compressed by digitization within a position (coordinate): substitution of every two lower-order bits with one higher-order bit within an integer. Resolution of that coordinate (input aggregation span) is adjusted to form integers large enough that, when compared, average match exceeds the above-mentioned costs of comparison. These are "opportunity costs": a longer-range average match discoverable by equivalent computational resources.

So, the next order of compression is comparison across coordinates, initially defined with binary resolution as before | after input. Any comparison is an inverse arithmetic operation of incremental power: Boolean AND, subtraction, division, logarithm, & so on. But since digitization already compressed inputs by AND, comparison of that power won't further compress the resulting integers. In general, match is *additive* compression, achieved only by comparison of a higher power than that which produced the comparands. Thus, initial comparison between integers is by subtraction, which compresses miss from !AND to difference by cancelling opposite-sign bits, & increases match because it's the complement of that reduced difference. Division will further reduce the magnitude of miss by converting it from difference to ratio, which can then be reduced again by converting it to logarithm, & so on. By reducing miss, a higher power of comparison will also increase complementary match. But the costs may grow even faster, for both operations & the incremental syntax needed to record incidental sign, fraction, & irrational fraction. The power of comparison is increased if current-power match plus miss predict an improvement, as indicated by higher-order comparison between results from different powers of comparison.
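The hierarchy of comparison powers can be illustrated with a minimal sketch. `compare_by_power` is a hypothetical function, assuming positive integer comparands, that only shows how each successive power re-encodes the miss in a smaller magnitude (difference, then ratio, then log-ratio):

```python
import math

def compare_by_power(a, b, power):
    """Compare two positive magnitudes at a given power:
    power 1 = subtraction, 2 = division, 3 = logarithm.
    Returns (match, miss); miss shrinks as power increases."""
    small, large = min(a, b), max(a, b)
    if power == 1:
        return small, large - small          # miss = difference
    if power == 2:
        return small, large / small          # miss = ratio
    if power == 3:
        return small, math.log(large, small)  # miss = log-ratio
    raise ValueError("unsupported power")

# Miss magnitude drops at each higher power of comparison:
a, b = 256, 16
misses = [compare_by_power(a, b, p)[1] for p in (1, 2, 3)]
assert misses[0] == 240 and misses[1] == 16.0
assert abs(misses[2] - 2.0) < 1e-9
```

As the paragraph above notes, each step also adds representational cost (sign, fraction, irrational fraction), so escalating the power only pays off when the reduction in miss outweighs that added syntax.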
Such "meta-comparison" can discover algorithms, or meta-patterns.

On Thu, Feb 20, 2014 at 12:01 AM, Piaget Modeler <[email protected]> wrote:
> Hi all,
>
> For all you statisticians out there...
>
> I'm working on an algorithm for numeric similarity and would like to crowdsource the solution.
>
> Given two numbers, i.e., two observations, how can I get a score between -1 and 1 indicating their proximity?
>
> I think I need to compute a few things:
>
> 1. Compute the *mean* of the observations.
> 2. Compute the standard deviation *sigma* of the observations.
> 3. Compute the *z-score* of each number.
>
> Once I know the z-score for each number I know where each number lies along the normal distribution.
>
> After that I'm a little lost. Is there a notion of difference or sameness after that?
>
> This might help:
> http://www.dkv.columbia.edu/demo/medical_errors_reporting/site010708/module3/0510-similar-numeric.html
>
> Your thoughts are appreciated?
>
> Michael Miller.
