Jim: Probabilistic reasoning has to be based on a frame of reference which 
defines, implicitly or explicitly, the necessary characteristics of what is 
being evaluated. 

Boris: Right, but "defines" above may mean two things: the modalities of 
original inputs that it measures, & the types of relationships among these 
inputs that it can discover: derivatives, in my terms. Defining the right 
modalities is important, but we already know that vision accounts for most of 
our data & can indirectly discover all other modalities. The purpose of a GI 
algorithm is to discover relationships, & I suggest that they can all be 
reduced to / derived from atomic match & miss.

Jim: So I happily agree with you. Just the fact that someone can act like he 
understands this seems like a novel experience to me.

Boris: Same here. 

> Boris: In reality, expectations are rarely matched or missed precisely, so 
> the degree of confirmation must be quantified for individual events. 
> Quantifying partial match would add a micro-grayscale to the binary value of 
> events in Bayesian prediction, just like the latter added macro-grayscale 
> (partial probability) to binary (true | false) predictions of classical logic.

Jim: Right, but you also have to define how these confirmations may be 
confirmed.

Boris: "Confirmation" is quantified as match. And of course, on higher levels 
of search you will have match of a match, & so on. It's the same algorithm, 
applied to incrementally higher-syntax data. 

> Boris: Besides, the events are assumed to be high-level concepts, the kind 
> that occupy our conscious minds. But a scalable search algorithm must start 
> from sensory data processing that is subconscious for us, rather than depend 
> on human preprocessing. So, the choice of such initial inputs for BI & AIT 
> already shows a total lack of discipline in incrementing complexity: a fatal 
> fault for any attempt at scalability.

Jim: I think you are missing something subtle. You cannot build knowledge on 
sensory data alone. You must also rely on the meaning that can be derived from 
that data. 

Boris: Of course: my approach is hierarchical. Higher levels do "rely" on 
"meaning" derived on the lower levels. I think I already mentioned "incremental 
syntax" about a hundred times.

Jim: This entails a kind of jumping around which can be likened to correlation 
(in its broadest sense). You are trying to solve for meaning by ruling it out 
as a useful input into reasoning. You may try defining raw sensory data 
processing as perception or pre-perception, but these sub-categories can become 
illusions when you are trying to understand how meaning is derived. 

Boris: Enlighten me, how?

Jim: There is too much sensory data for it to be used as the direct basis for 
all insight, the data is noisy compared to what is seen as important, and there 
are few one-to-one associations between elementary sensation and insightful 
meaning.

Boris: The cognition that produced our "meanings" is a long process of both 
individual &, especially, civilization-wide learning. I know it's hard to 
"visualize" how we got from something so simple & noisy to all the science, 
technology, & social interactions of modern civilization. But our ability to 
learn is obviously innate; we only have 23K genes, & almost all of them have 
nothing to do with GI per se. Cavemen started with simple inputs & simple 
algorithms; all the rest must've been derived from them.

Jim: By the way, I don't really agree that probabilistic / information theory 
methods can be used as the basis of AGI. 

Boris: Right, you want to use "semantics" & "meaning", but you don't really 
know what those are. If you think about where we ultimately get these things 
from, it must be either the original modalities or the cognitive algorithm 
itself.

Jim: However, I am interested in what you are saying, and I am very curious 
about some of the other details that you have mentioned, so I do want to talk 
about this some more.

Boris: I am all ears.  

http://www.cognitivealgorithm.info/2012/01/cognitive-algorithm.html 






 
On Sat, Aug 11, 2012 at 9:41 PM, Boris Kazachenko <bori...@verizon.net> wrote:

  Boris: "AIT quantifies compression for sequences of inputs, while I define 
match for comparisons among individual inputs. On this level, a match is a 
lossless compression by replacing a larger comparand with its derivative 
(miss), relative to the smaller comparand. In other words, a match is the 
complement of a miss. That’s a deeper level of analysis, which I think can 
enable a far more incremental (thus potentially scalable) approach.

  Jim: You are talking about an evaluation method that is derived from (or 
built on the scaffolding of) Bayesian Reasoning right?

  Boris: No, it's the other way around: Bayesian (probabilistic) inference 
should be built on evaluation of similarity (partial match) between individual 
inputs. The fact that it isn't is (to me) a fatal flaw of Bayesian inference. 
Any probability is estimated from (& for) a sequence of instances; quantifying 
partial match (vs. assuming binary presence | absence) for each instance 
increases the depth of analysis by a whole new dimension. My intro, part 7:

  "Two other approaches close to mine are Algorithmic information theory & 
Bayesian inference, which use the same criteria as mine: compression & 
prediction. A good introduction is 

  Philosophical Treatise of Universal Induction by S. Rathmanner & M. Hutter.
  While an advance over static “frequentist” probability, BI & AIT still 
assume a “prior”, which doesn’t belong in a consistently inductive approach. To 
generalize it, Solomonoff introduced the universal prior: “a class of all 
models”. The a priori infinity of this class means that he hits combinatorial 
explosion even *before* receiving actual inputs: a “solution” that only a 
mathematician may find interesting. In my approach, the models are simply past 
inputs & correlations among them. Environmentally specific priors could speed 
up learning, but a general pattern discovery algorithm must be the core to 
which such short-cuts are added & from which they are removed.

  Also perverse is the binary resolution of initial inputs in BI & AIT: 
confirmation / disconfirmation events. In reality, expectations are rarely 
matched or missed precisely, so the degree of confirmation must be quantified 
for individual events. Quantifying partial match would add a micro-grayscale to 
the binary value of events in Bayesian prediction, just like the latter added 
macro-grayscale (partial probability) to binary (true | false) predictions of 
classical logic.
  Besides, the events are assumed to be high-level concepts, the kind that 
occupy our conscious minds. But a scalable search algorithm must start from 
sensory data processing that is subconscious for us, rather than depend on 
human preprocessing. So, the choice of such initial inputs for BI & AIT already 
shows a total lack of discipline in incrementing complexity: a fatal fault for 
any attempt at scalability."
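As a hedged illustration of the micro-grayscale point, the degree of 
confirmation of a single event could be quantified as min(expected, observed), 
here normalized by the expectation; the normalization and the function names 
are assumptions of this sketch, not from the source:

```python
def binary_confirmation(expected, observed):
    # BI / AIT event resolution: the expectation was either confirmed or not.
    return expected == observed

def graded_confirmation(expected, observed):
    # Partial match: shared magnitude min(e, o) as a fraction of the expectation.
    return min(expected, observed) / expected

# A near-miss and a far miss are indistinguishable at binary resolution:
assert not binary_confirmation(10, 9) and not binary_confirmation(10, 2)
# Quantified partial match preserves the difference between them:
assert graded_confirmation(10, 9) == 0.9
assert graded_confirmation(10, 2) == 0.2
```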



  On Wed, Aug 8, 2012 at 10:21 AM, Boris Kazachenko <bori...@verizon.net> wrote:

    Jim,

    I agree with your focus on binary computational compression, but, as you 
said, that efficiency depends on specific operands. Even though low-power 
operations (addition) are more efficient for most data, it's the exceptions 
that matter. Most data is noise, what we care about is patterns. So, to improve 
both representational & computational compression, we need to quantify it for 
each operand & operation. And the atomic operation that quantifies compression 
is what I call comparison, which starts with an inverse, vs. a direct, 
arithmetic operation. This reflects our basic disagreement: you (& most 
logicians, mathematicians, & programmers) start from deduction / pattern 
projection, which
is based on direct operations. And I think real GI must start from induction / 
pattern discovery, which is intrinsically an inverse operation.  It's pretty 
dumb to generate / project patterns at random, vs. first discovering them in 
the real world & projecting accordingly.

    This is how I proposed to quantify compression (pattern strength) in my 
intro, part 2: 


    "AIT quantifies compression for sequences of inputs, while I define match 
for comparisons among individual inputs. On this level, a match is a lossless 
compression by replacing a larger comparand with its derivative (miss), 
relative to the smaller comparand. In other words, a match is the complement of 
a miss. That’s a deeper level of analysis, which I think can enable a far more 
incremental (thus potentially scalable) approach.
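A minimal sketch of this lossless replacement for non-negative integer 
comparands (variable names are illustrative): the larger comparand is replaced 
by its miss relative to the smaller one, and match + miss restores it exactly, 
so nothing is lost while the stored magnitude shrinks.

```python
def encode(larger, smaller):
    # Replace the larger comparand with its derivative (miss) vs. the smaller.
    match = min(larger, smaller)    # the complement of the miss
    miss = larger - smaller         # the derivative that stands in for `larger`
    return match, miss

def decode(match, miss):
    # Lossless reconstruction of the replaced comparand.
    return match + miss

match, miss = encode(200, 197)
assert (match, miss) == (197, 3)
assert decode(match, miss) == 200   # lossless: the original is fully recoverable
# Storing 197 & 3 instead of 197 & 200 is the compression: the miss (3)
# needs far fewer bits than the comparand (200) it replaces.
```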

    Given incremental complexity of representation, initial inputs should have 
binary resolution. However, average binary match won’t justify the cost of 
comparison, which adds a syntactic overhead of newly differentiated match & 
miss to positionally distinct inputs. Rather, these binary inputs are 
compressed by digitization: a selective carry, aggregated & then forwarded up 
the hierarchy of digits. This is analogous to hierarchical search, explained in 
the next chapter, where selected templates are compared & conditionally 
forwarded up the hierarchy of expansion levels: a “digital hierarchy” of a 
corresponding coordinate. Digitization is done on inputs within a shared 
coordinate, the resolution of which is adjusted by feedback. This resolution 
must form average integers that are large enough for an average match between 
them (a subset of their magnitude) to merit the above-mentioned costs of 
comparison.
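A hedged reading of the digitization step described above: binary inputs within 
a shared coordinate are summed, and every carry is forwarded up the hierarchy 
of binary digits, so a long sparse bit stream collapses into a short integer. 
The explicit ripple-carry loop below is only illustrative; it is ordinary 
binary counting.

```python
def digitize(bits):
    # digits[k] is the k-th binary digit of the running count (little-endian);
    # carries are selectively forwarded up the hierarchy of digits.
    digits = [0]
    for b in bits:
        carry, k = b, 0
        while carry:
            if k == len(digits):
                digits.append(0)            # grow the hierarchy on demand
            total = digits[k] + carry
            digits[k] = total % 2           # the digit that stays at this level
            carry = total // 2              # the carry forwarded upward
            k += 1
    return digits

# Seven binary inputs, five of them set, compress into three digits:
assert digitize([1, 0, 1, 1, 1, 0, 1]) == [1, 0, 1]   # 5, little-endian
```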

    Hence, the next order of compression is comparison across coordinates 
(initially defined with binary resolution as before | after input). Any 
comparison is an inverse arithmetic operation of incremental power: Boolean 
AND, subtraction, division, logarithm, & so on. Binary match is a sum of AND: 
partial identity of uncompressed bit strings, & miss is !AND. Binary comparison 
is useful for digitization, but it won’t further compress the integers produced 
thereby. In general, the products of a given-power comparison are further 
compressed only by a higher-power comparison between them, where match is the 
*additive* compression.

    Thus, initial comparison between digitized integers is done by subtraction, 
which increases match by compressing miss from !AND to difference, in which 
opposite-sign bits cancel each other via carry | borrow. The match is increased 
because it is the complement of the difference, equal to the smaller of the 
comparands.
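A hedged sketch of this step for two 8-bit inputs, using the definitions above 
(binary match = sum of AND, binary miss = !AND): subtraction compresses the 
miss into a small signed difference, whose complement, the smaller comparand, 
is the match. The specific numbers are illustrative:

```python
i, t = 0b11001010, 0b11000110   # 202 & 198

# Boolean (power-1) comparison: match is the sum of coinciding bits,
# miss is !AND, still an 8-bit-wide string.
and_match = bin(i & t).count("1")   # 3 shared bits
and_miss = ~(i & t) & 0xFF          # 0b00111101: barely compressed

# Subtraction (next power): opposite-sign bits cancel via borrow, compressing
# the miss from !AND down to a small signed difference.
dif = i - t                         # 4: three bits instead of eight
match = min(i, t)                   # 198: the complement of the difference

assert and_miss == 0b00111101
assert match + dif == max(i, t)     # lossless: match + miss restores the input
```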

    All-to-all comparison across 1D queue of pixels forms signed derivatives, 
complemented by which new inputs can losslessly & compressively replace older 
templates. At the same time, current input match determines whether individual 
derivatives are also compared (vs. aggregated), forming successively higher 
derivatives. “Atomic” comparison is between a single-variable input & a 
template (older input):
    Comparison: match = min(input, template), miss = dif (i - t), aggregated 
over the span of constant sign.
    Evaluation: match - average_match_per_average_difference_match, formed on 
the next search level.
    This evaluation is for comparing higher derivatives, vs. the evaluation for 
higher-level inputs explained in part 3. It can also be increasingly complex, 
but I will need meaningful feedback to elaborate.
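The atomic comparison & aggregation above might look like the following sketch 
over a 1D queue of pixel values (the names are illustrative and the evaluation 
step is omitted; this is not the author's implementation):

```python
def atomic_compare(pixels):
    # Compare each input to its template (the prior input):
    # match = min(input, template), miss = dif = input - template.
    pairs = [(min(i, t), i - t) for i, t in zip(pixels[1:], pixels)]

    # Aggregate match & dif over spans of constant difference sign.
    spans, span_match, span_dif = [], 0, 0
    for match, dif in pairs:
        if span_dif and (dif < 0) != (span_dif < 0):   # sign flipped: close span
            spans.append((span_match, span_dif))
            span_match, span_dif = 0, 0
        span_match += match
        span_dif += dif
    spans.append((span_match, span_dif))
    return spans

# A ramp up then down forms one positive-dif span & one negative-dif span:
assert atomic_compare([1, 2, 4, 7, 5, 2]) == [(7, 6), (7, -5)]
```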

    Division further reduces difference to a ratio, which can then be reduced 
to a logarithm, & so on. Thus, the complementary match is increased with the power 
of comparison. But the costs may grow even faster, for both operations & 
incremental syntax to record incidental sign, fraction, irrational fraction. 
The power of comparison is increased if current match plus miss predict an 
improvement, as indicated by higher-order comparison between the results from 
different powers of comparison. This meta-comparison can discover algorithms, 
or meta-patterns..."
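A sketch of the escalating powers of comparison described above, with 
illustrative numbers: each higher-power inverse operation shrinks the recorded 
miss, while the syntax needed to keep it lossless grows (sign, then fraction, 
then irrational fraction):

```python
import math

i, t = 192, 6

dif = i - t            # power 1, subtraction: signed difference (186)
ratio = i / t          # power 2, division: fraction (32.0)
log = math.log(i, t)   # power 3, logarithm: irrational exponent (~2.93)

# Each result is a smaller miss, but costs more syntax to invert losslessly:
assert t + dif == i                  # difference + sign
assert t * ratio == i                # ratio, a fraction
assert abs(t ** log - i) < 1e-9      # logarithm, an irrational fraction
```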


    http://www.cognitivealgorithm.info/2012/01/cognitive-algorithm.html 


              

                  


