--- Jim Bromer <[EMAIL PROTECTED]> wrote:
> I had said:
> > But this means that you are advancing a purely speculative theory
> > without any evidence to support it.
>
> Matt said:
> The evidence is described in my paper which you haven't read yet.
> --------------------------------------------
>
> I did glance at the paper and I don't think I will be able to understand
> your evidence. Can you give me some clues using plain language?
In speech recognition, it is fairly standard to test the language modeling component using word perplexity, which is another way of stating the compression ratio. More generally, compression measures prediction accuracy. Given a question Q, if you can predict the answer A that a human would give, then you could answer the question. Expressed as a model, you would assign a high probability P(A|Q) = P(QA)/P(Q) if A is the "correct" or most likely answer. Expressed as a compression ratio, an ideal coder would assign -log P(A|Q) bits to answer A in context Q, which is smaller for higher probabilities.

> Matt said:
> For building AGI, my proposal is http://www.mattmahoney.net/agi.html
> Unfortunately, I estimate the cost to be US $1 quadrillion over the next
> 30 years. But I believe it is coming, because AGI is worth that much.
> If I use compression anywhere, it will be to evaluate candidate language
> models for peers in a market that right now does not yet exist.
> ---------------------------------------------
>
> Can you explain what you mean by the statement that you would use
> compression to evaluate candidate language models?

In my competitive message routing (CMR) proposal, peers communicate in natural language. A peer is either an expert in some narrow domain, a router that knows about other peers and their areas of expertise so that it can route "related" messages to them, or some combination of the two. An expert can be a traditional expert system or a database that understands a small subset of natural language, but more likely it is simply a fixed collection of documents posted by the peer's owner, just as you would post documents to a website. In this role, the peer matches these documents against incoming messages and returns the matching documents to the senders. An expert could also be a human sitting at a computer with a chat client, typing messages to nobody in particular but always going to someone who cares.
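The identity above can be illustrated with a toy sketch. This is not part of any real system: the corpus probabilities are made up, and a simple unigram model stands in for whatever language model is actually under test. It just shows how P(A|Q) = P(QA)/P(Q) yields an ideal code length of -log2 P(A|Q) bits, and how that code length relates to word perplexity.

```python
import math

def joint_prob(tokens, model):
    """Probability of a token sequence under a toy unigram model
    (each token independent; a real model would condition on context)."""
    p = 1.0
    for t in tokens:
        p *= model[t]
    return p

# Hypothetical unigram probabilities, for illustration only.
model = {"what": 0.2, "is": 0.2, "two": 0.2, "plus": 0.1, "four": 0.3}

Q = ["what", "is", "two", "plus", "two"]   # the question
A = ["four"]                               # the predicted answer

p_qa = joint_prob(Q + A, model)
p_q = joint_prob(Q, model)
p_a_given_q = p_qa / p_q                   # P(A|Q) = P(QA) / P(Q)

bits = -math.log2(p_a_given_q)             # ideal code length for A in context Q
perplexity = 2 ** (bits / len(A))          # per-word perplexity of the answer

print(p_a_given_q, bits, perplexity)
```

A better model assigns a higher P(A|Q), hence fewer bits and a lower perplexity, which is why compression ratio and perplexity are two statements of the same measurement.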
As a router, a peer caches incoming messages, each of which has a header identifying its source. When a new message arrives, it is matched against the cached messages, and if there is a close match, each message is sent to the other's source. For example, given two incoming messages:

M1: From Alice: something about X.
M2: From Bob: something else about X.

the message M1 from Alice is forwarded to Bob and the message M2 from Bob is forwarded to Alice. (This might initiate a conversation between Alice and Bob.) The peer keeps copies of both messages, so it knows that Alice and Bob are both interested in X.

The problem is how to decide whether two messages are related. The simplest approach is to match terms, but what we really want is to match meanings. A more sophisticated peer might use stemming (matching "bank" to "banking"), a thesaurus (matching "bank" to "finance"), a context-based parser (not matching "banked turn"), or even vision (matching an attached photo of a bank).

In a CMR network, there is an incentive to be smart about matching messages. Information has negative value on average. If peers used the simple strategy of forwarding messages to everyone they knew about, other peers would be flooded with input. Since there is no central authority, some peers may do this, so your peer will need to deal with it. You will need to block messages from peers that flood you with irrelevant traffic or spam, or from sources that you can't authenticate. Likewise, you should forward messages sparingly and only when relevant, lest other peers block you. You have an incentive to be a high-quality source of information in a network where peers compete for reputation.

Where I would use compression is to evaluate language models that measure similarity between messages. If I know that M1 and M2 are related, then I want C(M1|M2) + C(M2|M1) to be small, where C(M1|M2) = C(M2,M1) - C(M2) is the compressed size of M1 in context M2 using the candidate model under test.
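The similarity test above can be sketched in a few lines. This is only an illustration, not the proposed evaluation: zlib is used as a crude stand-in for the candidate language model's compressor, and the three messages are invented. The point is the shape of the computation, C(M1|M2) = C(M2,M1) - C(M2), the extra cost of coding M1 once M2 has already been seen.

```python
import zlib

def C(text: bytes) -> int:
    """Compressed size in bytes; a candidate language model's coder
    would go here instead of zlib."""
    return len(zlib.compress(text, 9))

def conditional_size(m1: bytes, m2: bytes) -> int:
    """Approximate C(M1|M2) = C(M2,M1) - C(M2), coding M1 in context M2."""
    return C(m2 + m1) - C(m2)

def relatedness(m1: bytes, m2: bytes) -> int:
    """C(M1|M2) + C(M2|M1): smaller means the messages share more content."""
    return conditional_size(m1, m2) + conditional_size(m2, m1)

# Hypothetical messages: M1 and M2 are about the same topic, M3 is not.
m1 = b"From Alice: I am looking for a good book about neural networks."
m2 = b"From Bob: Can anyone recommend a good book about neural networks?"
m3 = b"From Carol: The farmers market opens at nine on Saturday."

print(relatedness(m1, m2), relatedness(m1, m3))
```

A router could forward a pair of cached messages to each other's sources whenever this score falls below a threshold; a better language model substituted for zlib would produce sharper scores, which is what the compression test would measure.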
-- Matt Mahoney, [EMAIL PROTECTED]

-------------------------------------------
agi
Archives: http://www.listbox.com/member/archive/303/=now
RSS Feed: http://www.listbox.com/member/archive/rss/303/
Modify Your Subscription: http://www.listbox.com/member/?member_id=8660244&id_secret=103754539-40ed26
Powered by Listbox: http://www.listbox.com
