--- Jim Bromer <[EMAIL PROTECTED]> wrote:
> I had said:
> > But this means that you are advancing a purely speculative theory
> > without any evidence to support it.
>
> Matt said:
> The evidence is described in my paper which you haven't read yet.
> --------------------------------------------
>
> I did glance at the paper and I don't think I will be able to understand
> your evidence. Can you give me some clues using plain language?
In speech recognition, it is fairly standard to test the language modeling component using word perplexity, which is another way of stating the compression ratio. More generally, compression measures prediction accuracy. Given a question Q, if you can predict the answer A that a human would give, then you could answer the question. Expressed as a model, you would assign a high probability P(A|Q) = P(QA)/P(Q) if A is the "correct" or most likely answer. Expressed as a compression ratio, an ideal coder would assign -log P(A|Q) bits to answer A in context Q, which is smaller for higher probabilities.

> Matt said:
> For building AGI, my proposal is http://www.mattmahoney.net/agi.html
> Unfortunately, I estimate the cost to be US $1 quadrillion over the next
> 30 years. But I believe it is coming, because AGI is worth that much.
> If I use compression anywhere, it will be to evaluate candidate language
> models for peers in a market that right now does not yet exist.
> ---------------------------------------------
>
> Can you explain what you mean by the statement that you would use
> compression to evaluate candidate language models?

In my competitive message routing (CMR) proposal, peers communicate in natural language. A peer is either an expert in some narrow domain, a router that knows about other peers and their areas of expertise so that it can route "related" messages to them, or some combination of the two. An expert can be a traditional expert system or a database that understands a small subset of natural language, but more likely it is simply a fixed collection of documents posted by the peer's owner, just as you would post documents to a website. In this role, the peer matches these documents against incoming messages and returns the matching documents to the senders. An expert could also be a human sitting at a computer with a chat client, typing messages to nobody in particular but always going to someone who cares.
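The identity above can be illustrated with a toy sketch. This is not part of any real system: the corpus probabilities are made up, and a simple unigram model stands in for whatever language model is actually under test. It just shows how P(A|Q) = P(QA)/P(Q) yields an ideal code length of -log2 P(A|Q) bits, and how that code length relates to word perplexity.

```python
import math

def joint_prob(tokens, model):
    """Probability of a token sequence under a toy unigram model
    (each token independent; a real model would condition on context)."""
    p = 1.0
    for t in tokens:
        p *= model[t]
    return p

# Hypothetical unigram probabilities, for illustration only.
model = {"what": 0.2, "is": 0.2, "two": 0.2, "plus": 0.1, "four": 0.3}

Q = ["what", "is", "two", "plus", "two"]   # the question
A = ["four"]                               # the predicted answer

p_qa = joint_prob(Q + A, model)
p_q = joint_prob(Q, model)
p_a_given_q = p_qa / p_q                   # P(A|Q) = P(QA) / P(Q)

bits = -math.log2(p_a_given_q)             # ideal code length for A in context Q
perplexity = 2 ** (bits / len(A))          # per-word perplexity of the answer

print(p_a_given_q, bits, perplexity)
```

A better model assigns a higher P(A|Q), hence fewer bits and a lower perplexity, which is why compression ratio and perplexity are two statements of the same measurement.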
As a router, a peer caches incoming messages, each of which has a header identifying its source. When a new message arrives, it is matched against the cached messages, and if there is a close match, each message is sent to the other's source. For example, given two incoming messages:

M1: From Alice: something about X.
M2: From Bob: something else about X.

the message M1 from Alice is forwarded to Bob and the message M2 from Bob is forwarded to Alice. (This might initiate a conversation between Alice and Bob.) The peer keeps copies of both messages, so it knows that Alice and Bob are both interested in X.

The problem is how to decide whether two messages are related. The simplest approach is to match terms, but what we really want is to match meanings. A more sophisticated peer might use stemming (matching "bank" to "banking"), a thesaurus (matching "bank" to "finance"), a context-based parser (not matching "banked turn"), or even vision (matching an attached photo of a bank).

In a CMR network, there is an incentive to be smart about matching messages. Information has negative value on average. If peers used the simple strategy of forwarding messages to everyone they knew about, other peers would be flooded with input. Since there is no central authority, some peers may do this, so your peer will need to deal with it. You will need to block messages from peers that flood you with irrelevant traffic or spam, or from sources that you can't authenticate. Likewise, you should forward messages sparingly and only when relevant, lest other peers block you. You have an incentive to be a high-quality source of information in a network where peers compete for reputation.

Where I would use compression is to evaluate language models that measure similarity between messages. If I know that M1 and M2 are related, then I want C(M1|M2) + C(M2|M1) to be small, where C(M1|M2) = C(M2,M1) - C(M2) is the compressed size of M1 in context M2 using the candidate model under test.
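The similarity test above can be sketched in a few lines. This is only an illustration, not the proposed evaluation: zlib is used as a crude stand-in for the candidate language model's compressor, and the three messages are invented. The point is the shape of the computation, C(M1|M2) = C(M2,M1) - C(M2), the extra cost of coding M1 once M2 has already been seen.

```python
import zlib

def C(text: bytes) -> int:
    """Compressed size in bytes; a candidate language model's coder
    would go here instead of zlib."""
    return len(zlib.compress(text, 9))

def conditional_size(m1: bytes, m2: bytes) -> int:
    """Approximate C(M1|M2) = C(M2,M1) - C(M2), coding M1 in context M2."""
    return C(m2 + m1) - C(m2)

def relatedness(m1: bytes, m2: bytes) -> int:
    """C(M1|M2) + C(M2|M1): smaller means the messages share more content."""
    return conditional_size(m1, m2) + conditional_size(m2, m1)

# Hypothetical messages: M1 and M2 are about the same topic, M3 is not.
m1 = b"From Alice: I am looking for a good book about neural networks."
m2 = b"From Bob: Can anyone recommend a good book about neural networks?"
m3 = b"From Carol: The farmers market opens at nine on Saturday."

print(relatedness(m1, m2), relatedness(m1, m3))
```

A router could forward a pair of cached messages to each other's sources whenever this score falls below a threshold; a better language model substituted for zlib would produce sharper scores, which is what the compression test would measure.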
-- Matt Mahoney, [EMAIL PROTECTED]

-------------------------------------------
agi
Archives: http://www.listbox.com/member/archive/303/=now
RSS Feed: http://www.listbox.com/member/archive/rss/303/
Modify Your Subscription: http://www.listbox.com/member/?member_id=8660244&id_secret=103754539-40ed26
Powered by Listbox: http://www.listbox.com
