I think a 1 GB corpus is big enough to learn most of this knowledge using
statistical methods.
So we know that "obese" occurs in about 0.001% of all paragraphs, but in
1% of paragraphs containing "fat".
OK. Now try "obese" and "morbidly" or "obese" and "clinically". I suspect
that you are far more likely to statistically end up with "obese" being some
form of disease (that being the context where it is normally used) than to
end up with it as "fat". Statistical methods get absolutely trashed when you
start switching contexts unless they can tell (or, more likely, are told)
that you've switched contexts. They are great at pulling context-specific
clusters out of specific contexts, but unless you get cross-context
explanatory data (that you'll probably interpret with "other than
statistical methods" -- see next section), I don't believe that statistical
methods will recognize "obese" and "fat" as synonyms.
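A toy sketch of that context-dependence (the two mini-corpora below are
invented for illustration): build a window-based context vector for "obese"
in each register and see which neighbor it clusters with.

```python
from collections import Counter

# Two invented mini-corpora in different registers.
medical = ("patients diagnosed as morbidly obese require treatment . "
           "clinically obese patients require monitored treatment .").split()
casual = ("he got really fat last year . "
          "she got really obese last year .").split()

def context_vector(word, corpus, window=2):
    """Counts of words appearing within +/- window positions of word."""
    v = Counter()
    for k, w in enumerate(corpus):
        if w == word:
            v.update(corpus[max(0, k - window):k] + corpus[k + 1:k + window + 1])
    return v

def cos(a, b):
    shared = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = lambda c: sum(x * x for x in c.values()) ** 0.5
    return shared / (norm(a) * norm(b)) if a and b else 0.0

# In the medical register "obese" sits near diagnosis vocabulary; in the
# casual register its contexts are identical to those of "fat".
print(cos(context_vector("obese", medical), context_vector("diagnosed", medical)))
print(cos(context_vector("obese", casual), context_vector("fat", casual)))
```

Same word, two corpora, two different nearest neighbors -- which cluster
"obese" lands in depends entirely on the context mix it was trained on.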
Likewise, syntax is learnable. For example, if you encounter "the X is"
you know that X is a noun, so you can predict "a X was" or "Xs" rather
than "he X" or "Xed". This type of knowledge can be exploited using
similarity modeling [3] to improve word perplexity.
Let me give one more example using the same learning mechanism by which
syntax is learned:
All men are mortal. Socrates is a man. Therefore Socrates is mortal.
All insects have 6 legs. Ants are insects. Therefore ants have 6 legs.
Now predict: All frogs are green. Kermit is a frog. Therefore...
This isn't a statistical method (see "other than statistical methods" above
:-).
= = = = =
So -- No, I *don't* believe that the 1GB corpus is big enough to learn most
of this knowledge *USING STATISTICAL METHODS*. I *do* believe that it is
large enough for other methods though.
----- Original Message -----
From: "Matt Mahoney" <[EMAIL PROTECTED]>
To: <[email protected]>
Sent: Monday, August 28, 2006 3:37 PM
Subject: Re: [agi] Lossy *&* lossless compressi
On 8/28/06, Mark Waser wrote:
How does a lossless model observe that "Jim is extremely fat" and "James
continues to be morbidly obese" are approximately equal?
I realize this is far beyond the capabilities of current data compression
programs, which typically predict the next byte in the context of the last
few bytes using learned statistics. Of course we must do better. The
model has to either know, or be able to learn, the relationships between
"Jim" and "James", "is" and "continues to be", "fat" and "obese", etc. I
think a 1 GB corpus is big enough to learn most of this knowledge using
statistical methods.
C:\res\data\wiki>grep -c . enwik9
File enwik9:
10920493 lines match
enwik9: grep: input lines truncated - result questionable
C:\res\data\wiki>grep -i -c " fat " enwik9
File enwik9:
1312 lines match
enwik9: grep: input lines truncated - result questionable
C:\res\data\wiki>grep -i -c " obese " enwik9
File enwik9:
111 lines match
enwik9: grep: input lines truncated - result questionable
C:\res\data\wiki>grep -i " obese " enwik9 |grep -c " fat "
File STDIN:
14 lines match
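For reference, the percentages quoted below follow directly from those grep
counts (plain arithmetic, sketched in Python):

```python
# Line counts from the grep session above (enwik9, one paragraph per line).
total = 10920493   # all paragraphs
fat = 1312         # paragraphs containing " fat "
obese = 111        # paragraphs containing " obese "
both = 14          # paragraphs containing both

p_obese = obese / total          # ~0.001% of all paragraphs
p_obese_given_fat = both / fat   # ~1% of paragraphs containing "fat"

print(f"P(obese)     = {p_obese:.5%}")
print(f"P(obese|fat) = {p_obese_given_fat:.2%}")
```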
So we know that "obese" occurs in about 0.001% of all paragraphs, but in
1% of paragraphs containing "fat". This is an example of a distant bigram
model, which has been shown to improve word perplexity in offline models
[1]. We can improve on this method using e.g. latent semantic analysis
[2] to exploit the transitive property of semantics: if A appears near
(means) B and B appears near C, then A predicts C.
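The transitive step can be sketched with a rank-reduced co-occurrence matrix
(the four words and their counts below are invented, and numpy's SVD stands
in for a full LSA pipeline): "fat" and "obese" never co-occur directly, but
both co-occur with "weight", and the low-rank space merges them.

```python
import numpy as np

# Invented word-word co-occurrence counts. "fat" and "obese" share no
# direct co-occurrence, but both co-occur with "weight".
words = ["fat", "obese", "weight", "frog"]
i = {w: k for k, w in enumerate(words)}
X = np.array([
    [3., 0., 2., 0.],   # fat
    [0., 3., 2., 0.],   # obese
    [2., 2., 3., 0.],   # weight
    [0., 0., 0., 4.],   # frog
])

# Rank-2 truncated SVD: the "latent semantic" space.
U, s, Vt = np.linalg.svd(X)
emb = U[:, :2] * s[:2]

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cos(X[i["fat"]], X[i["obese"]]))      # raw rows: modest overlap
print(cos(emb[i["fat"]], emb[i["obese"]]))  # latent space: near 1
print(cos(emb[i["fat"]], emb[i["frog"]]))   # unrelated word stays far
```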
Likewise, syntax is learnable. For example, if you encounter "the X is"
you know that X is a noun, so you can predict "a X was" or "Xs" rather
than "he X" or "Xed". This type of knowledge can be exploited using
similarity modeling [3] to improve word perplexity. (Thanks to Rob
Freeman for pointing me to this).
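A minimal illustration of the similarity idea (toy corpus invented here,
not the actual model of [3]): represent each word by counts of its
(previous, next) context pairs, and words of the same syntactic class come
out similar -- which is what licenses predicting "a X was" after having
seen only "the X is".

```python
from collections import Counter

# Tiny invented corpus; positional contexts reveal word classes.
corpus = ("the cat is here . the dog is here . a cat was here . "
          "the dog runs . the cat runs . a dog was here .").split()

# Context signature of a word = counts of its (previous, next) word pairs.
def signature(word):
    return Counter((corpus[k - 1], corpus[k + 1])
                   for k in range(1, len(corpus) - 1) if corpus[k] == word)

def cos(a, b):
    shared = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = lambda c: sum(v * v for v in c.values()) ** 0.5
    return shared / (norm(a) * norm(b))

# "cat" and "dog" occur in the same frames ("the _ is", "a _ was"), so
# their signatures match; "here" patterns like neither of them.
print(cos(signature("cat"), signature("dog")))   # high
print(cos(signature("cat"), signature("here")))  # low
```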
Let me give one more example using the same learning mechanism by which
syntax is learned:
All men are mortal. Socrates is a man. Therefore Socrates is mortal.
All insects have 6 legs. Ants are insects. Therefore ants have 6 legs.
Now predict: All frogs are green. Kermit is a frog. Therefore...
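To make the point concrete, here is a deliberately dumb sketch (the regex
template is invented for this example, not a real learner): a predictor
that has induced the shared surface template of the first two syllogisms
can complete the third by pure sequence completion, no logic engine
involved.

```python
import re

# The three syllogisms above share one surface template. A predictor that
# has induced that template can emit the conclusion by sequence
# completion alone. (Matching "men" back to "man" is glossed over here;
# a real learner would need some morphology too.)
TEMPLATE = re.compile(r"All (\w+) (are|have) ([\w ]+)\. (\w+) (is a|are) (\w+)\.")

def predict(premises):
    x, v1, y, z, v2, _ = TEMPLATE.search(premises).groups()
    # Singular subject ("is a") needs a singular verb form.
    verb = v1 if v2 == "are" else ("is" if v1 == "are" else "has")
    return f"Therefore {z} {verb} {y}."

print(predict("All men are mortal. Socrates is a man."))    # Therefore Socrates is mortal.
print(predict("All insects have 6 legs. Ants are insects."))
print(predict("All frogs are green. Kermit is a frog."))    # Therefore Kermit is green.
```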
[1] Rosenfeld, Ronald, "A Maximum Entropy Approach to Adaptive Statistical
Language Modeling", Computer Speech and Language, 10, 1996.
[2] Bellegarda, Jerome R., "Speech recognition experiments using
multi-span statistical language models", IEEE Intl. Conf. on Acoustics,
Speech, and Signal Processing, 717-720, 1999.
[3] Dagan, Ido, Lillian Lee, Fernando C. N. Pereira, "Similarity-Based
Models of Word Cooccurrence Probabilities", Machine Learning, 1999.
http://citeseer.ist.psu.edu/dagan99similaritybased.html
-- Matt Mahoney, [EMAIL PROTECTED]
-------
To unsubscribe, change your address, or temporarily deactivate your
subscription,
please go to http://v2.listbox.com/member/[EMAIL PROTECTED]