Semantic learning from unlabeled text has already been demonstrated: it has 
been used to improve both text compression (perplexity) and speech recognition 
word error rates [1], and to pass the word analogy section of the SAT [2].  
Semantic models exploit the fact that related words like "moon" and "star" tend 
to appear near each other, forming a fuzzy identity relation.
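To make the co-occurrence idea concrete, here is a minimal Python sketch (my 
own illustration; the window size, function names, and corpus file are 
arbitrary, and this is not the multi-span model of [1] or the LRA method of 
[2]): count which words appear near which, then compare words by the cosine 
of their context-count vectors.

from collections import Counter, defaultdict
from math import sqrt

def cooccurrence_vectors(tokens, window=5):
    """Count, for each word, how often every other word appears nearby."""
    vectors = defaultdict(Counter)
    for i, w in enumerate(tokens):
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:
                vectors[w][tokens[j]] += 1
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(c * b[w] for w, c in a.items() if w in b)
    na = sqrt(sum(c * c for c in a.values()))
    nb = sqrt(sum(c * c for c in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Related words such as "moon" and "star" accumulate similar context counts,
# so their cosine similarity is high compared to unrelated pairs:
# tokens = open("corpus.txt").read().lower().split()
# vectors = cooccurrence_vectors(tokens)
# print(cosine(vectors["moon"], vectors["star"]))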

Syntactic learning is possible from unlabeled text because words with the same 
grammatical role tend to appear in the same immediate contexts.  For example, 
"the X is" tells you that X is a noun, which lets you predict unseen sequences 
like "a X was".

[1] Bellegarda, Jerome R., "Speech recognition experiments using multi-span 
statistical language models", Proceedings of the IEEE International Conference 
on Acoustics, Speech, and Signal Processing (ICASSP), 717-720, 1999.

[2] Turney, Peter D., "Measuring semantic similarity by latent relational 
analysis", Proceedings of the Nineteenth International Joint Conference on 
Artificial Intelligence (IJCAI-05), 1136-1141, Edinburgh, Scotland, 2005.
-- Matt Mahoney, [EMAIL PROTECTED]

----- Original Message ----
From: Mark Waser <[EMAIL PROTECTED]>
To: agi@v2.listbox.com
Sent: Sunday, August 13, 2006 5:25:19 PM
Subject: Re: [agi] Marcus Hutter's lossless compression of human knowledge prize

>> I think the Hutter prize will lead to a better understanding of how we 
>> learn semantics and syntax.

I have to disagree strongly.  As long as you are requiring recreation at the 
bit level as opposed to the semantic or logical level, you aren't going to 
learn much at all about semantics or syntax (other than, possibly, the 
relative frequency of various constructs, which you can then use to optimize 
*slightly* better -- maybe well enough to win some money, but not well enough 
to make it worthwhile, since it is a definite sidetrack from AGI).

    Mark



