Semantic learning from unlabeled text has already been demonstrated and used to improve both text compression (perplexity) and word error rates in speech recognition [1], and to pass the word-analogy section of the SAT exam [2]. Semantic models exploit the fact that related words like "moon" and "star" tend to appear near each other, forming a fuzzy identity relation.
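That co-occurrence idea can be sketched in a few lines. This is a minimal illustration, not any particular published model: the toy corpus, window size, and function names are invented for the example. Words that share contexts ("moon"/"star") end up with similar co-occurrence vectors, measured by cosine similarity:

```python
from collections import Counter
from math import sqrt

def cooccurrence_vectors(tokens, window=2):
    """Map each word to a Counter of the words seen within `window` positions."""
    vecs = {}
    for i, w in enumerate(tokens):
        ctx = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        vecs.setdefault(w, Counter()).update(ctx)
    return vecs

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    num = sum(a[k] * b[k] for k in a)
    da = sqrt(sum(v * v for v in a.values()))
    db = sqrt(sum(v * v for v in b.values()))
    return num / (da * db) if da and db else 0.0

# Toy corpus (illustrative): "moon" and "star" share contexts; "tax" does not.
corpus = ("the moon shines at night . the star shines at night . "
          "the moon rises at night . the star rises at night . "
          "the tax is due in april . the tax is paid in april .").split()

vecs = cooccurrence_vectors(corpus)
print(cosine(vecs["moon"], vecs["star"]))  # high: nearly identical contexts
print(cosine(vecs["moon"], vecs["tax"]))   # lower: mostly disjoint contexts
```

Real systems of course use far larger corpora and dimensionality reduction (e.g. the latent-semantic methods in [1] and [2]) rather than raw counts, but the underlying signal is the same.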
Syntactic learning is possible from unlabeled text because words with the same grammatical role tend to appear in the same immediate contexts. For example, "the X is" tells you that X is a noun, allowing you to predict sequences like "a X was".

[1] Bellegarda, Jerome R., "Speech recognition experiments using multi-span statistical language models", IEEE Intl. Conf. on Acoustics, Speech, and Signal Processing, 717-720, 1999.
[2] Turney, Peter D., "Measuring semantic similarity by latent relational analysis", Proceedings of the Nineteenth International Joint Conference on Artificial Intelligence (IJCAI-05), 1136-1141, Edinburgh, Scotland, 2005.

-- Matt Mahoney, [EMAIL PROTECTED]

----- Original Message ----
From: Mark Waser <[EMAIL PROTECTED]>
To: agi@v2.listbox.com
Sent: Sunday, August 13, 2006 5:25:19 PM
Subject: Re: [agi] Marcus Hutter's lossless compression of human knowledge prize

>> I think the Hutter prize will lead to a better understanding of how we
>> learn semantics and syntax.

I have to disagree strongly. As long as you are requiring recreation at the bit level, as opposed to the semantic or logical level, you aren't going to learn much at all about semantics or syntax -- other than, possibly, the relative frequency of various constructs, which you can then use to *slightly* better optimize: maybe well enough to win some money, but not well enough to make it worthwhile, since it is a definite sidetrack from AGI.

Mark