On Sat, Jan 24, 2015 at 11:59 AM, Piaget Modeler via AGI <[email protected]> wrote:
> How do you represent IS ? Do you differentiate IS from TYPE-OF (i.e., IS-A),
> or INSTANCE-OF ?
>
> Take for example,
>
> IS(apple, fruit) - TYPE-OF
> IS(John_Smith, Politician) - INSTANCE-OF
> IS(my_coat, green) - ???

IS could also be Islamic State. Language evolves. Knowledge representation systems that assign a fixed set of meanings to words have a long history of failure. I don't know why anyone still pursues this approach.

I understand that a structured knowledge representation doesn't require a supercomputer the way a neural language model does. Initially it looks like the right approach too, because rule coverage has a power law distribution, with the IS-A construct ranked right at the top. You can cover half of the language with just a few hundred rules. The problem is that nobody knows how many rules you need to cover the other half. Doug Lenat (Cyc) has been plugging away at it for over 30 years. Apparently it was a lot more than he thought.

First, our brains evolved to be able to learn language. Then language evolved to have a structure that can be learned in a few years on a noisy, massively parallel 10 petaflop computer with 100 terabits of memory. The rules (I believe there are 10^8 to 10^9 of them) can be grouped roughly into lexical, semantic, and grammatical. Rules in each set can be learned after learning a large portion of the previous set. Note that I listed semantics before grammar, which is the opposite of the way most parsers work (or actually, don't work).

Children learn the rules for splitting continuous speech into words by age 7 to 10 months. They learn to associate words with other words and with nonverbal perceptions (grounding) starting around 1 year. They start forming grammatically correct sentences around age 2-3. We can divide grammar rules into rules for categorization (X is a noun) and rules for ordering words (adjectives precede nouns in English). Most rules are very specific. For example, we say "salt and pepper", not "pepper and salt". We use high level grammar rules to solve math problems, so there is an obvious learning hierarchy here too. I am not sure how much this helps.

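The power law point above can be made concrete with a rough sketch. It assumes, purely for illustration, that rule usage follows a Zipf distribution with exponent 1 (the rule of rank r is used in proportion to 1/r) and that there are 10^8 rules in total, the lower end of the estimate above. The function names are hypothetical, and the harmonic number is approximated rather than summed exactly.

```python
import math

EULER_GAMMA = 0.5772156649015329


def harmonic(n: int) -> float:
    """Approximate H_n = 1 + 1/2 + ... + 1/n (exact sum for small n)."""
    if n < 1:
        return 0.0
    if n < 100:
        return sum(1.0 / i for i in range(1, n + 1))
    # Standard asymptotic expansion of the harmonic number.
    return math.log(n) + EULER_GAMMA + 1.0 / (2 * n) - 1.0 / (12 * n * n)


def coverage(k: int, total_rules: int) -> float:
    """Fraction of usage covered by the k most frequent rules (assumed Zipf, exponent 1)."""
    return harmonic(k) / harmonic(total_rules)


def rules_needed(target: float, total_rules: int) -> int:
    """Smallest k whose coverage reaches the target, found by binary search."""
    lo, hi = 1, total_rules
    while lo < hi:
        mid = (lo + hi) // 2
        if coverage(mid, total_rules) >= target:
            hi = mid
        else:
            lo = mid + 1
    return lo


if __name__ == "__main__":
    total = 10**8  # assumed total rule count, not a measured value
    for target in (0.5, 0.9, 0.99):
        print(f"{target:.0%} coverage needs about {rules_needed(target, total)} rules")
```

Under these assumptions, half of the usage is covered by several thousand rules, while 90% takes on the order of ten million. The exact figures depend entirely on the assumed exponent and rule count; the point is only that coverage grows with the logarithm of the number of rules, so the second half costs vastly more than the first.
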
Most of us don't have the resources to do the 10^24 operations needed to properly learn natural language, other than in our own brains. We usually compromise and do something we can afford, but there is an obvious tradeoff between CPU, memory, and text prediction accuracy, which I have documented at http://mattmahoney.net/dc/text.html. A highly optimized program running for a week on a high end desktop with 32 GB of memory still falls well short of what humans can do.

--
-- Matt Mahoney, [email protected]
