A few other considerations -- It is possible to reduce the need for after-the-fact parsing by imposing constraints on the knowledge-entry process, and this actually makes it easier for users to come up with facts. For example, the MIT Open Mind Common Sense project asked users to fill in the blanks of templates like "You would be likely to find _______ in ________," "______ can ______," etc., which the researchers then readily translated to assertions of the form LocationOf(X, Y), CapableOf(X, Y), etc. They ended up collecting quite a few facts -- see http://web.media.mit.edu/~hugo/conceptnet/. If you dig around in the zip file you'll see a few large txt files that list all the collected facts -- 200,000 in the "concise" version, 1.6 million in the "full" version.
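To make the template idea concrete, here is a rough sketch of how filled-in templates could be mapped to predicate assertions. The regexes, the function name parse_assertion, and the argument order are my own assumptions for illustration; this is not the Open Mind project's actual code.

```python
import re

# Hypothetical template table: each English frame maps to a predicate name.
# The frames and predicate names follow the examples above; the regexes and
# the (first blank, second blank) argument order are assumptions.
TEMPLATES = [
    (re.compile(r"^You would be likely to find (.+?) in (.+?)\.?$", re.IGNORECASE),
     "LocationOf"),
    (re.compile(r"^(.+?) can (.+?)\.?$", re.IGNORECASE),
     "CapableOf"),
]

def parse_assertion(sentence):
    """Return (predicate, arg1, arg2) for a filled-in template, or None."""
    text = sentence.strip()
    for pattern, predicate in TEMPLATES:
        match = pattern.match(text)
        if match:
            return (predicate, match.group(1).strip(), match.group(2).strip())
    # Sentences that fit no template are rejected rather than guessed at.
    return None

print(parse_assertion("You would be likely to find a book in a library."))
print(parse_assertion("Birds can fly"))
```

Because the user only fills in blanks, the predicate never has to be guessed; only the arguments remain free text, which is exactly where the ambiguity discussed below comes from.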
> It seems that either we have to agree on some arbitrary format, or just leave it as English (perhaps parsed and disambiguated).
The MIT project I just mentioned did something similar to the "collect in plain English, then disambiguate afterwards" strategy (a lexically disambiguated version is also available on their site), but the fact that the arguments of their predicates are not tied to any kind of formal semantics really limits their database's utility. I would strongly recommend thinking about ways to constrain the users' input in such a way as to eliminate ambiguity from the get-go. For example, you could ask users to create true sentences of the form "if X __1__ __2__ then X __3__ __4__", but force them to fill in the blanks by selecting from combo boxes that give them many predefined choices such as "can," "is a," "mouse (the animal)," "mouse (for a computer)," etc. This will help standardize your input and make it a lot easier to work with.
Alternatively, you could write a simple program to create thousands of randomly generated logical statements and then automatically convert them to relatively unambiguously phrased English statements (obviously far easier than converting English to logical form). Then you could put these to your web users and have them tell you how true each statement is, or ask them how they would change false statements to make them true. You could even incorporate clustering algorithms to infer which unannotated statements are most likely to be true ... lots of possibilities.
Actually, going with the randomly-generated-logical-statements idea, if you used Lojban predicates as your underlying logical form, you could auto-translate them to English and essentially have your users annotating the truth-values of Lojban sentences without knowing Lojban. It's very difficult to auto-translate complex Lojban sentences into readable English (mainly because Lojban's equivalent of noun compounds is ridiculously vague), but it shouldn't be too hard to do with simple sentences.
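Here is a minimal sketch of the randomly-generated-statements idea, assuming a tiny hand-made, sense-tagged vocabulary. All the names here (SUBJECTS, RELATIONS, OBJECTS, random_statement, to_english, mean_truth) are hypothetical, and the English rendering is deliberately naive (for instance, it always uses the article "A").

```python
import random

# Hypothetical sense-tagged vocabulary: every entry is unambiguous by
# construction, like the combo-box choices described above.
SUBJECTS = ["mouse (the animal)", "mouse (for a computer)", "bird", "fish"]
RELATIONS = {"IsA": "is a", "CapableOf": "can"}
OBJECTS = {
    "IsA": ["mammal", "device", "pet"],
    "CapableOf": ["fly", "swim", "squeak"],
}

def random_statement(rng):
    """Pick a random (predicate, subject, object) triple."""
    predicate = rng.choice(sorted(RELATIONS))
    subject = rng.choice(SUBJECTS)
    obj = rng.choice(OBJECTS[predicate])
    return (predicate, subject, obj)

def to_english(statement):
    """Render a triple as a simple English sentence for users to rate."""
    predicate, subject, obj = statement
    return f"A {subject} {RELATIONS[predicate]} {obj}."

def mean_truth(ratings):
    """Average user ratings in [0, 1]; None if nobody has rated yet."""
    return sum(ratings) / len(ratings) if ratings else None

rng = random.Random(0)  # seeded only so that runs are repeatable
for _ in range(3):
    print(to_english(random_statement(rng)))
```

Most generated statements will be false or silly, which is fine: the point is that the logical form is known in advance, so a user's rating attaches directly to a triple and no parsing step is needed.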
Also, if you can think of any way to turn the knowledge-entry process into a fun game or competition, go for it. I've been told by a few people working on similar projects that making the knowledge-providing process engaging and fun for visitors ended up being a lot more important (and difficult) than they'd expected.

On 1/13/07, Joel Pitt <[EMAIL PROTECTED]> wrote:
On 1/14/07, YKY (Yan King Yin) <[EMAIL PROTECTED]> wrote:
> I'm considering this idea: build a repository of facts/rules in FOL (or
> Prolog) format, similar to Cyc's. For example "water is wet", "oil is
> slippery", etc. The repository is structureless, in the sense that it is
> just a collection of simple statements. It can serve as raw material for
> other AGIs, not only mine (although it is especially suitable for my
> system).

Some comments/suggestions:
* I think such a project should make the data public domain. Ignore silly ideas like giving people "shares" in the knowledge or whatever. It just complicates things. If the project is really strapped for cash later, then either use ad revenue or look for research funding (although I don't see much cost except for initial development of the system and web hosting).
* Whenever people want to add a new statement, have them evaluate two existing statements as well. Don't make the evaluation true/false; use a slider so the user can decide how "true" it is (even better, have an xy chart with one axis true/false and the other how sure the user is - this would be useful in the case of some obscure fact on quantum physics, since not all of us have the answer).
* Emphasize the community aspect of the database. Allow people to have profiles and list the number of statements evaluated and submitted (also how true the statements they submit are judged to be). Allow people to form teams. Allow teams to extract a subset of the data which represents only the facts they've submitted and evaluated (perhaps this could be an extra feature available to sponsors?)
* Although Lojban would be great to use, not many people are proficient in it (relative to English). We could be idealistic and suggest that everyone learn Lojban before submitting statements, but that would just shrink the user base and kill the community aspect. An alternative might be to allow statements in both languages to be submitted (Hell, why not allow ANY language as long as it is tagged with what language it is).
* An idea for keeping the community alive would be to focus on a particular topic each week, and run competitions between teams/individuals and award stars to their profile or something.
* Instead of making people come up with brand new statements every time, have a mode where the system randomly selects phrases from somewhere like Wikipedia (sometimes this will produce stupid statements, so allow the user to indicate as such).

I think it could be done and made quite fun. Don't just focus on the AI guys, most of us don't have that much spare time. Focus on the "bored at work" market.

Actually going through and thinking about this has made me quite enthused about it. Keep me posted on how it pans out. If I didn't have 10 other projects and my PhD to do I'd volunteer to code it.

--
-Joel
"Unless you try to do something beyond what you have mastered, you will never grow." -C.R. Lawton

-----
This list is sponsored by AGIRI: http://www.agiri.org/email
To unsubscribe or change your options, please go to: http://v2.listbox.com/member/?list_id=303
