A few other considerations --  It is possible to reduce the need for
after-the-fact parsing by imposing constraints on the knowledge-entry
process, and this actually makes it easier for users to come up with facts.
For example, the MIT Open Mind Common Sense project asked users to fill in
the blanks of templates like "You would be likely to find _______ in
________," "______ can ______," etc. which the researchers then readily
translated to assertions of the form LocationOf(X, Y), CapableOf(X, Y),
etc.  They ended up collecting quite a few facts -- see
http://web.media.mit.edu/~hugo/conceptnet/.
If you dig around in the zip file you'll see a few large txt files that
have all the collected facts listed out -- 200,000 in the "concise" version,
1.6 million in the "full" version.
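To make the "fill in the blanks, then translate" step concrete, here is a minimal sketch that turns filled-in template sentences into LocationOf(X, Y) / CapableOf(X, Y) triples. The two templates are the ones quoted above; the regexes and the to_assertion helper are hypothetical illustrations, not how the Open Mind project actually did it.

```python
import re

# Hypothetical patterns for the two templates quoted above. Each regex
# captures the blanks; order matters, so the more specific template is first.
TEMPLATES = [
    (re.compile(r"^You would be likely to find (.+?) in (.+?)\.?$"), "LocationOf"),
    (re.compile(r"^(.+?) can (.+?)\.?$"), "CapableOf"),
]

def to_assertion(sentence):
    """Turn a filled-in template sentence into (predicate, X, Y), or None."""
    text = sentence.strip()
    for pattern, predicate in TEMPLATES:
        match = pattern.match(text)
        if match:
            x, y = (part.strip() for part in match.groups())
            return (predicate, x, y)
    return None  # the sentence did not come from a known template

print(to_assertion("You would be likely to find a fork in a kitchen."))
# ('LocationOf', 'a fork', 'a kitchen')
print(to_assertion("A dog can bark"))
# ('CapableOf', 'A dog', 'bark')
```

Because the user only ever supplies the blanks, the "parsing" is little more than reading them back out, which is the whole point of constraining the input.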


It seems that either we have to agree on some arbitrary format, or just
leave it as English (perhaps parsed and disambiguated).

The MIT project I just mentioned did something similar to the "collect in
plain English, then disambiguate afterwards" strategy (a lexically
disambiguated version is also available on their site), but the fact that
the arguments of their predicates are not tied to any kind of formal
semantics really limits their database's utility.  I would strongly
recommend thinking about ways to constrain the users' input in such a way as
to eliminate ambiguity from the get-go.  For example, you could ask users to
create true sentences of the form "if X __1__ __2__ then X __3__ __4__",
but force them to fill in the blanks by selecting from combo boxes that give
them many predefined choices such as "can," "is a," "mouse (the animal),"
"mouse (for a computer)," etc.  This will help standardize your input and
make it a lot easier to work with.
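A rough sketch of the combo-box idea, assuming each visible choice maps to an unambiguous internal symbol (the RELATIONS/CONCEPTS tables, symbol names like "mouse.animal", and build_rule are all hypothetical). Anything not offered in a combo box is rejected, so ambiguity never enters the database.

```python
from dataclasses import dataclass

# Hypothetical combo-box contents: label shown to the user -> internal symbol.
# The two senses of "mouse" get distinct symbols.
RELATIONS = {"can": "CapableOf", "is a": "IsA", "has": "HasA"}
CONCEPTS = {
    "mouse (the animal)": "mouse.animal",
    "mouse (for a computer)": "mouse.device",
    "animal": "animal",
    "squeak": "squeak",
    "click": "click",
}

@dataclass(frozen=True)
class Rule:
    """if X <cond_rel> <cond_obj> then X <concl_rel> <concl_obj>"""
    cond_rel: str
    cond_obj: str
    concl_rel: str
    concl_obj: str

def build_rule(blank1, blank2, blank3, blank4):
    """Build a Rule from the four blanks; reject anything not in a combo box."""
    try:
        return Rule(RELATIONS[blank1], CONCEPTS[blank2],
                    RELATIONS[blank3], CONCEPTS[blank4])
    except KeyError as err:
        raise ValueError(f"not a predefined choice: {err}") from None

print(build_rule("is a", "mouse (the animal)", "can", "squeak"))
# Rule(cond_rel='IsA', cond_obj='mouse.animal', concl_rel='CapableOf', concl_obj='squeak')
```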

Alternatively, you could write a simple program to create thousands of
randomly generated logical statements and then automatically convert them to
relatively unambiguously phrased English statements (obviously far easier
than converting English to logical form).  Then you could put these to your
web users and have them tell you how true each statement is, or ask them how
they would change false statements to make them true. You could even
incorporate clustering algorithms to infer which unannotated statements are
most likely to be true ... lots of possibilities.
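The generate-then-verbalize direction can be very small. In this sketch the vocabulary, the English phrasings, and the function names are all assumptions chosen for illustration; most generated statements will be false or silly, which is exactly what users would be asked to rate.

```python
import random

# Hypothetical vocabulary. Each predicate has one deliberately plain English
# template, so the rendered sentence is hard to misread.
PREDICATES = {
    "CapableOf": "{x} can {y}.",
    "IsA": "{x} is a kind of {y}.",
    "LocationOf": "You are likely to find {x} in {y}.",
}
SUBJECTS = ["a dog", "a fork", "water", "a bird"]
OBJECTS = ["fly", "animal", "a kitchen", "bark"]

def random_statements(n, seed=None):
    """Yield n random (predicate, x, y) triples."""
    rng = random.Random(seed)  # seeded so a batch can be regenerated
    for _ in range(n):
        yield (rng.choice(list(PREDICATES)), rng.choice(SUBJECTS), rng.choice(OBJECTS))

def to_english(statement):
    """Render a triple with its predicate's template, capitalizing the first letter."""
    predicate, x, y = statement
    sentence = PREDICATES[predicate].format(x=x, y=y)
    return sentence[0].upper() + sentence[1:]

for stmt in random_statements(3, seed=42):
    print(stmt, "->", to_english(stmt))
```

Since the logical form is generated first, each user rating attaches directly to a triple; no English parsing is ever needed.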

Actually, going with the randomly-generated-logical-statements idea, if you
used Lojban predicates as your underlying logical form, you could
auto-translate them to English and essentially have your users annotating
the truth-values of Lojban sentences without knowing Lojban.  It's very
difficult to auto-translate complex Lojban sentences into readable English
(mainly because Lojban's equivalents of noun compounds are ridiculously
vague), but it shouldn't be too hard to do with simple sentences.
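For simple sentences, a gismu's place structure is close to an English template. The glosses below are simplified paraphrases of the standard definitions (citka: x1 eats x2; prami: x1 loves x2; blanu: x1 is blue; klama: x1 goes to x2 from x3 via x4 using x5). This sketch assumes the arguments are already English phrases; translating the Lojban sumti themselves is out of scope.

```python
# Simplified glosses: one fragment per place (x1, x2, ...) so that
# unspecified places can simply be dropped.
GLOSSES = {
    "citka": ["{} eats", " {}"],
    "prami": ["{} loves", " {}"],
    "blanu": ["{} is blue"],
    "klama": ["{} goes", " to {}", " from {}", " via {}", " using {}"],
}

def gloss(gismu, *places):
    """Render a simple bridi (a gismu plus its places) as an English sentence.

    None marks an unspecified place (roughly Lojban's zo'e) and is skipped.
    """
    fragments = GLOSSES[gismu]
    if len(places) > len(fragments):
        raise ValueError(f"{gismu} has only {len(fragments)} places")
    if not places or places[0] is None:
        raise ValueError("this sketch needs an explicit x1")
    text = "".join(f.format(a) for f, a in zip(fragments, places) if a is not None) + "."
    return text[0].upper() + text[1:]

print(gloss("klama", "the cat", "the house", None, "the garden"))
# The cat goes to the house via the garden.
```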

Also, if you can think of any way to turn the knowledge-entry process into a
fun game or competition, go for it.  I've been told by a few people working
on similar projects that making the knowledge-providing process engaging and
fun for visitors ended up being a lot more important (and difficult) than
they'd expected.





On 1/13/07, Joel Pitt < [EMAIL PROTECTED]> wrote:

On 1/14/07, YKY (Yan King Yin) < [EMAIL PROTECTED]> wrote:
> I'm considering this idea:  build a repository of facts/rules in FOL (or
> Prolog) format, similar to Cyc's.  For example "water is wet", "oil is
> slippery", etc.  The repository is structureless, in the sense that it is
> just a collection of simple statements.  It can serve as raw material for
> other AGIs, not only mine (although it is especially suitable for my
> system).

Some comments/suggestions:

* I think such a project should make the data public domain. Ignore
silly ideas like giving out "shares" in the knowledge or whatever. It
just complicates things. If the project is really strapped for cash
later, then either use ad revenue or look for research funding
(although I don't see much cost except for initial development of the
system and web hosting).

* Whenever people want to add a new statement, have them evaluate two
existing statements as well. Don't make the evaluation true/false; use
a slider so the user can decide how "true" it is (even better, have an
x-y chart with one axis for true/false and the other for how sure the
user is; this would be useful in the case of some obscure fact about
quantum physics, since not all of us know the answer).

* Emphasize the community aspect of the database. Allow people to have
profiles and list the number of statements evaluated and submitted
(also how true the statements they submit are judged). Allow people to
form teams. Allow teams to extract a subset of the data
which represents only the facts they've submitted and evaluated
(perhaps this could be an extra feature available to sponsors?)

* Although Lojban would be great to use, not many people are
proficient in it (relative to English). We could be idealistic and
suggest that everyone learn Lojban before submitting statements, but
that would just shrink the user base and kill the community aspect. An
alternative might be to allow statements in both languages to be
submitted (Hell, why not allow ANY language, as long as each statement
is tagged with its language).

* An idea for keeping the community alive would be to focus on a
particular topic each week, and run competitions between
teams/individuals and award stars to their profile or something.

* Instead of making people come up with brand new statements
every time, have a mode where the system randomly selects phrases from
somewhere like Wikipedia (sometimes this will produce stupid
statements, so allow the user to flag them as such).
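One simple way to combine ratings from the x-y chart idea is to weight each rater's truth value by their stated confidence, so a shrug on an obscure quantum-physics fact counts for little. This is a sketch under that assumption (the Rating type and aggregate function are hypothetical); other schemes, such as also weighting by each rater's track record, would fit the profile idea too.

```python
from dataclasses import dataclass

@dataclass
class Rating:
    truth: float       # slider position: 0.0 = false, 1.0 = true
    confidence: float  # how sure the rater is: 0.0 = guessing, 1.0 = certain

def aggregate(ratings):
    """Return (confidence-weighted mean truth, total confidence) for one statement.

    Returns (None, 0.0) when no rater has expressed any confidence.
    """
    for r in ratings:
        if not (0.0 <= r.truth <= 1.0 and 0.0 <= r.confidence <= 1.0):
            raise ValueError(f"rating out of range: {r}")
    total = sum(r.confidence for r in ratings)
    if total == 0:
        return None, 0.0
    mean = sum(r.truth * r.confidence for r in ratings) / total
    return mean, total

votes = [Rating(1.0, 0.9), Rating(0.8, 0.5), Rating(0.0, 0.1)]
print(aggregate(votes))  # mean is about 0.87: the unsure "false" vote barely moves it
```

The total confidence doubles as a rough "how much evidence do we have" score, which could decide which statements get shown to the next visitor for evaluation.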

I think it could be done and made quite fun. Don't just focus on the
AI guys; most of us don't have that much spare time. Focus on the
"bored at work" market.

Actually going through and thinking about this has made me quite
enthused about it. Keep me posted on how it pans out. If I didn't have
10 other projects and my PhD to do I'd volunteer to code it.

--
-Joel

"Unless you try to do something beyond what you have mastered, you
will never grow." -C.R. Lawton

-----
This list is sponsored by AGIRI: http://www.agiri.org/email
To unsubscribe or change your options, please go to:
http://v2.listbox.com/member/?list_id=303

