> A possibly viable approach is to simply parse the Web, and throw out sentences that one's system can't interpret confidently enough. The hope is that there is enough information online in suitably simple form, to fill up an artificial mind.

That's part of the idea. Another part is that when the system hits something it can't disambiguate, it actively goes out, searches for other relevant documents, and sees whether they help it resolve the problem. Yet another part is having a good inheritance structure down to Simple English (which gives the system access to *a lot* more data and potential rules). And, of course, there is always the human input element: since I also plan to use the same framework and much of the same software to build an argument-resolution system, the system will have plenty of opportunity to interact with humans in a very structured way (with the structure provided by the system, in the form of limited actions and questions with options, rather than requiring the humans to provide it).
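In rough Python, the loop I have in mind looks something like the sketch below. To be clear, the parser and the search call are invented stand-ins (the "confidence" here is faked from sentence length), not real components; the point is just the control flow: keep confident parses, actively search for context on ambiguous ones, and otherwise discard and rely on the Web's redundancy.

```python
# Hypothetical sketch of the "filter, then actively search" loop.
# parse() and search_related() are invented stand-ins, not real APIs.
from dataclasses import dataclass

@dataclass
class Parse:
    relations: list    # extracted logical relations
    confidence: float  # parser's self-reported confidence

def parse(sentence: str) -> Parse:
    # Stand-in for a real semantic parser; confidence is faked from
    # sentence length (short, simple sentences score higher).
    score = max(0.0, 1.0 - len(sentence.split()) / 40.0)
    return Parse(relations=[("sentence", sentence)], confidence=score)

def search_related(sentence: str) -> list:
    # Stand-in for actively retrieving other documents that mention the
    # same entities; in a real system this would be a web/corpus query.
    return []

def ingest(sentences, threshold=0.6):
    kb = []
    for s in sentences:
        p = parse(s)
        if p.confidence >= threshold:
            kb.extend(p.relations)           # confident: keep it
        else:
            context = search_related(s)      # ambiguous: go look for help
            p2 = parse(" ".join([s] + context)) if context else p
            if p2.confidence >= threshold:
                kb.extend(p2.relations)
            # otherwise discard, relying on the Web's redundancy
    return kb
```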

> This is essentially PowerSet's plan, as I understand it: to parse the Web and map it into logic, but using a parser that can only handle relatively simple syntactic constructs reliably. The redundancy of the Web is being relied upon.

I'd never heard of PowerSet before. Thank you. My suspicion is that PowerSet may run into a scaling problem before their software becomes useful (unless, of course, they design in a good set of "forgetting" algorithms . . . . hmmmm :-).

> This may well work to get a whole bunch of knowledge into the system, but there is a lot of stuff it will never get, too ...

I'm not convinced of this, particularly as the system becomes more robust. Preposition disambiguation is "badly unsolved" because (I believe) it requires domain knowledge to do effectively, and people are trying to do it without domain knowledge. The same is true of reference resolution. I think that these things are eminently soluble once you have the domain knowledge wired in (and I think that you can bootstrap enough of the domain knowledge with the simple stupid parser, active search, and inheritance).
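To make the domain-knowledge point concrete, here is a toy Python sketch. The knowledge table, words, and relation names are all invented for illustration; the point is that prepositional-phrase attachment (does "with the telescope" modify the verb or the noun?) stops being a hard guessing game once a few domain facts are wired in:

```python
# Toy illustration: PP attachment becomes a lookup once domain knowledge
# is available. The DOMAIN table and relation names are invented.
DOMAIN = {
    "telescope": {"instrument_of": {"saw", "watched"}},
    "collar":    {"part_of": {"dog"}},
}

def attach_pp(verb: str, obj: str, prep_noun: str) -> str:
    """Decide whether 'with <prep_noun>' modifies the verb or the object."""
    facts = DOMAIN.get(prep_noun, {})
    if verb in facts.get("instrument_of", set()):
        return "verb"    # instrument reading: modifies the action
    if obj in facts.get("part_of", set()):
        return "noun"    # part-whole reading: modifies the object
    return "unknown"     # no domain knowledge: still ambiguous
```

So "the man saw the dog with the telescope" attaches to the verb, "the man saw the dog with the collar" attaches to the noun, and anything the system has no facts about stays flagged as ambiguous (and becomes a candidate for active search or a structured question to a human).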

> ... if PowerSet or Mark Waser or anyone else builds up an awesome repository of knowledge via savvy NLP, I'd be very happy to partner with them and ingest that knowledge into Novamente.

There are *many* things that I'm NOT going after that I believe Novamente would be ideal for solving. Trust me, if I get anywhere, I'm going to be knocking on your door . . . . :-)


----- Original Message -----
From: "Benjamin Goertzel" <[EMAIL PROTECTED]>
To: <[email protected]>
Sent: Tuesday, April 17, 2007 10:04 PM
Subject: Re: [agi] AGI interests


Mark Waser wrote:
    To me, it seems much easier to have an automated system encode the
facts/rules.

Indeed ... but obviously, no one has solved the NLP problem... there
are no semantic analysis systems out there that can map complex
sentences into logical relationships in an adequately reliable way...

For instance, the problem of preposition disambiguation is very badly
unsolved...

Current reference resolution algorithms are pretty unreliable too...

There are plenty of parsing/semantic-mapping systems that work well
for most simple sentences, but that doesn't really help in the
interpretation of the real text that's out there in the Web, in books,
in dictionaries and encyclopedias, etc.

Mark, how do you plan to solve [or work around] this problem?

A possibly viable approach is to simply parse the Web, and throw out
sentences that one's system can't interpret confidently enough.  The
hope is that there is enough information online in suitably simple
form, to fill up an artificial mind.  This is essentially PowerSet's
plan, as I understand it: to parse the Web and map it into logic, but
using a parser that can only handle relatively simple syntactic
constructs reliably.  The redundancy of the Web is being relied upon.

This may well work to get a whole bunch of knowledge into the system,
but there is a lot of stuff it will never get, too ... because not
everything is out there in simple-sentence or simple-clause form....

Note: I don't buy that acquiring a mass of knowledge in logic form is
the key to AI.  But I do think it's useful, and if PowerSet or Mark
Waser or anyone else builds up an awesome repository of knowledge via
savvy NLP, I'd be very happy to partner with them and ingest that
knowledge into Novamente.  I have thought about ingesting Cyc into
Novamente too, but have not because

-- My impression is that Cyc's commercial license terms mean that if
we sold a Novamente system that used knowledge derived from Cyc
internally, we would be considered as reselling Cyc, which is
expensive.  (If this is wrong, I'd like to know about it...)

-- Cyc seems to me to be over-complexly structured (or, to be more
accurate, to be structured complexly in the wrong sort of way), so
that properly making use of Cyc's KB within Novamente would require
substantial effort in hand-coding "intermediary knowledge" mapping
between Cyc's constructs and the natural logical representation of
knowledge coming out of NM's sensors and actuators.  This is not a huge
objection but it's a reason why we haven't tried to deal with Cyc yet.
I estimate it would take around a man-year of effort to effectively
glue Cyc's KB into the Novamente system...

-- Ben G

-----
This list is sponsored by AGIRI: http://www.agiri.org/email
To unsubscribe or change your options, please go to:
http://v2.listbox.com/member/?&;


