:-) In bold, blue below
BTW, I am color-blind (the standard male red-green version) so non-bold red is
a bad choice for replying to me (as in, I missed a couple of your replies
initially and still may have missed some . . . . :-)
----- Original Message -----
From: YKY (Yan King Yin)
To: [email protected]
Sent: Saturday, April 28, 2007 2:58 AM
Subject: Re: [agi] rule-based NL system
I'll answer you point by point; others who find this tedious can just scroll
down to "conclusions" at the bottom.
On 4/27/07, Mark Waser <[EMAIL PROTECTED]> wrote:
> I am NOT suggesting a rule-based system at this level. First I figure out
a good representation for the minimal Basic English grammar that fundamentally
has the simplest grammatical rules embedded into its structure rather than
expressed as rules (i.e. I *AM* hand-crafting the design of the initial
structure -- nouns, verbs, adjectives, adverbs, prepositions, noun clauses,
verb clauses, prepositional phrases, simple SV sentences, simple SVO sentences,
etc. I am also setting up different types of inheritance and analogy links
between words/terms/structures). But I am definitely not going to be
hand-coding grammar "rules" per se. After the fact, you obviously could say
that certain rules define the structure (and you could obviously design an
analogous rule-based system -- with *a lot* more effort) but, at the lowest
level, my version of the system is not going to be run as a rule-based system.
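For illustration only, here is a minimal sketch (all class names hypothetical, not from either of our systems) of what "structure embedded in the representation rather than expressed as rules" could look like: the SV/SVO sentence types themselves guarantee well-formedness, so no grammar rule is ever consulted.

```python
# Toy sketch: grammatical structure carried by the data types themselves,
# not by a separate rule base. All names are hypothetical.
from dataclasses import dataclass

@dataclass
class Noun:
    text: str

@dataclass
class Verb:
    text: str

@dataclass
class SVSentence:           # simple subject-verb sentence: "birds fly"
    subject: Noun
    verb: Verb

@dataclass
class SVOSentence:          # simple subject-verb-object: "ravens eat seeds"
    subject: Noun
    verb: Verb
    obj: Noun

def render(s) -> str:
    """Produce surface text from the structure; well-formedness is
    guaranteed by construction, with no grammar rules applied."""
    if isinstance(s, SVSentence):
        return f"{s.subject.text} {s.verb.text}"
    return f"{s.subject.text} {s.verb.text} {s.obj.text}"

print(render(SVOSentence(Noun("ravens"), Verb("eat"), Noun("seeds"))))
```

After the fact one could of course describe these types with rules, which is exactly the point being made above.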
So you are indeed trying to hand-code certain aspects of English grammar into
your KR structure. But clearly you cannot embed *full* English grammar into
your KR.
Can't I? I think that you're *seriously* wrong there. English grammar is
simply not impossibly large and I'm a good portion of the way along coding the
entirety of Huddleston's "English Grammar: An Outline" (i.e. so simply replying
"Oh yes it *IS* impossibly large" simply won't cut it).
So what about the *rest* of the grammar knowledge?
There isn't any. What you might be mistaking for the "*rest* of the grammar
knowledge" are things like partial sentences where certain words are assumed,
conjunctions where semantics are used instead of grammar to determine meaning,
etc. -- but the method for decoding these phrases is NOT grammar.
You must learn it via machine learning. Then what do you call this
"machine-learned stuff"? I call it "grammar rules", but you may call it
"declarative linguistic knowledge" or whatever. It's the same.
Yes, I am doing machine learning . . . . but it is of *knowledge* not grammar.
Can you explain more about "inheritance and analogy links"? It seems that
you're trying to represent grammar structures using these "links". It seems
that they are just computationally equivalent to production rules.
Inheritance links are just what you'd expect. A raven isa bird. Analogy
links are initial explanatory links that are treated similarly to inheritance
links but there is the expectation that someday they will be removed. A bat
islikea bird. Again, this is knowledge, not grammar. And yes, *these* are
computationally equivalent to production rules.
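A minimal sketch of what such links might look like (the data and names here are hypothetical, just to make the isa/islikea distinction concrete): property lookup walks the inheritance chain, and analogy links are followed the same way but flagged as provisional.

```python
# Hypothetical sketch of "isa" inheritance links and weaker "islikea"
# analogy links, with property lookup walking the inheritance chain.
ISA = {"raven": "bird", "bird": "animal"}       # strict inheritance links
ISLIKEA = {"bat": "bird"}                       # provisional analogy links
PROPS = {"bird": {"flies"}, "animal": {"breathes"}}

def properties(term: str, follow_analogy: bool = True) -> set:
    """Collect properties inherited along isa links; analogy links are
    followed the same way but are expected to be removed someday."""
    out = set(PROPS.get(term, set()))
    parent = ISA.get(term) or (ISLIKEA.get(term) if follow_analogy else None)
    if parent:
        out |= properties(parent, follow_analogy)
    return out

print(properties("raven"))   # inherited from bird and animal
print(properties("bat"))     # obtained via the analogy link to bird
```

Dropping the analogy link (follow_analogy=False) is what "someday they will be removed" amounts to: the bat simply stops inheriting bird properties.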
> You're also missing the fact that language is both grammar and vocabulary.
The system needs to be able to "understand" from the beginning (Understand, in
this case, meaning being able to translate anything to "seed-only" form -- and
thus, to be able, via the built in structure, to know how to transform between
alternative forms and to know what aspects and data attach where).
I agree and I am well aware of this.
> 3. You then need to add more complex grammar rules to handle "real"
English, but such rules are difficult to hand-craft and thus will probably
require machine learning.
>
> At this point, the tools I mentioned come into play to extend the structure
until it can handle "real" English. This extension is clearly in the realm of
machine learning but, I believe, is structured and limited enough to be
feasible -- particularly if I start by pointing it at
"well-behaved/well-defined" sources like dictionaries, encyclopedias, etc.
Yes it is 100% feasible, but it requires complex machine learning.
I do not concur. Dictionaries are pretty simple knowledge sources.
Encyclopedias are not much more complex particularly if I feel free to throw
out anything that I can't immediately parse. Other documents are *MUCH* more
complicated because they require me to start determining context (starting with
whether they are fiction or not). I believe that machine learning from
dictionaries and encyclopedias is not so complex that it isn't tractable now.
All I'm saying is that I prefer a route where we collect commonsense
knowledge with Basic English *before* we go into the realm of full English.
You take your route and I'll take mine. I'm curious though (and this is a
*major* first question) -- Are you going to allow people to define new terms?
> 4. Only at this stage you can digest the web or newspapers.
>
> Actually, it can probably start *attempting* to digest such sources fairly
early. All it really has to do is to be able to tell when it's pretty sure
that it's correct and when it needs to try again after it's learned more (and
discard the data until then). In particular though, it can always go to a
dictionary when it runs across a new word (or when it seems that a known word
has another, unknown definition) and it can also go to a trusted human if it's
really not sure about how to parse a sentence (at which point it gives that
human a list of alternatives to choose from -- which doesn't require extensive
training to handle).
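As a sketch only (the parser, thresholds, and helper names below are all hypothetical stand-ins, not a description of the actual system), the digestion policy just described could be structured like this: consult a dictionary for unknown words, accept a parse only when confident, and otherwise hand a short list of alternatives to a trusted human.

```python
# Hypothetical sketch of the digestion policy described above:
# dictionary lookup for new words, confidence-gated acceptance,
# and a human choosing among parse alternatives when unsure.
KNOWN = {"the", "raven", "flies"}

def propose_parses(words):
    # Stand-in for a real parser: one flat parse, middling confidence.
    return [(tuple(words), 0.5)]

def digest(sentence: str, dictionary, ask_human, threshold: float = 0.9):
    words = sentence.lower().split()
    for w in words:
        if w not in KNOWN:
            KNOWN.add(dictionary(w))      # look up and learn the new word
    candidates = propose_parses(words)
    best, conf = max(candidates, key=lambda c: c[1])
    if conf >= threshold:
        return best
    return ask_human(candidates)          # human picks from a short list

result = digest("the raven soars",
                dictionary=lambda w: w,          # stub dictionary lookup
                ask_human=lambda cs: cs[0][0])   # stub human choice
print(result)
```

The key design point is the gate: anything below threshold is either deferred to a human or discarded until the system has learned more.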
The vocab is only part of the problem. The problem that I'm pointing out is
how you handle full English grammar. If your system doesn't have
*comprehensive* grammatical knowledge, it will have problems *interpreting*
complex or "irregular" English sentences, to the point that the "facts" you
acquire would be incorrect.
As I said, it will have comprehensive grammatical knowledge, so . . . .
> >> I guess (3) and (4) won't happen immediately. And after (2) we can
start collecting commonsense facts via Basic English. So it seems to me that a
viable "first product" could be a commonsense engine using Basic English,
without going to 3 & 4.
>
> I disagree. Building the extension/learning tools is a fundamental part of
the initial design. If you start collecting commonsense facts without a good
data structure and the "understanding" described above, you end up with Cyc
(the same facts encoded multiple ways, almost all facts inaccessible unless you
access them in almost *exactly* the form in which they were encoded, etc.,
etc.). I don't find *any* value in that at all. Facts are only useful if you
can access and use them.
I see your point, but you misunderstand my approach. My KR scheme is
minimalistic. Grammar is represented as additional rules (whereas you embed
some linguistic structure into your KR). I plan to hand-code enough grammar
for Basic English first, and the system will later use machine learning to
learn more complex grammar rules, until it knows full English.
I don't believe that I'm misunderstanding your approach. I think that you're
misunderstanding my objection(s) . . . . :-)
First, I don't believe that representing grammar as rules is particularly
computationally effective. Yes, grammar can certainly be expressed as rules
but unless I misunderstood and your hand-coded grammar is *NOT* in rule form, I
think that you'll find yourself dead in the water before you're even really
started -- without even successfully re-inventing the wheel since *a lot* of
work has gone into parsers.
Second, I don't believe that it is possible to learn complex grammar rules
(via machine learning or any other method) unless you have a certain *rather
large* amount of knowledge.
You assume that my system doesn't have true "understanding" because it cannot
inter-relate the same idea expressed in different sentences (i.e., paraphrasing).
That's a false accusation because my system can do this by logical reasoning.
I apparently wasn't clear. By paraphrasing, I didn't mean re-arranging the
sentence so that the grammar was different. I meant using different words with
the same meaning. Whenever my system encounters a new word, it is going to
ensure that it understands that word (by being able to translate it down to
Basic English) or else it won't accept/use that word. Is your system going to
have the same requirement? If so, I will withdraw my statement but if not I'll
ask . . . . "What do you mean by understanding" since I will certainly argue
that mere grammatical re-arrangement is NOT understanding.
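To make the acceptance requirement above concrete, here is a sketch (the seed set and definitions are invented for illustration): a new word is accepted only if its definition bottoms out in Basic English seed terms.

```python
# Hypothetical sketch of the acceptance test described above: a word is
# "understood" only if it reduces, via its definition, to seed terms.
SEED = {"big", "black", "bird", "animal", "small"}
DEFINITIONS = {"raven": ["big", "black", "bird"],
               "corvid": ["raven"]}       # defined via another new word

def understands(word: str, seen=None) -> bool:
    """True iff the word reduces, through its definitions, to the seed."""
    if word in SEED:
        return True
    seen = seen or set()
    if word in seen or word not in DEFINITIONS:
        return False                      # circular or undefined: reject
    seen.add(word)
    return all(understands(w, seen) for w in DEFINITIONS[word])

print(understands("corvid"))   # True: corvid -> raven -> seed terms
print(understands("grok"))     # False: no definition available
```

A word that fails this test is simply not accepted or used until a grounding definition is found.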
Conclusion:
1. The only difference between your approach and mine is that you hand-code
*some* linguistic knowledge directly into your KR while I leave that to grammar
rules.
Not at all. First, *ALL* grammatical knowledge is "hand-coded" (which is a
much smaller task than you might guess if you do it right). Second, I have a
number of additional requirements (and functionalities) that I haven't seen you
have in your system and which I believe are going to be impossible to implement
with a "minimalistic" KR scheme and lots of rules.
2. You have not answered the question of how you can easily digest the
web/newspapers without adult-level grammar knowledge, which is not easy to
acquire/learn.
I've answered it repeatedly. Adult-level grammar knowledge is possible to
code. The problem with parsing adult content is not grammatical, it is
semantic (i.e. ruling out numerous interpretations that *are* legal
grammatically but impossible in the real world).
3. The issue really is which route is easier:
(A) build the system to the point of understanding Basic English, then
start collecting commonsense facts.
(B) build the system to understand real English, and crawl the
web/newspapers.
Don't forget that (A) can eventually grow into a full AGI; the issue here is
the learning pathway. (B) is harder because it requires adult-level grammar
AND a lot of commonsense knowledge.
Wow! That was actually *REALLY* bad. You entirely mis-represented my
approach.
Your method, as I understand it, is to build the system to the point of
understanding Basic English (the grammar for which is not that far from adult
grammar) and then use that as a user interface to allow human entry of
commonsense facts. I concur that *eventually* that METHOD can grow into a full
AGI; however, it is also my understanding that your KR scheme is "minimalistic"
(your term) and therefore, I believe, prone to massive inefficiencies (either
in terms of computation time, storage space, or both) to such an extent as to
make your system (not your method) unworkable.
My method is to build the system to the point of understanding Basic English
+ adult grammar by loading it all into my KR and building tools to compactly
load new knowledge (including new terms) into the KR, to actively harvest new
terms from one or more dictionaries and load it into the KR, and to then
harvest knowledge from encyclopedias and load it into the KR. Note that this
method is *NOT* particularly harder because
1. adult-level grammar is not that much more difficult than what (A)
requires, and
2. contrary to what you state, (B) does NOT require *ANY* commonsense
knowledge to be hand-coded.
If you do not follow an INCREMENTAL LEARNING PATHWAY then your learning
speed will be unfeasibly slow - that's my theory.
I am following an INCREMENTAL LEARNING PATHWAY. Why do you think that I'm
not?
:-) Looking forward to your responses.
Mark
YKY
------------------------------------------------------------------------------
This list is sponsored by AGIRI: http://www.agiri.org/email
To unsubscribe or change your options, please go to:
http://v2.listbox.com/member/?&