:-) In bold, blue below
BTW, I am color-blind (the standard male red-green version) so non-bold red is
a bad choice for replying to me (as in, I missed a couple of your replies
initially and still may have missed some . . . . :-)
----- Original Message -----
From: YKY (Yan King Yin)
To: [email protected]
Sent: Saturday, April 28, 2007 2:58 AM
Subject: Re: [agi] rule-based NL system
I'll answer you point by point; others who find this tedious can just scroll
down to "conclusions" at the bottom.
On 4/27/07, Mark Waser <[EMAIL PROTECTED]> wrote:
> I am NOT suggesting a rule-based system at this level. First I figure out
a good representation for the minimal Basic English grammar that fundamentally
has the simplest grammatical rules embedded into its structure rather than
expressed as rules (i.e. I *AM* hand-crafting the design of the initial
structure -- nouns, verbs, adjectives, adverbs, prepositions, noun clauses,
verb clauses, prepositional phrases, simple SV sentences, simple SVO sentences,
etc. I am also setting up different types of inheritance and analogy links
between words/terms/structures). But I am definitely not going to be
hand-coding grammar "rules" per se. After the fact, you obviously could say
that certain rules define the structure (and you could obviously design an
analogous rule-based system -- with *a lot* more effort) but, at the lowest
level, my version of the system is not going to be run as a rule-based system.
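For illustration only, here is a minimal sketch (all class names hypothetical, not from either of our systems) of what "structure embedded in the representation rather than expressed as rules" could look like: the SV/SVO sentence types themselves guarantee well-formedness, so no grammar rule is ever consulted.

```python
# Toy sketch: grammatical structure carried by the data types themselves,
# not by a separate rule base. All names are hypothetical.
from dataclasses import dataclass

@dataclass
class Noun:
    text: str

@dataclass
class Verb:
    text: str

@dataclass
class SVSentence:           # simple subject-verb sentence: "birds fly"
    subject: Noun
    verb: Verb

@dataclass
class SVOSentence:          # simple subject-verb-object: "ravens eat seeds"
    subject: Noun
    verb: Verb
    obj: Noun

def render(s) -> str:
    """Produce surface text from the structure; well-formedness is
    guaranteed by construction, with no grammar rules applied."""
    if isinstance(s, SVSentence):
        return f"{s.subject.text} {s.verb.text}"
    return f"{s.subject.text} {s.verb.text} {s.obj.text}"

print(render(SVOSentence(Noun("ravens"), Verb("eat"), Noun("seeds"))))
```

After the fact one could of course describe these types with rules, which is exactly the point being made above.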
So you are indeed trying to hand-code certain aspects of English grammar into
your KR structure. But clearly you cannot embed *full* English grammar into
your KR.
Can't I? I think that you're *seriously* wrong there. English grammar is
simply not impossibly large and I'm a good portion of the way along coding the
entirety of Huddleston's "English Grammar: An Outline" (i.e. so simply replying
"Oh yes it *IS* impossibly large" simply won't cut it).
So what about the *rest* of the grammar knowledge?
There isn't any. What you might be mistaking for the "*rest* of the grammar
knowledge" are things like partial sentences where certain words are assumed,
conjunctions where semantics are used instead of grammar to determine meaning,
etc. -- but the method for decoding these phrases is NOT grammar.
You must learn it via machine learning. Then what do you call this
"machine-learned stuff"? I call it "grammar rules", but you may call it
"declarative linguistic knowledge" or whatever. It's the same.
Yes, I am doing machine learning . . . . but it is of *knowledge* not grammar.
Can you explain more about "inheritance and analogy links"? It seems that
you're trying to represent grammar structures using these "links". It seems
that they are just computationally equivalent to production rules.
Inheritance links are just what you'd expect. A raven isa bird. Analogy
links are initial explanatory links that are treated similarly to inheritance
links but there is the expectation that someday they will be removed. A bat
islikea bird. Again, this is knowledge, not grammar. And yes, *these* are
computationally equivalent to production rules.
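A minimal sketch of what such links might look like (the data and names here are hypothetical, just to make the isa/islikea distinction concrete): property lookup walks the inheritance chain, and analogy links are followed the same way but flagged as provisional.

```python
# Hypothetical sketch of "isa" inheritance links and weaker "islikea"
# analogy links, with property lookup walking the inheritance chain.
ISA = {"raven": "bird", "bird": "animal"}       # strict inheritance links
ISLIKEA = {"bat": "bird"}                       # provisional analogy links
PROPS = {"bird": {"flies"}, "animal": {"breathes"}}

def properties(term: str, follow_analogy: bool = True) -> set:
    """Collect properties inherited along isa links; analogy links are
    followed the same way but are expected to be removed someday."""
    out = set(PROPS.get(term, set()))
    parent = ISA.get(term) or (ISLIKEA.get(term) if follow_analogy else None)
    if parent:
        out |= properties(parent, follow_analogy)
    return out

print(properties("raven"))   # inherited from bird and animal
print(properties("bat"))     # obtained via the analogy link to bird
```

Dropping the analogy link (follow_analogy=False) is what "someday they will be removed" amounts to: the bat simply stops inheriting bird properties.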
> You're also missing the fact that language is both grammar and vocabulary.
The system needs to be able to "understand" from the beginning (Understand, in
this case, meaning being able to translate anything to "seed-only" form -- and
thus, to be able, via the built in structure, to know how to transform between
alternative forms and to know what aspects and data attach where).
I agree and I am well aware of this.
> 3. You then need to add more complex grammar rules to handle "real"
English, but such rules are difficult to hand-craft and thus will probably
require machine learning.
>
> At this point, the tools I mentioned come into play to extend the structure
until it can handle "real" English. This extension is clearly in the realm of
machine learning but, I believe, is structured and limited enough to be
feasible -- particularly if I start by pointing it at
"well-behaved/well-defined" sources like dictionaries, encyclopedias, etc.
Yes it is 100% feasible, but it requires complex machine learning.
I do not concur. Dictionaries are pretty simple knowledge sources.
Encyclopedias are not much more complex particularly if I feel free to throw
out anything that I can't immediately parse. Other documents are *MUCH* more
complicated because they require me to start determining context (starting with
whether they are fiction or not). I believe that machine learning from
dictionaries and encyclopedias is not so complex that it isn't tractable now.
All I'm saying is that I prefer a route where we collect commonsense
knowledge with Basic English *before* we go into the realm of full English.
You take your route and I'll take mine. I'm curious though (and this is a
*major* first question) -- Are you going to allow people to define new terms?
> 4. Only at this stage you can digest the web or newspapers.
>
> Actually, it can probably start *attempting* to digest such sources fairly
early. All it really has to do is to be able to tell when it's pretty sure
that it's correct and when it needs to try again after it's learned more (and
discard the data until then). In particular though, it can always go to a
dictionary when it runs across a new word (or when it seems that a known word
has another, unknown definition) and it can also go to a trusted human if it's
really not sure about how to parse a sentence (at which point it gives that
human a list of alternatives to choose from -- which doesn't require extensive
training to handle).
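As a sketch only (the parser, thresholds, and helper names below are all hypothetical stand-ins, not a description of the actual system), the digestion policy just described could be structured like this: consult a dictionary for unknown words, accept a parse only when confident, and otherwise hand a short list of alternatives to a trusted human.

```python
# Hypothetical sketch of the digestion policy described above:
# dictionary lookup for new words, confidence-gated acceptance,
# and a human choosing among parse alternatives when unsure.
KNOWN = {"the", "raven", "flies"}

def propose_parses(words):
    # Stand-in for a real parser: one flat parse, middling confidence.
    return [(tuple(words), 0.5)]

def digest(sentence: str, dictionary, ask_human, threshold: float = 0.9):
    words = sentence.lower().split()
    for w in words:
        if w not in KNOWN:
            KNOWN.add(dictionary(w))      # look up and learn the new word
    candidates = propose_parses(words)
    best, conf = max(candidates, key=lambda c: c[1])
    if conf >= threshold:
        return best
    return ask_human(candidates)          # human picks from a short list

result = digest("the raven soars",
                dictionary=lambda w: w,          # stub dictionary lookup
                ask_human=lambda cs: cs[0][0])   # stub human choice
print(result)
```

The key design point is the gate: anything below threshold is either deferred to a human or discarded until the system has learned more.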
The vocab is only part of the problem. The problem that I'm pointing out is
how you handle full English grammar. If your system doesn't have
*comprehensive* grammatical knowledge, it will have problems *interpreting*
complex or "irregular" English sentences, to the point that the "facts" you
acquire would be incorrect.
As I said, it will have comprehensive grammatical knowledge, so . . . .
> >> I guess (3) and (4) won't happen immediately. And after (2) we can
start collecting commonsense facts via Basic English. So it seems to me that a
viable "first product" could be a commonsense engine using Basic English,
without going to 3 & 4.
>
> I disagree. Building the extension/learning tools is a fundamental part of
the initial design. If you start collecting commonsense facts without a good
data structure and the "understanding" described above, you end up with Cyc
(the same facts encoded multiple ways, almost all facts inaccessible unless you
access them in almost *exactly* the form in which they were encoded, etc.,
etc.). I don't find *any* value in that at all. Facts are only useful if you
can access and use them.
I see your point, but you misunderstand my approach. My KR scheme is
minimalistic. Grammar is represented as additional rules (whereas you embed
some linguistic structure into your KR). I plan to hand-code enough grammar
for Basic English first, and the system will later use machine learning to
learn more complex grammar rules, until it knows full English.
I don't believe that I'm misunderstanding your approach. I think that you're
misunderstanding my objection(s) . . . . :-)
First, I don't believe that representing grammar as rules is particularly
computationally effective. Yes, grammar can certainly be expressed as rules
but unless I misunderstood and your hand-coded grammar is *NOT* in rule form, I
think that you'll find yourself dead in the water before you're even really
started -- without even successfully re-inventing the wheel since *a lot* of
work has gone into parsers.
Second, I don't believe that it is possible to learn complex grammar rules
(via machine learning or any other method) unless you have a certain *rather
large* amount of knowledge.
You assume that my system doesn't have true "understanding" because it cannot
inter-relate the same idea expressed in different sentences (i.e., paraphrasing).
That's a false accusation because my system can do this by logical reasoning.
I apparently wasn't clear. By paraphrasing, I didn't mean re-arranging the
sentence so that the grammar was different. I meant using different words with
the same meaning. Whenever my system encounters a new word, it is going to
ensure that it understands that word (by being able to translate it down to
Basic English) or else it won't accept/use that word. Is your system going to
have the same requirement? If so, I will withdraw my statement but if not I'll
ask . . . . "What do you mean by understanding" since I will certainly argue
that mere grammatical re-arrangement is NOT understanding.
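To make the acceptance requirement above concrete, here is a sketch (the seed set and definitions are invented for illustration): a new word is accepted only if its definition bottoms out in Basic English seed terms.

```python
# Hypothetical sketch of the acceptance test described above: a word is
# "understood" only if it reduces, via its definition, to seed terms.
SEED = {"big", "black", "bird", "animal", "small"}
DEFINITIONS = {"raven": ["big", "black", "bird"],
               "corvid": ["raven"]}       # defined via another new word

def understands(word: str, seen=None) -> bool:
    """True iff the word reduces, through its definitions, to the seed."""
    if word in SEED:
        return True
    seen = seen or set()
    if word in seen or word not in DEFINITIONS:
        return False                      # circular or undefined: reject
    seen.add(word)
    return all(understands(w, seen) for w in DEFINITIONS[word])

print(understands("corvid"))   # True: corvid -> raven -> seed terms
print(understands("grok"))     # False: no definition available
```

A word that fails this test is simply not accepted or used until a grounding definition is found.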
Conclusion:
1. The only difference between your approach and mine is that you hand-code
*some* linguistic knowledge directly into your KR while I leave that to grammar
rules.
Not at all. First, *ALL* grammatical knowledge is "hand-coded" (which is a
much smaller task than you might guess if you do it right). Second, I have a
number of additional requirements (and functionalities) that I haven't seen you
have in your system and which I believe are going to be impossible to implement
with a "minimalistic" KR scheme and lots of rules.
2. You have not answered the question of how you can easily digest the
web/newspapers without adult-level grammar knowledge, which is not easy to
acquire/learn.
I've answered it repeatedly. Adult-level grammar knowledge is possible to
code. The problem with parsing adult content is not grammatical, it is
semantic (i.e. ruling out numerous interpretations that *are* legal
grammatically but impossible in the real world).
3. The issue really is which route is easier:
(A) build the system to the point of understanding Basic English, then
start collecting commonsense facts.
(B) build the system to understand real English, and crawl the
web/newspapers.
Don't forget that (A) can eventually grow into a full AGI; the issue here is
the learning pathway. (B) is harder because it requires adult-level grammar
AND a lot of commonsense knowledge.
Wow! That was actually *REALLY* bad. You entirely mis-represented my
approach.
Your method, as I understand it, is to build the system to the point of
understanding Basic English (the grammar for which is not that far from adult
grammar) and then use that as a user interface to allow human entry of
commonsense facts. I concur that *eventually* that METHOD can grow into a full
AGI; however, it is also my understanding that your KR scheme is "minimalistic"
(your term) and therefore, I believe, prone to massive inefficiencies (either
in terms of computation time, storage space, or both) to such an extent as to
make your system (not your method) unworkable.
My method is to build the system to the point of understanding Basic English
+ adult grammar by loading it all into my KR and building tools to compactly
load new knowledge (including new terms) into the KR, to actively harvest new
terms from one or more dictionaries and load it into the KR, and to then
harvest knowledge from encyclopedias and load it into the KR. Note that this
method is *NOT* particularly harder because
1. adult-level grammar is not that much more difficult than what (A)
requires, and
2. contrary to what you state, (B) does NOT require *ANY* commonsense
knowledge to be hand-coded.
If you do not follow an INCREMENTAL LEARNING PATHWAY then your learning
speed will be unfeasibly slow - that's my theory.
I am following an INCREMENTAL LEARNING PATHWAY. Why do you think that I'm
not?
:-) Looking forward to your responses.
Mark
YKY
------------------------------------------------------------------------------
This list is sponsored by AGIRI: http://www.agiri.org/email
To unsubscribe or change your options, please go to:
http://v2.listbox.com/member/?&