Re: [agi] 40 years of parsing NL...

Steve Richfield Sat, 23 Mar 2013 14:44:21 -0700

PM,

On Sat, Mar 23, 2013 at 10:53 AM, Piaget Modeler
<[email protected]> wrote:


> Steve,
>
> LA parsing gives you both the syntactic parse and the semantic parse
> concurrently.
>

I read some more from Amazon's Look Inside, and I think we mean different
things by "semantics". Most (NON-translation) applications (like
DrEliza.com) view semantic content as things that are actionable - i.e.
that affect output in the real world. A computer is capable only of the
actions that its programs and databases facilitate. Beyond that, it is
possible to observe, but not "understand".

In DrEliza.com, for a symptom to be actionable, it must usually be in the
present tense, though some symptoms can be in the past tense. It is
important to know whether it is a positive statement (they state that they
have the symptom), or a negative statement (they state that they do not
have the symptom). Also considered are inclusions that depreciate
statements, like the presence of "I think", or "a little".
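
A minimal sketch of this kind of check, in Python. The word lists, the
Finding record, and the find_symptoms name are all hypothetical and are
not DrEliza's actual code or data; it only shows tense, negation, and
depreciating phrases being flagged per symptom.

```python
import re
from dataclasses import dataclass

# Hypothetical word lists, for illustration only (not DrEliza's data).
SYMPTOMS = {"hay fever", "headache", "fatigue"}
NEGATIONS = {"not", "no", "never", "don't", "doesn't", "didn't"}
PAST_MARKERS = {"had", "was", "were", "did", "didn't", "used"}
HEDGES = ("i think", "a little", "maybe", "sort of")


@dataclass
class Finding:
    symptom: str
    tense: str      # "present" or "past"
    negated: bool   # True if the statement denies the symptom
    hedged: bool    # True if a depreciating phrase is present


def find_symptoms(statement: str) -> list[Finding]:
    """Flag known symptoms with crude tense, negation, and hedge markers."""
    text = statement.lower()
    words = set(re.findall(r"[a-z']+", text))
    tense = "past" if words & PAST_MARKERS else "present"
    negated = bool(words & NEGATIONS)
    hedged = any(phrase in text for phrase in HEDGES)
    # Only symptoms the application knows about are reported at all.
    return [Finding(s, tense, negated, hedged)
            for s in sorted(SYMPTOMS) if s in text]


print(find_symptoms("I think I had a little hay fever"))
```

The flags apply to the whole statement rather than to each symptom, which
is a simplification; a real system would have to scope them.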

Applications other than language translation usually look for a similar
sort of shopping list of actionable statements, observations, complaints,
etc., that match a set of code to act appropriately on them.

Language translation programs are a special case that I can see LA working
well on, once it gets beyond its character-twiddling, especially
translating scientific material that tends to have better grammar than
other writings.

However, from what I read, proplets seem to be associated with isolated
words, rather than with a syntactically variable structure of words that
carries a specific actionable meaning (in addition to other meanings that
the program is not able to act on).

For example, DrEliza.com would see the statement "My eyes run because I
have hay fever." as two separate statements, "My eyes run" and "I have hay
fever", because it summarily rejects supplied explanations: if users knew
what was wrong, they wouldn't be asking DrEliza. Runny eyes are not
(presently) actionable, but past or present-tense hay fever is an important
indication of a propensity for autoimmune reactions that are associated
with some MUCH more serious conditions, like emphysema and Parkinson's.
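
A rough sketch of that splitting step, in Python. The split_statements
name is hypothetical, and splitting only on the word "because" is an
assumption made for illustration; the real rule set is not shown here.

```python
import re


def split_statements(sentence: str) -> list[str]:
    """Drop the causal link and return each clause as its own statement."""
    body = sentence.strip().rstrip(".")
    parts = re.split(r"\bbecause\b", body, flags=re.IGNORECASE)
    # Discard empty pieces and stray commas or spaces around each clause.
    return [p.strip(" ,") for p in parts if p.strip(" ,")]


print(split_statements("My eyes run because I have hay fever."))
```

Each resulting clause can then be checked on its own, so the supplied
explanation is never used as an explanation.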

My present parser gives me a list of subjects that were mentioned, their
tense, and negation. A future system would also have to provide the person
being referenced, detect that they are talking about some sort of personal
problem, include depreciation information (that now elicits a reply to
remove it and make a clear statement), etc.

This information now feeds an engine to compute the probabilities of
various underlying conditions that, if corrected, could resolve their
supposedly "incurable" condition. My patent application includes the
capability of performing this last step as part of the parsing process, as
it is really just another type of rule to apply, albeit more complicated
than the other types of rules.

After all, in a mathematical sense, a "solution" is really just a
restatement of the problem, e.g. moving X to the left of the = sign and
everything else to the right of the = sign. Hence, problem solving (like
figuring out what is REALLY wrong with someone, rather than simply listing
their symptoms) is really just the tail end of the semantic analysis.

What could I get from a proplet to guide this process?

*Again*, what real-world applications has LA been successfully used
for, and/or what real-world applications do people anticipate that it would
be good for? Various other methods look better, outside of language
translation.

>
> I don't know where you got the idea that you don't get the semantics as
> well.
>

From the material you included in a previous posting.

Also, I ***still*** don't see how LA could ever deal with most idioms. Can
you give me a hint?


> The semantics comes as frames in the form of "proplets", i.e., semantic
> frame attributes...
>

As explained above, I think we have different ideas about the meaning of
"semantics". LA would have to parse the ENTIRE language to be able to make
sense of any of it. For example EVERY noun would have to be coded as to
whether it could have a color, and not just the nouns that are interesting
for the specific application. I would think that this would complicate the
coding by a hundredfold or so - reaching to the astronomical numbers that
Matt was suggesting that it would take. Hmmm, aha, maybe this is what Matt
was thinking when he made his statements...

Most real-world applications (NOT including language translation) are only
concerned with some of the ~1,000 most commonly used words, plus a few
oddball words from a MUCH larger vocabulary - like hundreds of thousands of
words. DrEliza now only recognizes ~100 words, and a projected expansion of
a medical version would still probably be ~1,000 words. Producing a widely
useful LA system would require coding ontological information for some
fraction of a million words, which would be a thousand times the effort
(using the above numbers) of coding for a specific application. Do we REALLY
need to know that nuns can have color? For example, a bottom-up approach
can recognize the presence of an interesting noun that could have
associated color information, so a bottom-up parse would simply look to the
left of such a noun to see if a color has been indicated. This would answer
the questions of color for the interesting nouns, and discard color
information for everything else that is also being discarded, without
having to do all that coding. Right?
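
A small sketch of that bottom-up lookup, in Python. The COLORS and
INTERESTING_NOUNS sets and the function name are hypothetical, and it
assumes the color word sits immediately to the left of the noun.

```python
import re

# Hypothetical vocabularies, for illustration only.
COLORS = {"red", "green", "blue", "yellow", "white", "black"}
INTERESTING_NOUNS = {"rash", "tongue", "skin"}


def colors_of_interesting_nouns(text: str) -> dict[str, str]:
    """Map each interesting noun to the color word directly before it."""
    tokens = re.findall(r"[a-z]+", text.lower())
    found = {}
    for i, tok in enumerate(tokens):
        # Look one word to the left, and only for nouns we care about.
        if tok in INTERESTING_NOUNS and i > 0 and tokens[i - 1] in COLORS:
            found[tok] = tokens[i - 1]
    return found


print(colors_of_interesting_nouns(
    "The nun wore a black habit, but I have a red rash."))
```

Here "black habit" is dropped without any ontology entry for "habit",
because only the nouns on the application's own list are ever examined.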

Comments?

BTW, how would LA parse the above one-word question?

Steve

>
> ------------------------------
> Date: Sat, 23 Mar 2013 08:57:34 -0700
>
> Subject: Re: [agi] 40 years of parsing NL...
> From: [email protected]
> To: [email protected]
>
> PM (and Logan),
>
> You said in a previous posting that you have experience with L-A. What
> have you (or others) done with it?
>
> I ask because once you sidestep semantic units, it seems to me like you
> have thrown the baby out with the bathwater, at least for the usual
> applications needing some degree of "understanding". Maybe I just haven't
> noticed a good application that doesn't need semantic units, or I haven't
> understood a good way to live without them. Sure you can "parse" while
> ignoring them, but then of what use is the resulting parse?!!!
>
> Idioms (of which there are thousands) are a sort of ill-behaved semantic
> unit. How do you handle idioms while sidestepping semantic units?
>
> Logan: Have you been following this discussion? RADP is close enough to
> what I am planning to have the same semantic unit needs. Can you help make
> sense of this?
>
> What (if anything) am I missing here?
>
> Steve
> =================
> On Fri, Mar 22, 2013 at 7:08 PM, Steve Richfield <
> [email protected]> wrote:
>
> PM,
>
> On Fri, Mar 22, 2013 at 5:27 PM, Piaget Modeler <[email protected]
> > wrote:
>
>
> Actually, it's more than making a chatbot.  It's having a real robot
> respond to a person based on linking utterances
> (made by either the robot or the person) to the current context (milieu
> entities and events).
>
> I think before you make your Worldcomp presentation it would behoove you
> to read the *NEWCAT* and
> *Computation of Language* books so that you can adequately articulate the
> differences in your approach.
>
>
> We seem to be talking past each other here. My presentation at Worldcomp
> need not compare with anything, most especially character-based methods
> that don't seem to even recognize what parsing applications need from a
> parser, let alone squarely address how to provide what those
> applications need. There is SO much that these methods don't on first
> glance address.
>
> Each parsing method seems to need a champion, and you seem to be the
> resident champion for L-A grammar here. I know you want to just send me
> some hyperlinks and tell me to go away and read some books, but here on
> this forum we each learn our own particular areas, and defend against
> stones tossed by people defending nearby areas. I tossed a stone your way
> when I claimed blinding speed. You tossed a stone back when you explained
> that all that was needed to parse was to move about through an L-A map of
> English grammar. I tossed the stone back, pointing out that losing the
> semantic elements (many of which are idioms that don't make much
> grammatical sense) throws the baby out with the bath water, because
> applications (other than machine translation) are only interested in
> semantics, not syntax. Dragging semantics out of a parse tree is a really
> BIG job, requiring the SAME tests as other parsing methods. Sure you
> produce a parse in a hurry by not doing the job of other parsers, but then
> doing that job loses the speed advantage.
>
> To illustrate some of the challenges, I took a large idiom dictionary and
> tried looking up idioms that I commonly use in everyday speech, and only
> found about half of them. So much for quality control. How does L-A deal
> with idioms? Once you have discarded the low-level semantic elements as
> part of putting words into parse trees, recognizing idioms could become
> quite difficult. Further, many idioms are ungrammatical. Are you planning
> to include idioms as part of the map of the language?!!!
>
> Anyway, I **DO** want to understand L-A enough to see if it is
> significant, or have you understand my method enough to be able to compare
> the two, so we can both see the relationships between these two VERY
> different things.
>
> Steve
>
>
> ------------------------------
> Date: Fri, 22 Mar 2013 15:30:59 -0700
> Subject: Re: [agi] 40 years of parsing NL...
>
> From: [email protected]
> To: [email protected]
>
> PM,
>
> This guy is talking about a different approach for making a chatbot -
> right? If so, he doesn't show any indication of knowing about present
> chatbots. Present technology is to have a variety of sentence skeletons,
> into which appropriate words and phrases are placed, which seems to work
> quite well.
>
> I would think that promoting a technology would best be done with FREE
> documents and other supporting material. I already have the 10,000 most
> commonly used words in a file in order of frequency of use, if you or
> anyone else wants a copy.
>
> I believe that my approach will be fast enough to keep up with the
> Internet, and I haven't seen any other approach that promises such blinding
> speed. In theory, all I need do is get the word out, and wait for folks at
> Google, Yahoo, and Facebook to discover it, which is my present plan.
>
> I also plan to present this at the next WORLDCOMP conference.
>
> BTW, ***THANKS*** for holding my feet to the fire!!!  I plan to adapt
> these discussions into the paper I present at WORLDCOMP.
>
> Steve
> ===================
> On Fri, Mar 22, 2013 at 1:39 PM, Piaget Modeler <[email protected]
> > wrote:
>
> Roland's next step:
>
>
> http://www.amazon.com/Computational-Linguistics-Talking-Robots-Processing/dp/3642224318/ref=sr_1_1?ie=UTF8&qid=1363984424&sr=8-1&keywords=talking+robots+roland+hausser
>
> Computational Linguistics and Talking Robots: Processing Content in
> Database Semantics
>
> Publication Date: September 14, 2011 | ISBN-10: 3642224318 | ISBN-13:
>  978-3642224317 | Edition: 2011
> The practical task of building a talking robot requires a theory of how
> natural language communication works. Conversely, the best way to
> computationally verify a theory of natural language communication is to
> demonstrate its functioning concretely in the form of a talking robot, the
> epitome of human–machine communication. To build an actual robot requires
> hardware that provides appropriate recognition and action interfaces, and
> because such hardware is hard to develop the approach in this book is
> theoretical: the author presents an artificial cognitive agent with
> language as a software system called database semantics (DBS). Because a
> theoretical approach does not have to deal with the technical difficulties
> of hardware engineering there is no reason to simplify the system – instead
> the software components of DBS aim at completeness of function and of data
> coverage in word form recognition, syntactic–semantic interpretation and
> inferencing, leaving the procedural implementation of elementary concepts
> for later. In this book the author first examines the universals of natural
> language and explains the Database Semantics approach. Then in Part I he
> examines the following natural language communication issues: using
> external surfaces; the cycle of natural language communication; memory
> structure; autonomous control; and learning. In Part II he analyzes the
> coding of content according to the aspects: semantic relations of
> structure; simultaneous amalgamation of content; graph-theoretical
> considerations; computing perspective in dialogue; and computing
> perspective in text. The book ends with a concluding chapter, a
> bibliography and an index. The book will be of value to researchers,
> graduate students and engineers in the areas of artificial intelligence and
> robotics, in particular those who deal with natural language processing.
>
>
> For you, Steve, the next step is to write a book about your approach and
> sell it for $100 a pop, or $75 for the e-book,
> and do a book tour (if possible).
>
> Then gain some early adopters and market traction.
>
> The point is to make money WHILE promoting your idea.
>
> Cheers,
>
> ~PM
>
> ------------------------------
> Date: Fri, 22 Mar 2013 12:13:23 -0700
> Subject: [agi] 40 years of parsing NL...
> From: [email protected]
> To: [email protected]
>
>
> Piaget, Logan, et al,
>
> We have had some interesting discussions about which method is best and
> fastest, but is it even possible?!!!
>
> My own big wake-up call came many years ago, when I recorded a class I
> presented, and had it transcribed with instructions "don't edit it, just
> transcribe what I said". It was FULL of fragments, missing words, and even
> misstatements, but the class had NO problem grokking what I had said.
>
> Similarly, just take any unedited posting (you can easily recognize
> editing by the lack of ANY spelling errors) and try hand-diagramming its
> sentences. They will be better than spoken sentences, but still, you will
> have problems with around half of them.
>
> Several early NL projects set out with dictionaries that identified every
> part of speech that each word could be, and programmatically set about
> identifying a set of assumptions wherein each sentence would hang together.
> Unfortunately, few sentences had exactly one solution, and the presence of
> any presumed words fractured the entire process.
>
> More recently, "ontological" approaches have attempted to sub-divide the
> parts of speech, e.g. identifying whether a particular noun can have color,
> weight, etc., to assist in assigning the targets of adjectives and adverbs.
>
> The present consensus seems to be that speech is made to a particular
> audience with a particular set of presumed knowledge to use to fill in the
> gaps, and an automated listener/reader will NOT be able to understand
> "plain English" without similar real-world experience as an intended
> reader. Without that experience, lots of gaps and disambiguation errors
> will persist regardless of how much programming effort is expended.
>
> Language translation can skirt many/most of these issues, by preserving
> the semantic ambiguities in the translation, to let the reader/listener
> figure out what the computer failed to figure out.
>
> No, there will never ever be "full understanding", if for no other reason
> than some of what I say simply doesn't make sense. Instead, what can be
> done, and what is needed for present applications, are various forms of
> partial understanding. You can see this in throwing some numerical problems
> at WolframAlpha.com and watching the parsing of it. It picks out key words
> and tries ways of relating them together. Similarly, DrEliza.com picks out
> key words and phrases that are associated with symptoms and conditions it
> knows about.
>
> The MOST important part of "understanding" is often identifying what the
> writer does NOT know (and the computer does know), sort of a reverse
> analysis. I refer to these as "statements of ignorance" and this is an
> important part of DrEliza.com.
>
> My parsing proposal was made as a component in a larger system in support
> of problem solving and sales (it is just one box among many in figure 1 in
> my patent application). My approach appears to be general purpose and
> applicable to other applications. Given that a universal parser appears to
> be impossible until it can walk among us, and even then will have some
> problems, each application must consider what it needs to obtain from the
> text/speech to do its job.
>
> So, when relating performance of parsers, it is important to disambiguate
> just WHAT is being performed, e.g. just WHAT is "parsing", and what
> applications will a particular approach work best for?
>
> Logan, what do you see as the "best fit" applications for reverse ascent
> descent parsing?
>
> Piaget, what do you see as the "best fit" applications for LA parsing?
>
> Any thoughts?
>
> Steve
>
>    *AGI* | Archives <https://www.listbox.com/member/archive/303/=now>
> <https://www.listbox.com/member/archive/rss/303/19999924-5cfde295> |
> Modify <https://www.listbox.com/member/?&;> Your Subscription
> <http://www.listbox.com>
>
>
>
>
> --
> Full employment can be had with the stroke of a pen. Simply institute a six
> hour workday. That will easily create enough new jobs to bring back full
> employment.
>



-- 
Full employment can be had with the stroke of a pen. Simply institute a six
hour workday. That will easily create enough new jobs to bring back full
employment.



-------------------------------------------
AGI
Archives: https://www.listbox.com/member/archive/303/=now
RSS Feed: https://www.listbox.com/member/archive/rss/303/21088071-f452e424
Modify Your Subscription: 
https://www.listbox.com/member/?member_id=21088071&id_secret=21088071-58d57657
Powered by Listbox: http://www.listbox.com
