Matt,

Yes, I have been tracking the internals of Dragon NaturallySpeaking, whose
patent has recently expired, so ANYONE can now use their technology.

Ordinals have been around for a LONG time. Google would be dead without
them.

There are two new tricks in my method (which need each other to work), both
of which appear to be entirely new.

1.  Triggering rule evaluation only when a rule's least frequently used
(LFU) word is actually present in the input. This is fundamentally MUCH
faster than using the same information in other ways, e.g. merely ordering
a rule's tests so that the LFU test, being the test most likely to fail,
runs first. The enormous increase in speed comes because >99% of the rules
that would otherwise be evaluated are never evaluated at all and don't cost
a nanosecond of CPU time.

2.  Putting pointers to the triggered rules into prioritized queues, to
restore order to the mess that results when only <1% of the rules are
evaluated. In other approaches, evaluation is guided by the construction of
the rules themselves, and that guidance is in effect ripped out when rules
are summarily dropped (English seems to lack the right word here. "Skipped"
is close, but even when you skip something there is some cost in deciding
to skip, and then in the skipping itself. Here, rules lacking their LFU
element cost exactly zero). For natural language processing the difference
between "skipped" and the no-cost action described above is orders of
magnitude in performance.
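A minimal sketch of the LFU-trigger idea in point 1. The rules, sample words, and frequency counts below are all made-up illustrations, not anything from the actual filing:

```python
from collections import defaultdict

# Hypothetical rules: each is a set of required words plus a rule name.
rules = [
    ({"the", "volcano", "erupted"}, "RULE_A"),
    ({"a", "quorum", "voted"}, "RULE_B"),
    ({"the", "dog", "barked"}, "RULE_C"),
]

# Assumed corpus frequencies; in practice these come from large word counts.
freq = {"the": 1000, "a": 900, "dog": 50, "barked": 20,
        "volcano": 5, "erupted": 8, "quorum": 2, "voted": 30}

# Index each rule under its least frequently used (LFU) word only.
index = defaultdict(list)
for words, name in rules:
    lfu = min(words, key=lambda w: freq.get(w, 0))
    index[lfu].append((words, name))

def matching_rules(sentence_words):
    """Evaluate only rules whose LFU word appears in the sentence.
    Rules whose LFU word is absent are never even looked at."""
    seen = set(sentence_words)
    hits = []
    for w in seen:
        for words, name in index.get(w, []):
            if words <= seen:   # the full test runs only on triggered rules
                hits.append(name)
    return hits
```

A sentence containing "volcano" triggers RULE_A's full test; a sentence with only common words triggers nothing, at zero per-rule cost.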
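And a minimal sketch of point 2, the prioritized queue of pointers to triggered rules. The priorities and rule names here are invented; a real system would derive priorities from the rules themselves:

```python
import heapq

# Hypothetical (priority, rule_name) pairs produced by the LFU triggering
# step; lower number = higher priority. Only triggered rules ever enter
# the queue, so the queue stays tiny relative to the full rule set.
queue = []
for priority, rule in [(3, "RULE_C"), (1, "RULE_A"), (2, "RULE_B")]:
    heapq.heappush(queue, (priority, rule))

# Drain the queue, evaluating the highest-priority triggered rule first.
order = []
while queue:
    _, rule = heapq.heappop(queue)
    order.append(rule)
```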

Continuing.
On Fri, Mar 22, 2013 at 5:53 PM, Matt Mahoney <[email protected]> wrote:

> On Fri, Mar 22, 2013 at 6:30 PM, Steve Richfield
> <[email protected]> wrote:
> > I believe that my approach will be fast enough to keep up with the
> Internet, and I haven't seen any other approach that promises such blinding
> speed. In theory, all I need do is get the word out, and wait for folks at
> Google, Yahoo, and Facebook to discover it, which is my present plan.
>
> Blinding speed remains to be seen.
>

I think it can be guesstimated with enough accuracy to make go/no-go
decisions.

>
> You are probably aware that most language models already use the
> technique of converting words to ordinals. When speech recognition was
> a hot research topic in the 1990's, a common language modeling
> technique would be to convert the most common 20K words to numbers and
> construct various models to compute word probabilities using e.g.
> trigram and sparse bigram models. These probabilities are combined
> with the distribution from the acoustic model to select the most
> likely next word. The language models would also be evaluated
> independently by measuring perplexity, which is really the same thing
> as compression ratio. Better models compress smaller, and this
> correlates strongly with lower word error rates.
>

Of course these aren't concerned with semantic content, just what words
were said. Their output might be input to the process I have described.
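Matt's point above, that perplexity "is really the same thing as compression ratio," can be seen in a few lines of arithmetic (the per-word probabilities are invented for illustration):

```python
import math

# Assumed probabilities a model assigns to each word of a 4-word test text.
probs = [0.25, 0.125, 0.5, 0.25]

# Cross-entropy in bits per word: the size an ideal arithmetic coder
# driven by this model would achieve on this text.
bits_per_word = -sum(math.log2(p) for p in probs) / len(probs)

# Perplexity is just 2 raised to that cross-entropy.
perplexity = 2 ** bits_per_word
```

A better model raises the probabilities it assigns to the actual text, which lowers both numbers together.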

>
> Converting words to ordinals is also used in the number 1, 2, 4, 7,
> and 9 compressors in http://mattmahoney.net/dc/text.html
> The technique is to read a dictionary (a list of words), construct a
> hash table index in memory, then convert the input text to a sequence
> of ordinals (1, 2, or 3 byte numbers depending on word frequency),
> plus symbols for punctuation, letters for spelling rare unmatched
> words, and symbols to denote when the word is capitalized or all upper
> case. The parsers use simple ad-hoc rules similar to the ones
> described in your patent. Then the sequence of ordinals is compressed,
> usually one bit at a time using a mix of hundreds of context models.
> The top ranked program (durilca'kingsize) uses PPM, or 1 byte at a
> time prediction, and wins only because it uses 13 GB of memory. The
> second place program (paq8hp12any) is the winner of the Hutter prize,
> which limits memory to 1 GB.
>
> In all cases, the dictionary is derived from the test data itself.
> (The evaluation includes the decompression program and compressed
> dictionary size, so it is not cheating to do this). For 7'th ranked
> xwrt, this is done dynamically. In the other cases, the dictionary was
> prepared in advance from the test data and organized both to improve
> its compressibility as well as the compressibility of the tokenized
> text. The first method involves grouping words with similar spelling,
> such as by sorting reversed words to group suffixes. The second method
> involves grouping words that are related semantically and
> grammatically, like "mother" with "father" and "monday" with
> "tuesday". For paq8hp12any and lpaq9m (both use the same
> dictionaries), a lot of the organization was done manually with the
> help of some utilities. For durilca'kingsize, the author didn't go
> into details but I believe he used a program that clustered the words
> in context space. Either way, this grouping allows for contexts that
> drop the low order bits of the ordinals, effectively making the group
> itself a context.
>

I don't immediately see how data compression relates to understanding text,
though I DO see that some understanding might help with the compression.
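For reference, the ordinal tokenization Matt describes can be sketched roughly like this. The sample text, the ranking-by-frequency scheme, and the 1/2/3-byte thresholds are a simplified illustration, not any particular compressor's actual format:

```python
from collections import Counter

text = "the cat sat on the mat because the cat was tired".split()

# Rank words by frequency; the most frequent words get the smallest
# ordinals, which a real codec would then emit in 1, 2, or 3 bytes.
ranked = [w for w, _ in Counter(text).most_common()]
ordinal = {w: i + 1 for i, w in enumerate(ranked)}

def code_length(n):
    """Illustrative 1/2/3-byte split by ordinal range (not a real format)."""
    return 1 if n < 256 else 2 if n < 65536 else 3

tokens = [ordinal[w] for w in text]
total_bytes = sum(code_length(t) for t in tokens)
```

With a real 20K-plus word dictionary, common words like "the" land in the 1-byte range and rare words spill into 2 or 3 bytes, before any further modeling is applied to the ordinal stream.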

Lots of people have talked about "loss" in compression. I want to see GAIN.
In a gaining compressor, you might put in Wikipedia, and get out Wikipedia
with fewer misspelled words, better grammar, and some semantic errors
corrected. I believe that the competition should demand no NET loss, i.e.
it should gain at least as much as it loses. After all, what is the value
in more cheaply representing someone's typing errors?!!!

>
> Anyway, I would not be surprised if Google, Bing, Facebook, etc. are
> already using similar techniques in their language models.


Models - yes. Methods - I doubt it. In any case, if they haven't filed for
a patent on it, I will still own it, because "first to invent" has been
replaced by "first to file" EXCEPT for applications like mine that were
filed before March 16, and I have already filed.

Hence, it isn't at all inconceivable that I could end up owning what they
are now working on!!!

I wonder if I should send them a letter inquiring whether one of us is, or
will be, infringing on the other?


> You might
> actually want to build something before making bold claims.


Why?

Most astute IP programs patent, build, and then patent again. I have simply
taken the first step and am gearing up for the second step, the first task
of which is to find partners, raise money, etc. THAT requires making the
bold claims needed to get people interested enough to participate. Are you
(or anyone else reading this posting) interested in being a part of this?


> The part
> you glossed over - building a set of rules describing the language -
> might not be as easy as you think.


I think I posted that I was expecting it to take a linguist-decade.

> A couple of benchmarks. Watson was
> a 30 person-year effort, and most of the language rules were learned
> from 4 TB of text rather than hand coded.


Yeah, but their linguistic goals were MUCH more ambitious than mine. In the
patented application, there are specific things to accomplish that aren't
nearly as open-ended. It needs to be smarter than DrEliza, but not all THAT
much smarter to be able to select and tailor ads.

Also note that Watson didn't need to understand what was happening, just
run like a mouse in a maze through the information.


> It runs on a few thousand CPU cores.


Of course, because their selected challenge is pretty close to the
traveling salesman problem, except that each "trip" is a link in a gigantic
database.


> Cyc has invested hundreds of person-years since 1984 and
> they have absolutely no idea how much more work they need to do.
>

Cyc will never ever do anything useful.

Steve


