Re: [agi] 40 years of parsing NL...

Matt Mahoney Sat, 23 Mar 2013 18:04:55 -0700

On Fri, Mar 22, 2013 at 9:35 PM, Steve Richfield
<[email protected]> wrote:
> There are two new tricks in my method (which need each other to work), both 
> of which appear to be entirely new.
>
> 1.  Triggering analysis based on the presence of a least frequently used word 
> in a rule.
...
> 2.  Putting pointers to rules into prioritized queues, to put the mess made 
> from trying to make sense when only <1% of the rules are evaluated.
...
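
For concreteness, here is one possible reading of the two tricks above,
as a minimal sketch. It is not Steve's actual method, which is not
spelled out in this thread. The rule format (a tuple of words plus a
numeric priority), the frequency table, and the function names are all
assumptions made up for illustration.

```python
import heapq
from collections import defaultdict

def build_index(rules, word_freq):
    """Index each rule under its least frequent word only (trick 1)."""
    index = defaultdict(list)
    for rule_id, (words, _priority) in enumerate(rules):
        # Words missing from the frequency table count as frequency 0.
        rarest = min(words, key=lambda w: word_freq.get(w, 0))
        index[rarest].append(rule_id)
    return index

def triggered_rules(text, rules, index):
    """Return ids of matching rules, lowest priority number first (trick 2)."""
    tokens = set(text.lower().split())
    queue = []
    for tok in tokens:
        # Only rules whose rarest word is present are ever looked at.
        for rule_id in index.get(tok, ()):
            _words, priority = rules[rule_id]
            heapq.heappush(queue, (priority, rule_id))
    fired = []
    while queue:
        _priority, rule_id = heapq.heappop(queue)
        words, _ = rules[rule_id]
        # Toy "evaluation": the rule matches if all its words are present.
        if set(words) <= tokens:
            fired.append(rule_id)
    return fired
```

Since common words like "the" are never index keys, most rules are
never touched for a given sentence, which is presumably where the
"<1% of the rules are evaluated" figure would come from.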


I agree you can probably patent these. That doesn't mean they haven't
been done before or haven't already been patented. I'm just saying that
checking takes a great deal of effort, so the USPTO will probably take
the easy way out and issue the patent.

>> Blinding speed remains to be seen.
>
> I think it can be guesstimated with enough accuracy to make go/no-go 
> decisions.

In theory, Cyc should be fast because it was designed to run on 1980s
computers. The issue is how many rules you need.

> I don't immediately see how data compression relates to understanding text, 
> though I DO see that some understanding might help with the compression.

A simple test to see if you understand some text is to see if you can
guess missing words. Compression measures the same thing.
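
To make the connection concrete: a model that assigns probability p to
the next symbol can code it in about -log2(p) bits, so better guessing
means a smaller output. Below is a minimal sketch using an adaptive
order-1 character model with add-one smoothing. The model choice, the
function name, and the alphabet size of 256 are assumptions for
illustration, not a description of any particular compressor.

```python
import math
from collections import defaultdict

def code_length_bits(text, alphabet_size=256):
    """Ideal code length of text under an adaptive order-1 character model."""
    counts = defaultdict(lambda: defaultdict(int))  # counts[prev][ch]
    totals = defaultdict(int)                       # totals[prev]
    prev = ""
    bits = 0.0
    for ch in text:
        # Add-one smoothed probability of ch given the previous character.
        p = (counts[prev][ch] + 1) / (totals[prev] + alphabet_size)
        bits += -math.log2(p)
        # Update the model only after coding, as a decoder would.
        counts[prev][ch] += 1
        totals[prev] += 1
        prev = ch
    return bits
```

Text whose next character is easy to guess from the previous one costs
fewer bits than the same characters in shuffled order.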

> Lots of people have talked about "loss" in compression. I want to see GAIN. 
> In a gaining compressor, you might put in Wikipedia, and get out Wikipedia 
> with fewer misspelled words, better grammar, and some semantic errors 
> corrected. I believe that the competition should demand no NET loss, i.e. it 
> should gain at least as much as it loses. After all, what is the value in 
> more cheaply representing someone's typing errors?!!!

Errors compress poorly because they represent unexpected symbols in
the text. So yes, you could probably correct text by substituting
symbols that compress more easily, if the model is good.
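
A minimal sketch of that idea, assuming a unigram word model trained on
some reference text and candidates limited to one edit (substitution,
insertion, or deletion). The function names and the 1-bit saving
threshold are made up for illustration. A model this crude ignores
context, so it will also "correct" rare but valid words.

```python
import math
from collections import Counter

def word_cost_bits(word, counts, total, vocab_size):
    """Bits to code a word under an add-one smoothed unigram model."""
    return -math.log2((counts[word] + 1) / (total + vocab_size + 1))

def one_edit_apart(a, b):
    """True if a and b differ by one substitution, insertion, or deletion."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1
    if len(a) == len(b):
        return a[i + 1:] == b[i + 1:]   # one substitution
    return a[i:] == b[i + 1:]           # one extra character in b

def correct(text, reference, min_saving_bits=1.0):
    """Replace a word when a near neighbour is cheaper to code."""
    counts = Counter(reference.lower().split())
    total = sum(counts.values())
    vocab = len(counts)
    out = []
    for word in text.lower().split():
        best = word
        # A candidate must beat the original by at least min_saving_bits.
        best_cost = word_cost_bits(word, counts, total, vocab) - min_saving_bits
        for cand in counts:
            if one_edit_apart(word, cand):
                cost = word_cost_bits(cand, counts, total, vocab)
                if cost < best_cost:
                    best, best_cost = cand, cost
        out.append(best)
    return " ".join(out)
```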

>> Anyway, I would not be surprised if Google, Bing, Facebook, etc. are
>> already using similar techniques in their language models.
>
> Models - yes. Methods - I doubt it. In any case, if they haven't filed for a 
> patent on it, I will still own it, because "first to invent" is gone EXCEPT 
> for applications like mine that were filed before March 16, and is now 
> replaced by "first to file", and I have already filed.
>
> Hence, it isn't at all inconceivable that I could now own what they are now 
> working on!!!
>
> I wonder if I should send them a letter inquiring whether one of us is, or 
> will be, infringing on the other?

Do you really think that these companies don't already own hundreds or
thousands of patents on the methods they invented and are using? I
suppose if your price is low, they will pay it to avoid the hassle. If
you ask for too much, they will point out prior art to invalidate your
patent.

>> You might
>> actually want to build something before making bold claims.
>
>
> Why?
>
> Most astute IP programs patent, build, and then patent again. I have simply 
> taken the first step and am gearing up for the second step, the first task of 
> which is to find partners, raise money, etc. THAT requires making the bold 
> claims needed to get people interested enough to participate. Are you (or 
> anyone else reading this posting) interested in being a part of this?

No, because I don't believe your bold claims.

If you want to make money, then patent something trivial and don't
tell anyone and wait for a big company to step on it.
http://en.wikipedia.org/wiki/JPEG#Patent_issues
The patent claim in this case was the "invention" of using a single
code to represent a run of zeros followed by a non-zero value. Who
would have thought it would be worth $105 million? The claim wasn't
even valid, due to prior art.
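
For anyone who hasn't seen it, the idea fits in a few lines. This is a
simplified sketch of coding each run of zeros together with the
non-zero value that follows it, not the JPEG bitstream itself, which
adds entropy coding and an end-of-block code on top. The function names
and the separate trailing-zero count are my own simplifications.

```python
def encode_zero_runs(coeffs):
    """Turn a list of ints into (zero_run, nonzero_value) pairs."""
    pairs = []
    run = 0
    for c in coeffs:
        if c == 0:
            run += 1
        else:
            pairs.append((run, c))  # one symbol covers the run and the value
            run = 0
    return pairs, run  # run is now the count of trailing zeros

def decode_zero_runs(pairs, trailing_zeros):
    """Invert encode_zero_runs."""
    out = []
    for run, value in pairs:
        out.extend([0] * run)
        out.append(value)
    out.extend([0] * trailing_zeros)
    return out
```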

> I think I posted that I was expecting it to take a linguist-decade.

You might want to pay attention to what Kurzweil is doing at Google.
He has put together a team of several top researchers to tackle
exactly the natural language problem. He has access to a model with
300 million concepts, and an awful lot of computing power.

> Also note that Watson didn't need to understand what was happening, just run 
> like a mouse in a maze through the information.
>
>> It runs on a few thousand CPU cores.
>
> Of course, since their selected challenge is pretty close to the traveling 
> salesman problem, only each "trip" is a link in a gigantic database.

No, that's not how it works. They run hundreds of models in parallel
and combine the answers.
http://www.aaai.org/Magazine/Watson/watson.php
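
As a toy illustration of "run many models and combine the answers",
and not a description of Watson's actual pipeline: assume each model is
a function returning an (answer, confidence) pair, and combine them by
a weighted sum of confidences per answer. The interface, the weights,
and the combining rule are all assumptions for the sketch.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def combine(models, weights, question):
    """Run every model on the question and merge their scored answers."""
    with ThreadPoolExecutor() as pool:
        # pool.map preserves the order of models, so weights line up.
        results = list(pool.map(lambda m: m(question), models))
    scores = defaultdict(float)
    for (answer, confidence), weight in zip(results, weights):
        scores[answer] += weight * confidence
    best = max(scores, key=scores.get)
    return best, dict(scores)
```

Two weak models that agree can outvote one more confident model that
disagrees, which is the point of combining them.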

> Cyc will never ever do anything useful.

I agree. So why are you proposing a rule based system too?

--
-- Matt Mahoney, [email protected]

