On Mon, Apr 1, 2013 at 2:05 PM, Steve Richfield
<[email protected]> wrote:
>
> Matt,
>
> On Mon, Apr 1, 2013 at 10:08 AM, Matt Mahoney <[email protected]> wrote:
>>
>> On Mon, Apr 1, 2013 at 5:35 AM, Steve Richfield
>> <[email protected]> wrote:
>> > Is Summly's algorithm described somewhere?
>>
>> Not really. http://summly.com/technology.html
>>
>> But the general idea is to remove the parts of the text that compress
>> easily, so you are left with low-frequency words and phrases from the
>> beginning of the document or that occur frequently in the document.
>> You can also use supervised learning to train the language model to
>> select the right features. How well all of this works depends on the
>> quality of the language model. Generally it would be an ensemble of
>> n-gram models, with parsing playing little or no role.
>> http://en.wikipedia.org/wiki/Automatic_summarization
>
> This sounds pretty remote from our present parsing discussions. I don't see a 
> connection. Do you?

It has nothing to do with parsing, just like most language modeling
that works. You are aware that children learn semantics before syntax,
right? You are aware that you can't parse a sentence until you
understand it first, right? You understand that natural language
modeling is completely different from what a compiler or interpreter
does, right? You understand the bag of words model, right?
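For concreteness, here is a toy sketch of the summarization idea described above: drop the high-frequency function words (the parts that compress easily) and keep sentences rich in content words that recur in the document, with a bonus for early position. This is not Summly's actual algorithm; the sentence splitter, stopword list, and scoring function are all illustrative stand-ins for a real language model:

```python
import re
from collections import Counter

# A tiny stand-in for "words that compress easily" in a real model.
STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "it", "that"}

def summarize(text, n_sentences=2):
    """Toy extractive summarizer: keep sentences whose content words
    recur in the document, favoring sentences near the beginning."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    content = [w for w in re.findall(r"[a-z]+", text.lower())
               if w not in STOPWORDS]
    freq = Counter(content)

    def score(i, sent):
        toks = [w for w in re.findall(r"[a-z]+", sent.lower())
                if w not in STOPWORDS]
        if not toks:
            return 0.0
        recurrence = sum(freq[t] for t in toks) / len(toks)
        return recurrence + 1.0 / (1 + i)  # earlier sentences get a bonus

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(i, sentences[i]), reverse=True)
    keep = sorted(ranked[:n_sentences])   # restore document order
    return " ".join(sentences[i] for i in keep)
```

No parsing anywhere: the whole thing runs on word counts and position, which is the point.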

>> > Note a quirk of law: It is conceivable that Summly had adopted my 
>> > algorithm but kept it proprietary. As such, Yahoo would have NO claim on 
>> > the technology, and their work would NOT count as prior art. It happens 
>> > all the time - people validly patent things that it turns out someone else 
>> > has already developed. These patents are fully enforceable.
>>
>> Source code does not have much value unless you hire the developer. I
>> doubt that patents matter in this case, although I'm sure that Yahoo
>> will pursue them.
>
> By "them" I presume you mean their patents. Now, I will soon have one of my 
> own - or find myself in the middle of a battle at the USPTO over my method if 
> Yahoo (or Google or ???) has already filed a patent in the same area - which 
> seems unlikely.

Yes, I believe that Yahoo will file patents to protect the technology
that they bought. My guess is that D'Aloisio was more interested in
creating something useful than in worrying about patents, so Yahoo
will probably file them itself, if for no other reason than to block
competitors. But I doubt there is any conflict between their patents
and yours. And even if there were, Summly was first. Remember, you
filed just before the switch away from the old "first to invent"
rules.

>> > However, my invention was NOT what to do, but how to do such things 
>> > faster. The combinatorial explosion from failed tests hangs over the head 
>> > of all NL "understanding" efforts. From what I can see, my method is the 
>> > ONLY presently known way of prospectively running fast enough, once the 
>> > rules/tables/DB are populated with all the information needed to process 
>> > everyday English (or other natural language).
>>
>> Your bold claims
>
> Which claim(s) are you challenging;
> 1.  That failed tests are the major source of overhead in NL understanding?
> 2.  That using my method will make systems run faster than not using my 
> method?
> 3.  That there aren't other known ways of eliminating this overhead?
> 4.  That we should even be talking about a full implementation of English, 
> when no one has yet done it?

The bold claim that yours is the ONLY solution, when in fact you have
not demonstrated that your technology solves any problem, or that
there are no other solutions. All you have is an idea. Big deal.
Everyone has ideas.

In particular, language models already convert words to ordinals, and
they weight matches to rare words with greater significance. These are
the two key ideas in your patent. They might give a vast speedup if
they were not already widely used techniques in language modeling. So
you'll need to think of something else.
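Both techniques fit in a few lines. This is a generic sketch, not any particular system's code; the function names are mine. It assigns each word a small integer ID (most frequent first, so comparisons become cheap integer operations) and computes standard inverse-document-frequency weights so that rare words count for more:

```python
import math
from collections import Counter

def build_vocab(corpus_tokens):
    """Map each word to an integer ordinal, most frequent word first."""
    counts = Counter(corpus_tokens)
    return {w: i for i, (w, _) in enumerate(counts.most_common())}

def idf_weights(documents):
    """Inverse document frequency: words that appear in few documents
    get large weights; words that appear everywhere get weights near 0."""
    n = len(documents)
    df = Counter(w for doc in documents for w in set(doc))
    return {w: math.log(n / df[w]) for w in df}
```

With these weights, a match on a rare content word contributes far more to a similarity score than a match on "the", which is the effect the patent describes.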

>> would be more credible if you had an actual working
>> system.
>
> This is obviously a multi-million-dollar team-effort project.

Then you need to demonstrate a prototype if you hope to attract
investors. Good luck.

>> Populating a knowledge base with rules is not as easy as you
>> think. Ask Doug Lenat.
>
> Doug has gone WAY WAY WAY beyond "simply" defining a language - he is trying 
> to define the whole damn world.

And how do you estimate the number of rules that you will need? What
do you know that Doug Lenat didn't know, and 29 years later still
doesn't? Anyone can guess a number. How do you know?

> Note that my target application does NOT require a full implementation of 
> English, just those pieces that relate to specific problems and products that 
> the application addresses, one by one, as new products are added. Sure this 
> could eventually encompass pretty much all of the language, but just like in 
> your plans, this would pay for itself as it was incrementally developed.

Such systems already exist. They scan for words and phrases in your
email, posts, search queries, and web history to build a model of your
mind, and match the model to ads according to their content and what
the advertisers pay. Those models are far more advanced than what you
described in your patent application because big money has already
gone into developing and testing them.
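A stripped-down illustration of that kind of matching, assuming nothing about any real ad system (the bag-of-words profile, cosine similarity, and bid scaling here are the textbook version, not anyone's production code):

```python
import math
from collections import Counter

def profile(texts):
    """Crude interest profile: bag-of-words counts over a user's text."""
    return Counter(w for t in texts for w in t.lower().split())

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def best_ad(user_texts, ads):
    """Pick the ad whose keyword profile best matches the user's,
    scaled by what the advertiser bids."""
    user = profile(user_texts)
    return max(ads, key=lambda ad: cosine(user, profile([ad["keywords"]]))
                                   * ad["bid"])
```

The real systems differ in scale and sophistication, not in kind: they are still matching word statistics against word statistics.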

> eBay showed how it is possible to insinuate a computer into the sales 
> process. Your plan shows how it might be possible to insinuate a computer 
> into the entire economic process. I just supplied some implementation details 
> to point a way to make this happen sooner rather than later, and in the 
> process eliminate the chicken-or-egg problem with money - of needing a 
> quadrillion dollars to make a quadrillion dollars.

My AGI design includes a funding model. People already have an
incentive to share information and provide the time and computing
power to do so. Nobody needs to come up with a quadrillion dollars to
build a quadrillion dollar automated economy. It is already being
built.


--
-- Matt Mahoney, [email protected]


-------------------------------------------
AGI
Archives: https://www.listbox.com/member/archive/303/=now
