Matt,

On Mon, Apr 1, 2013 at 1:49 PM, Matt Mahoney <[email protected]> wrote:
> On Mon, Apr 1, 2013 at 2:05 PM, Steve Richfield
> <[email protected]> wrote:
> >
> > Matt,
> >
> > On Mon, Apr 1, 2013 at 10:08 AM, Matt Mahoney <[email protected]> wrote:
> >>
> >> On Mon, Apr 1, 2013 at 5:35 AM, Steve Richfield
> >> <[email protected]> wrote:
> >> > Is Summly's algorithm described somewhere?
> >>
> >> Not really. http://summly.com/technology.html
> >>
> >> But the general idea is to remove the parts of the text that compress
> >> easily, so you are left with low-frequency words and phrases from the
> >> beginning of the document or that occur frequently in the document.
> >> You can also use supervised learning to train the language model to
> >> select the right features. How well all of this works depends on the
> >> quality of the language model. Generally it would be an ensemble of
> >> n-gram models, with parsing playing little or no role.
> >> http://en.wikipedia.org/wiki/Automatic_summarization
> >
> > This sounds pretty remote from our present parsing discussions. I don't
> > see a connection. Do you?
>
> It has nothing to do with parsing, just like most language modeling
> that works. You are aware that children learn semantics before syntax,
> right? You are aware that you can't parse a sentence until you
> understand it first, right? You understand that natural language
> modeling is completely different from what a compiler or interpreter
> does, right? You understand the bag of words model, right?

RIGHT, but I thought I was the only one. Are there others here on the AGI list?

> >> > Note a quirk of law: It is conceivable that Summly had adopted my
> >> > algorithm but kept it proprietary. As such, Yahoo would have NO claim on
> >> > the technology, and their work would NOT count as prior art. It happens
> >> > all the time - people validly patent things that it turns out someone
> >> > else has already developed. These patents are fully enforceable.
> >>
> >> Source code does not have much value unless you hire the developer. I
> >> doubt that patents matter in this case, although I'm sure that Yahoo
> >> will pursue them.
> >
> > By "them" I presume you mean their patents. Now, I will soon have one of
> > my own - or find myself in the middle of a battle at the USPTO over my
> > method if Yahoo (or Google or ???) has already filed a patent in the same
> > area - which seems unlikely.
>
> Yes, I believe that Yahoo will file patents to protect the technology
> that they bought. My guess is that D'Aloisio was more interested in
> creating something useful than worrying about patents. Yahoo will
> probably file for patents, if for no other reason than to kill it. But
> I doubt there is any conflict between their patents and yours. And
> even if there was, Summly was first. Remember you filed just under the
> old "first to invent" rules.
>
> >> > However, my invention was NOT what to do, but how to do such things
> >> > faster. The combinatorial explosion from failed tests hangs over the
> >> > head of all NL "understanding" efforts. From what I can see, my method
> >> > is the ONLY presently known way of prospectively running fast enough,
> >> > once the rules/tables/DB are populated with all the information needed
> >> > to process everyday English (or other natural language).
> >>
> >> Your bold claims
> >
> > Which claim(s) are you challenging:
> > 1. That failed tests are the major source of overhead in NL understanding?
> > 2. That using my method will make systems run faster than not using my method?
> > 3. That there aren't other known ways of eliminating this overhead?
> > 4. That we should even be talking about a full implementation of English,
> >    when no one has yet done it?
>
> The bold claim that yours is the ONLY solution,

No, just the only presently known solution. I stand ready to hear others.

> when in fact you have
> not demonstrated that your technology solves any problem,

This comment isn't related to your point above.

> or that
> there are no other solutions.
Demonstrated, by exhibition, without others presenting counterexamples. Proved, not yet, but I suspect that we will get there after more discussion.

> All you have is an idea. Big deal. Everyone has ideas.

Not everyone moves to claim them, e.g. with a patent. Then, either they have value, or they don't - it is all up to the marketplace.

> In particular, language models already convert words to ordinals, and
> they rank matches to rare words with greater significance.

Sure. Simple Huffman coding does this.

> These are
> the two key ideas in your patent. They might give a vast speedup if
> they were not already widely used techniques in language modeling.

Triggered queuing of rules to be evaluated - where? I know of no such prior art. Searching failed to find anything, but of course Google misses LOTS of things.

> So you'll need to think of something else.

Your summary dismissal makes no sense.

> >> would be more credible if you had an actual working
> >> system.
> >
> > This is obviously a multi-million-dollar team-effort project.
>
> Then you need to demonstrate a prototype if you hope to attract
> investors. Good luck.

I see a reasonable path to this. Now that I have a way past the combinatorial explosion, the next step is to find/develop a robust set of standards that are solid enough to support ongoing efforts like the Russian Academy of Sciences' Translator project. Once some software and standards are in place, people will start using it. Sharing language descriptions will advance the art. Eventually there will be enough to be really valuable. Of course it would be nice if a major player came along before I had to build this myself, but either way it is going to get built.

> >> Populating a knowledge base with rules is not so easy as you
> >> think. Ask Doug Lenat.
> >
> > Doug has gone WAY WAY WAY beyond "simply" defining a language - he is
> > trying to define the whole damn world.
>
> And how do you estimate the number of rules that you will need?
> What
> do you know that Doug Lenat didn't know, and 29 years later still
> doesn't? Anyone can guess a number. How do you know?

DrEliza.com showed how much could be done with ~10^2 rules - less than a week of work. Sales support for many products could be done with ~10^1 rules. DrEliza.com's code and complexity would be marginally adequate for the intended application, marketing ~10^1 products **IF** it were thousands of times faster. Having reviewed the code, it is apparent that a project restart rather than an extensive rewrite would be the best approach. Hence, I have "proof" that my approach is adequate for MY intended application. We both understand that it is NOT (yet) adequate for AGI, or even automated language translation.

> > Note that my target application does NOT require a full implementation
> > of English, just those pieces that relate to specific problems and
> > products that the application addresses, one by one, as new products are
> > added. Sure this could eventually encompass pretty much all of the
> > language, but just like in your plans, this would pay for itself as it
> > was incrementally developed.
>
> Such systems already exist. They scan for words and phrases in your
> email, posts, search queries, and web history to build a model of your
> mind, and match the model to ads according to their content and what
> the advertisers pay.

All I now know about is n-gram counting. Are you aware of other systems that come closer to "understanding" of some sort?

> Those models are far more advanced than what you
> described in your patent application because big money has already
> gone into developing and testing them.

Do you have more information so I can see what you are talking about?

> > eBay showed how it is possible to insinuate a computer into the sales
> > process. Your plan shows how it might be possible to insinuate a computer
> > into the entire economic process.
> > I just supplied some implementation
> > details to point a way to make this happen sooner rather than later, and
> > in the process eliminate the chicken-or-egg problem with money - of
> > needing a quadrillion dollars to make a quadrillion dollars.
>
> My AGI design includes a funding model. People already have an
> incentive to share information and provide the time and computing
> power to do so.

This doesn't seem to encourage your solution until the interchange of information is in useful formats, there is a way of insinuating central control, etc.

> Nobody needs to come up with a quadrillion dollars to
> build a quadrillion dollar automated economy. It is already being
> built.

How about a status report?

Steve

-------------------------------------------
AGI Archives: https://www.listbox.com/member/archive/303/=now
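[Editor's sketch] Matt's description near the top of the thread - drop what compresses easily, keep low-frequency words - can be illustrated with a toy unigram model: score each sentence by the average self-information of its words, so rare words weigh more, and keep the top scorers. Everything here (names, corpus, scoring) is invented for illustration; Summly's actual model is not public.

```python
import math
import re
from collections import Counter

def summarize(text, n_sentences=2):
    """Keep the sentences whose words are rarest under a unigram model."""
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    words = re.findall(r'[a-z]+', text.lower())
    counts = Counter(words)
    total = len(words)

    def score(sentence):
        toks = re.findall(r'[a-z]+', sentence.lower())
        if not toks:
            return 0.0
        # Average self-information in bits: rare words contribute more,
        # so "hard to compress" sentences score highest.
        return sum(-math.log2(counts[t] / total) for t in toks) / len(toks)

    keep = set(sorted(sentences, key=score, reverse=True)[:n_sentences])
    # Emit the selected sentences in their original document order.
    return ' '.join(s for s in sentences if s in keep)
```

This also covers Matt's point about ranking matches on rare words with greater significance: the `-log2` weighting is exactly the code length a Huffman-style coder would assign.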

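[Editor's sketch] Steve never spells out "triggered queuing of rules to be evaluated" in this thread. One plausible reading is an inverted index from trigger words to rules: an input queues only those rules whose trigger words actually occur, so the vast majority of rules are never tested and the failed-test overhead he describes is avoided. A speculative sketch under that assumption - the class and method names are mine, not taken from the patent:

```python
from collections import defaultdict

class RuleEngine:
    """Speculative "triggered queuing": rules register trigger words, and an
    input queues only the rules whose triggers appear in it, so most rules
    are never evaluated at all. A guess at the idea, not the patented method."""

    def __init__(self):
        self.by_trigger = defaultdict(list)  # trigger word -> list of rules

    def add_rule(self, triggers, action):
        rule = (set(triggers), action)
        for t in triggers:
            self.by_trigger[t].append(rule)

    def process(self, text):
        words = set(text.lower().split())
        queued, results = set(), []
        # Only rules reachable through a trigger word are ever touched.
        for w in words & self.by_trigger.keys():
            for rule in self.by_trigger[w]:
                if id(rule) in queued:
                    continue              # evaluate each rule at most once
                queued.add(id(rule))
                triggers, action = rule
                if triggers <= words:     # fire only if all triggers present
                    results.append(action(text))
        return results
```

A DrEliza-style rule base of ~10^2 entries would dispatch this way without scanning every rule against every input, which is the speedup the thread argues about.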