On Mon, Apr 1, 2013 at 5:35 AM, Steve Richfield <[email protected]> wrote:
> Is Summly's algorithm described somewhere?
Not really.
http://summly.com/technology.html

But the general idea is to remove the parts of the text that compress easily, so you are left with low-frequency words and phrases that appear near the beginning of the document or that occur frequently in it. You can also use supervised learning to train the language model to select the right features. How well all of this works depends on the quality of the language model. Generally it would be an ensemble of n-gram models, with parsing playing little or no role.
http://en.wikipedia.org/wiki/Automatic_summarization

> Note a quirk of law: It is conceivable that Summly had adopted my algorithm
> but kept it proprietary. As such, Yahoo would have NO claim on the
> technology, and their work would NOT count as prior art. It happens all the
> time - people validly patent things that it turns out someone else has
> already developed. These patents are fully enforceable.

Source code does not have much value unless you hire the developer. I doubt that patents matter in this case, although I'm sure that Yahoo will pursue them.

> However, my invention was NOT what to do, but how to do such things faster.
> The combinatorial explosion from failed tests hangs over the head of all NL
> "understanding" efforts. From what I can see, my method is the ONLY presently
> known way of prospectively running fast enough, once the rules/tables/DB are
> populated with all the information needed to process everyday English (or
> other natural language).

Your bold claims would be more credible if you had an actual working system. Populating a knowledge base with rules is not as easy as you think. Ask Doug Lenat.
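
For what it's worth, the summarization idea described earlier (drop what compresses easily, keep what is rare in general but prominent in the document) can be sketched in a few lines of Python. This is only an illustration, not Summly's algorithm, which is not public. It swaps the ensemble of n-gram models for a single unigram model, and the BACKGROUND table, the UNSEEN probability, and the 0.1 position weight are all made-up values chosen for the example.

```python
import math
import re
from collections import Counter

# Hypothetical background unigram probabilities. A real system would
# estimate these (and higher-order n-grams) from a large corpus.
BACKGROUND = {"the": 0.06, "of": 0.03, "and": 0.03, "a": 0.025,
              "to": 0.025, "in": 0.02, "is": 0.01, "it": 0.01, "that": 0.01}
UNSEEN = 1e-5  # assumed probability for any word missing from the table


def surprisal(word):
    """Bits needed to code the word under the background model."""
    return -math.log2(BACKGROUND.get(word, UNSEEN))


def summarize(text, n=1):
    """Return the n highest-scoring sentences, in document order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    doc_counts = Counter(re.findall(r"[a-z']+", text.lower()))
    scored = []
    for pos, sent in enumerate(sentences):
        words = re.findall(r"[a-z']+", sent.lower())
        if not words:
            continue
        # Reward words that are rare in general but repeated in this document.
        info = sum(surprisal(w) * math.log2(1 + doc_counts[w]) for w in words)
        info /= len(words)
        # Mild bias toward sentences near the beginning of the document.
        scored.append((info / (1 + 0.1 * pos), pos, sent))
    top = sorted(scored, reverse=True)[:n]
    return [sent for _, _, sent in sorted(top, key=lambda t: t[1])]


if __name__ == "__main__":
    doc = ("Summly compresses news. It is the and of that in it. "
           "Summly news summaries use compression.")
    print(summarize(doc, n=2))
```

Sentences made up of common words cost few bits under the background model and are dropped first, which is the "compresses easily" part. A supervised version would learn the weights instead of hard-coding them.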
-- Matt Mahoney, [email protected]
