On Jan 27, 2007, at 10:23 AM, Eric Lease Morgan wrote:
> Do y'all know of any open source text summarizers?

Thank you for the prompt replies. In the end I used a combination of summarizers:

  1. First, I used the Perl module Lingua::EN::Keywords. This works quite well. Given some text, it returns the five words it thinks are most significant.

  2. Second, I used Lingua::EN::Summarizer. Given a text, it returns the one or two sentences it thinks are most relevant. This does not work as well as Summarizer #1.

  3. Third, I used OTS (Open Text Summarizer) which, like Lingua::EN::Keywords, returns a list of words it thinks are relevant. So-so.

My real goal was to create a list of tags to associate with full-text files. To create my tags I:

  1. Got a list of words from Lingua::EN::Keywords.
  2. Added the words from Lingua::EN::Summarizer.
  3. Added the words from OTS.
  4. Added the words from the file's title.
  5. Added the words from the file's author.
  6. Normalized all the words (lowercased them, removed punctuation, etc.).
  7. Removed duplicates.
  8. Removed stop words.

Finally, I used Net::Delicious to upload 675 Alex Catalogue of Electronic Texts links to del.icio.us. The results aren't too bad. Heck, I certainly couldn't catalog 675 items that quickly. See:

  http://del.icio.us/infomotions/alex

No, it is not perfect, but it is certainly better than nothing!

-- 
Eric Lease Morgan
University Libraries of Notre Dame
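
For anyone who wants to try something similar, the tag-building steps above (merge the word lists, normalize, remove duplicates, remove stop words) can be sketched roughly as follows. This is only an illustration in Python, not the Perl script actually used. The function names, the tiny stop-word list, and the sample words are all made up for the example, and the word lists are assumed to have already been produced by the summarizers.

```python
import string

# Hypothetical, deliberately tiny stop-word list; a real run would use a fuller one.
STOP_WORDS = {"a", "an", "and", "by", "in", "of", "the", "to"}


def normalize(word):
    """Lowercase a word and strip punctuation from both ends."""
    return word.lower().strip(string.punctuation)


def build_tags(*word_lists):
    """Merge word lists into unique, normalized tags, skipping stop words.

    The order of first appearance is preserved.
    """
    tags = []
    seen = set()
    for words in word_lists:
        for word in words:
            tag = normalize(word)
            # Skip empty strings, duplicates, and stop words.
            if not tag or tag in seen or tag in STOP_WORDS:
                continue
            seen.add(tag)
            tags.append(tag)
    return tags


if __name__ == "__main__":
    # Made-up sample input standing in for summarizer output and file metadata.
    keywords = ["Whale", "sea", "ship"]
    summary_words = ["The", "whale,", "Ahab"]
    title_words = "Moby Dick".split()
    author_words = "Herman Melville".split()
    print(build_tags(keywords, summary_words, title_words, author_words))
```

Filtering stop words in the same pass as de-duplication gives the same result as doing the two steps separately, and keeping the order of first appearance means the keywords from the first list come first in the tag list.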