Can you share a description of the heuristics you used to clean up the text? I am 
facing the same problem right now handling email. I'm less interested in the rules you 
use than in the tools you use to implement them.

Herb....

-----Original Message-----
From: Ulrich Mayring [mailto:[EMAIL PROTECTED]
Sent: Friday, November 28, 2003 4:21 AM
To: [EMAIL PROTECTED]
Subject: Re: New Lucene-powered Website

This "clean-up work" is actually trickier than the summarising itself 
and it is usually very domain-specific. That's the reason why I haven't 
proposed to contribute the summariser to Lucene, because the clean-up 
code is not generic. The summariser itself is just one class with 300 
lines, but without prior clean-up the quality of its summaries is 
insufficient.

