you cannot layer sentence boundary detection on top of Lucene and post process the hit list without effectively building a completely new search engine index. if i am going to go to this trouble, there is no point to using Lucene at all.
Herb.... -----Original Message----- From: Tatu Saloranta [mailto:[EMAIL PROTECTED] Sent: Friday, November 14, 2003 8:30 PM To: Lucene Users List Subject: Re: inter-term correlation [was Re: Vector Space Model in Lucene?] Hmmh? You implied that there are some useful distance heuristics (words 5 words apart or more correlate much less), and others have pointed out Lucene has many useful components. Building more complex system from small components is usually considered a Good Thing (tm), not an "ad hoc solution". In fact, I would guess most experienced people around here start with Lucene defaults, and build their own systems gradually customizing more and more of pieces. It may be there are actual fundamental problems with Lucene, regarding approach you'd prefer, but I don't think it makes sense to brush off suggestions regarding distance & fuzzy/sloppy queries by claiming they are "just hacks". --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
