What if my intention was to find all three words in a document, not necessarily in one sentence? Here is my goal:
(1) Documents with all three words appearing together should be given Rank 1.
(2) Documents with all three words appearing somewhere in the same sentence should be given Rank 2.
(3) Documents containing the words in different sentences should be given Rank 3.
(4) Documents missing one or more of the query terms should be given Rank 4.

Correct me if I am wrong, but proximity search is concerned with query terms appearing close to one another, within a certain distance, in the document.

Thanks,
Rajesh Munavalli

-----Original Message-----
From: Chen Wei Zhu [mailto:[EMAIL PROTECTED]
Sent: Thursday, July 14, 2005 10:40 AM
To: [email protected]
Subject: Re: n-gram and multiword query

I remember Lucene doesn't do anything for proximity.

On 7/14/05, Rajesh Munavalli <[EMAIL PROTECTED]> wrote:
> Consider a document with the following contents: "Levenshtein distance
> is named after the Russian scientist Vladimir Levenshtein and is also
> called edit distance"
>
> Possible bi-grams are (after removing the stop words in the beginning
> and end) "Levenshtein distance", "named after", "Russian scientist",
> "scientist Vladimir", "Vladimir Levenshtein", "called edit", "edit
> distance"
>
> If my query is "Vladimir levenshtein distance", how does Lucene
> compute the similarity to the indexed terms? Are query terms appearing
> together given more importance? How does it account for gaps (caused
> by stop word removal) while matching a multiword query?
>
> thanks,
>
> Rajesh Munavalli
>
-- 
Thanks!
yours,
WeiZhu Chen
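The four-rank scheme Rajesh describes can be sketched outside of any search engine. This is a minimal plain-Python sketch, not Lucene's scoring, and it rests on assumptions that are mine rather than the thread's: "appearing together" is read as "adjacent, in query order", sentences are split naively on `.`, `!` or `?`, and `rank`, `tokenize`, `split_sentences` and `contains_phrase` are hypothetical helper names. A lower rank number means a better match.

```python
import re

def tokenize(text):
    # Lowercase word tokens; punctuation is dropped.
    return re.findall(r"[a-z0-9]+", text.lower())

def split_sentences(text):
    # Naive splitter (assumption): break after '.', '!' or '?' plus whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def contains_phrase(tokens, phrase):
    # True if `phrase` occurs as a contiguous run inside `tokens`.
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))

def rank(document, query):
    terms = tokenize(query)
    wanted = set(terms)
    if not wanted <= set(tokenize(document)):
        return 4  # one or more query terms missing
    sentences = [tokenize(s) for s in split_sentences(document)]
    if any(contains_phrase(s, terms) for s in sentences):
        return 1  # all terms adjacent, in query order, inside one sentence
    if any(wanted <= set(s) for s in sentences):
        return 2  # all terms somewhere in the same sentence
    return 3      # all terms present, but spread across sentences
```

Ordering a result list is then `sorted(docs, key=lambda d: rank(d, query))`. Within one rank, documents still need a tie-breaker (for example a term-frequency score), which this sketch deliberately leaves out.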

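The bigram list in the quoted message is consistent with one way of handling stop-word gaps: keep each surviving word's original position and only pair words that were adjacent before stop words were removed, so no bigram bridges a gap. The sketch below assumes that convention; it is not a description of what Lucene does. The stop list `{"is", "the", "and", "also"}` is a hypothetical choice that reproduces the message's list, and `bigrams` and `matched_query_bigrams` are hypothetical helper names.

```python
import re

# Hypothetical stop list, chosen so the output matches the bigrams listed
# in the quoted message; a real analyzer would use a larger list.
STOP_WORDS = {"is", "the", "and", "also"}

def positioned_tokens(text):
    # Keep each surviving word together with its position in the original
    # text, so the holes left by removed stop words stay visible.
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [(pos, w) for pos, w in enumerate(words) if w not in STOP_WORDS]

def bigrams(text):
    toks = positioned_tokens(text)
    # Only pair words that were adjacent before stop-word removal;
    # a jump in position means a stop word sat in between.
    return [f"{a} {b}" for (pa, a), (pb, b) in zip(toks, toks[1:]) if pb == pa + 1]

def matched_query_bigrams(document, query):
    # Query bigrams (built the same way) that also occur in the document.
    doc_bigrams = set(bigrams(document))
    return [bg for bg in bigrams(query) if bg in doc_bigrams]
```

For the example query "Vladimir levenshtein distance", both query bigrams ("vladimir levenshtein" and "levenshtein distance") match. Note that "levenshtein distance" matches the document's opening words, not the "Vladimir Levenshtein" occurrence, so two matching bigrams do not by themselves show that all three words appear together; a phrase or position check is still needed for that.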