On 2/21/06, Chris Hostetter <[EMAIL PROTECTED]> wrote: > > > your "Aim of the Query formation" got truncated, so it's not entirely > clear what you are looking for, but if the general idea of what you are
Documents should be ordered as follows Rank 1: Documents containing section containing all terms in the order of the query terms Rank 2: Documents containing section containing all terms but randomly ordered Rank 3: Documents containing atleast n (n < N, where N is total number of query terms) in the same section and in order and so on... But considering the number of query terms to be large, I am not really looking for the exact ranking. Anything closer would work just fine. However, the number of query terms is what i am more concerned about when it comes to implementing using Phrase query versus Span query in terms of query speed. looking for is that you want searches for phrase like "quick brown fox" to > only match if/when the words "quick" "brown" and "fox" all appear in the > same section in the specified order, and you want documents in which the > phrases appear more then once to bescored higher then a simple PhraseQuery > with a high slop factor and "inOrder=true" should work fine ... the key > being that your slop value needs to be at least as big as the largest > section size you can have, and less then the gap you put between sections. > > I have no idea if it will be faster/slower then a span query, but it's a > little simpler because you don't need to use artificial section boundry > tokens. Wouldnt the phrase/span query match across sections if I do not introduced the artifical section boundry? If you want to tweak how much the score is influenced by the proximity of > the words in the query, vs the frequency of hte phrases in the docs, see > my recent posting about the use of tf in Similarity -- which i think is > accurate since nobody replied and said i was wrong... > > > http://www.nabble.com/Similarity-Usage%3A-tf%28int%29-vs-tf%28float%29-p2981283.html I will take a closer look at the explaination. Thanks, Rajesh Munavalli