Your "Aim of the Query formation" got truncated, so it's not entirely clear what you are looking for. But if the general idea is that you want searches for a phrase like "quick brown fox" to match only when the words "quick", "brown" and "fox" all appear in the same section in the specified order, and you want documents in which the phrase appears more than once to be scored higher, then a simple PhraseQuery with a high slop factor and "inOrder=true" should work fine ... the key being that your slop value needs to be at least as big as the largest section size you can have, and less than the gap you put between sections.
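To make the slop rule above concrete, here is a minimal sketch in plain Java. It only does the arithmetic; it does not call Lucene. The class SlopBounds and its methods are hypothetical names made up for illustration, and it assumes that "section size" and "gap" are both measured in term positions.

```java
import java.util.Arrays;

/**
 * Hypothetical helper (not part of Lucene): encodes the rule that a slop
 * must be at least the largest section size and less than the position
 * gap inserted between sections.
 */
final class SlopBounds {
    private SlopBounds() {}

    /** Smallest slop that still covers the largest section (0 if there are no sections). */
    static int minSlop(int[] sectionSizes) {
        return Arrays.stream(sectionSizes).max().orElse(0);
    }

    /** True when slop covers every section but cannot bridge the gap between sections. */
    static boolean isSafeSlop(int slop, int[] sectionSizes, int gap) {
        return slop >= minSlop(sectionSizes) && slop < gap;
    }

    /** Smallest gap for which at least one safe slop value exists. */
    static int minGap(int[] sectionSizes) {
        return minSlop(sectionSizes) + 1;
    }
}
```

With the sizes from the example in the question (100, 1500 and 10 terms) and a gap of 1000, isSafeSlop rejects every slop value, because the 1500-term section is already larger than the gap. Under this rule the gap would have to be at least minGap(sizes), which is 1501 here, so sections with no upper bound on size need a gap chosen after the largest section is known.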

I have no idea if it will be faster or slower than a span query, but it's a little simpler because you don't need to use artificial section boundary tokens.

If you want to tweak how much the score is influenced by the proximity of the words in the query, vs the frequency of the phrases in the docs, see my recent posting about the use of tf in Similarity -- which I think is accurate since nobody replied and said I was wrong...

http://www.nabble.com/Similarity-Usage%3A-tf%28int%29-vs-tf%28float%29-p2981283.html

: Date: Tue, 21 Feb 2006 17:45:12 -0600
: From: Rajesh Munavalli <[EMAIL PROTECTED]>
: Reply-To: java-user@lucene.apache.org
: To: java-user@lucene.apache.org
: Subject: Phrase query vs span query
:
: I am trying to adopt lucene for a special IR system. The following scenario
: is an approximation of what I am trying to do. Please bear with me if some
: things doesnt make sense. I need some suggestions on formulating queries for
: the following scenario
:
: Each document consists of a set of fields (standard in lucene). But in my
: case, the field is somewhat different as explained below.
:
: Field:
: ---------
: Each field consists of a set of conceptual sections. Each of these sections
: is separated by say N (say 1000) index positions but are in the same field.
: Sizes of sections vary and do not have any lower or upper bound on the
: number of terms they may contain
: .
: Ex: Lets say Field "contents" has
: <section1 of 100 terms><gap of 1000 term positions><section 2 of 1500
: terms><gap of 1000 term positions><gap of 1000 term positions><section 3 of
: 10 terms>
:
: NOTE: At index time, I am assuming I somehow know how to form these
: sections.
:
: Typical Query:
: ---------------------
: Consists of 15 to 30 query terms. In other words, these query terms
: represent a conceptual section.
:
: Aim of the Query formation:
: ----------------------------------------
: I want to rank the documents proportional to the number query terms
: appearing in the SAME SECTION and IN ORDER. Documents containing terms with
: the
:
: My Questions:
: ---------------------
: Considering the structure of the fields/documents and the number of query
: terms.
:
: (1) Is there an effective way of formulating a query with the existing query
: types in Lucene?
:
: (2) After considering the way different queries work and their limitations,
: I think forming phrase/span queries of groups of query terms
: might approximate the rankings I am expecting. In that case which of the
: following queries will perform better (in terms of QUERY SPEED and RANKING)
: (a) phrase query with certain slope factor
: (b) span query
:
: Thanks,
:
: Rajesh Munavalli
:

-Hoss