On 2/22/06, Paul Elschot <[EMAIL PROTECTED]> wrote:
>
> > Typical Query:
> > ---------------------
> > Consists of 15 to 30 query terms. In other words, these query terms
> > represent a conceptual section.
>
> Would you need synonyms of these terms, too?
Yes.

> > (2) After considering the way different queries work and their limitations,
> > I think forming phrase/span queries of groups of query terms
> > might approximate the rankings I am expecting. In that case which of the
> > following queries will perform better (in terms of QUERY SPEED and RANKING)?
> > (a) phrase query with a certain slop factor
> > (b) span query
>
> SpanQuery is slower than PhraseQuery, but it has the advantage that it can
> be nested. Nesting here means the possibility to use e.g. a short phrase as
> a unit to be matched and scored.

I wasn't aware of the capability to nest SpanQuery. Is there a link where I
could read more about this?

> To formulate a single query for your requirements,
> there is still the problem that PhraseQuery and SpanQuery only work when
> all their "terms" are present in an indexed Lucene document field.
> Putting it differently, when fewer terms are present, their order cannot
> be taken into account, unless the query contains an (non)ordered query
> specifying a subset of the terms present in the documents.

I was thinking of building a boolean combination of phrase/span queries,
each on a subset of the terms. Though it's not exhaustive, it might be
sufficient in the majority of cases.

> An alternative to the current span query implementation is here:
> http://issues.apache.org/jira/browse/LUCENE-413
> but this will only help to get an impression of how to match in the ordered
> and unordered cases.
> It might be possible to generalize the various span algorithms there and
> in the trunk to work with fewer "terms".

I will consider that option.

Thanks,
Rajesh Munavalli
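
For illustration, the boolean combination of phrase queries over subsets of
the terms, as mentioned above, could be sketched roughly as follows. This is
only a sketch under assumptions: the class WindowedPhraseQuery and its build
method are hypothetical helpers (not part of Lucene), the subsets are taken
to be sliding windows of consecutive terms, and the output is a string in the
classic Lucene query parser syntax, where "a b c"~N denotes a phrase with
slop N.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Lucene: it only builds a query string
// in the classic Lucene query parser syntax.
class WindowedPhraseQuery {

    // Builds an OR of sloppy phrase clauses, one per sliding window of
    // `window` consecutive terms. Assumes each term is a single token
    // that needs no query-syntax escaping.
    static String build(List<String> terms, int window, int slop) {
        if (window < 2 || window > terms.size()) {
            throw new IllegalArgumentException(
                "window must be between 2 and the number of terms");
        }
        List<String> clauses = new ArrayList<>();
        for (int start = 0; start + window <= terms.size(); start++) {
            // Join the terms of this window into one quoted phrase.
            String phrase = String.join(" ", terms.subList(start, start + window));
            clauses.add("\"" + phrase + "\"~" + slop);
        }
        return String.join(" OR ", clauses);
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("lucene", "span", "query", "ranking");
        // Prints: "lucene span query"~2 OR "span query ranking"~2
        System.out.println(build(terms, 3, 2));
    }
}
```

A document then only needs to contain the terms of one window to match, so
the query no longer requires all 15 to 30 terms to be present. The window
size and slop are tuning knobs, and as noted this covers only some of the
possible subsets.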