RE: ConstantScoreQuery and MatchAllDocsQuery

Jean-Francois Beaulac Tue, 27 Feb 2007 14:35:47 -0800

Hi,

The existing code retrieved a TermPositionVector with 
IndexReader.getTermFreqVector(docId, field). It then extracted the terms of
the query and stored them in two different arrays: one containing the
single-word terms, the other containing the phrases.

For each single-word term, it loops over the array of terms and increments the 
frequency this way:

freq += tpv.getTermFrequencies()[tpv.indexOf(currentTerm.text())];

For the phrase it works the same way, but of course it searches for the entire 
set of terms in the correct order.
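
As a rough illustration of the two lookups described above, here is a minimal,
self-contained sketch. TermVectorSketch is a hypothetical stand-in for the data
a TermPositionVector exposes (sorted terms plus per-term positions); it is not
Lucene API and not the code from the application. It assumes indexOf reports an
absent term as -1, which the sketch guards against, and it counts a phrase hit
wherever the terms sit at consecutive positions.

```java
import java.util.Arrays;

// Hypothetical stand-in for a term position vector of ONE document.
final class TermVectorSketch {
    private final String[] terms;     // sorted term texts
    private final int[][] positions;  // positions[i] = sorted positions of terms[i]

    TermVectorSketch(String[] sortedTerms, int[][] positions) {
        this.terms = sortedTerms;
        this.positions = positions;
    }

    // Index of the term in the vector, or -1 when the document lacks it.
    int indexOf(String term) {
        int i = Arrays.binarySearch(terms, term);
        return i >= 0 ? i : -1;
    }

    // Single-term frequency; an absent term counts as 0 instead of
    // indexing an array with -1.
    int termFreq(String term) {
        int i = indexOf(term);
        return i < 0 ? 0 : positions[i].length;
    }

    // Number of places where the phrase terms occur at consecutive positions.
    int phraseFreq(String... phrase) {
        if (phrase.length == 0) {
            return 0;
        }
        int[][] pos = new int[phrase.length][];
        for (int k = 0; k < phrase.length; k++) {
            int i = indexOf(phrase[k]);
            if (i < 0) {
                return 0;             // one missing term rules out any match
            }
            pos[k] = positions[i];
        }
        int count = 0;
        for (int start : pos[0]) {
            boolean match = true;
            for (int k = 1; k < phrase.length && match; k++) {
                // the k-th phrase term must sit exactly k positions later
                match = Arrays.binarySearch(pos[k], start + k) >= 0;
            }
            if (match) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Document text: "on going for a walk on going for"
        TermVectorSketch tv = new TermVectorSketch(
            new String[] {"a", "for", "going", "on", "walk"},
            new int[][] {{3}, {2, 7}, {1, 6}, {0, 5}, {4}});
        System.out.println(tv.termFreq("on"));
        System.out.println(tv.phraseFreq("on", "going", "for"));
    }
}
```

Each phrase check costs one binary search per term and per starting position,
so the work grows with the document length for every result document.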

"Fast enough" means that for the search query: on going for, if I have 3000 
results consisting of documents with an average of 1000
words, it must be able to do it in under 50ms on a dual Xeon machine. With the 
TermPositionVector, my best results with no load on the
server were around 3000ms.

I am still an amateur with Lucene. I have to migrate an application that used 
a customized version of Lucene 1.3 to 2.1. I would
really like to be able to use an unmodified version of Lucene, since that would 
make it a lot easier to keep up to date with Lucene.

I'll give TermDocs a try.

Thanks

-----Original Message-----
From: Chris Hostetter [mailto:[EMAIL PROTECTED] 
Sent: February 23, 2007 7:18 PM
To: Lucene Users
Subject: Re: ConstantScoreQuery and MatchAllDocsQuery



: I ask this because I need to return the frequency of the search terms
: with each of my results, I tried using the TermFreqVector object but
: unfortunately it was not fast enough, so I decided to modify lucene to
: be able to return the frequency the same way the score is returned by
: org.apache.lucene.search.Hits.
        ...
: I started by adding public abstract int freq(); in package
: org.apache.lucene.search.Scorer abstract class, and then modified
: every implementation of Scorer to be able to get the frequency.

can you elaborate on:
 * how you were trying to use TermFreqVector
 * how you define "fast enough"
 * how you are now getting the freq() value in all of the Scorer classes?

If all you need to know is the frequency of each term in your query (and
not the frequency of all terms in the document) did you try using the
freq() method in the TermDocs iterator instead of the TermFreqVector
class?

using Query.extractTerms, and then getting a TermDocs instance
and iterating over those terms using seek and over the docids from your
results using skipTo should be an extremely fast way to get the freq()
info.
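
To make the seek / skipTo / freq() pattern above concrete, here is a minimal,
self-contained sketch. It does not call Lucene: PostingsCursor and
QueryTermFreqs are hypothetical names, and the in-memory map of
{docId, freq} pairs only stands in for the postings a TermDocs would read from
the index. Creating a cursor for a term plays the role of seek, and skipTo
here is a plain linear scan, whereas a real index can use skip data. The
result docids are assumed to be sorted in ascending order.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical cursor over one term's postings: {docId, freq} pairs sorted by docId.
final class PostingsCursor {
    private final int[][] postings;
    private int pos = -1;

    PostingsCursor(int[][] postings) {
        this.postings = postings == null ? new int[0][] : postings;
    }

    // Moves to the next entry; false when the postings are exhausted.
    boolean next() {
        pos++;
        return pos < postings.length;
    }

    // Advances at least one entry, stopping at the first doc >= target.
    boolean skipTo(int target) {
        do {
            if (!next()) {
                return false;
            }
        } while (doc() < target);
        return true;
    }

    int doc()  { return postings[pos][0]; }
    int freq() { return postings[pos][1]; }
}

final class QueryTermFreqs {
    // For each result doc, sums the frequencies of all query terms in that doc.
    static int[] freqsFor(Map<String, int[][]> index, String[] queryTerms,
                          int[] sortedDocIds) {
        int[] totals = new int[sortedDocIds.length];
        for (String term : queryTerms) {
            PostingsCursor cursor = new PostingsCursor(index.get(term)); // "seek"
            boolean hasDoc = cursor.next();
            for (int i = 0; i < sortedDocIds.length && hasDoc; i++) {
                int target = sortedDocIds[i];
                if (cursor.doc() < target) {
                    hasDoc = cursor.skipTo(target);
                }
                if (hasDoc && cursor.doc() == target) {
                    totals[i] += cursor.freq();
                }
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        Map<String, int[][]> index = new HashMap<>();
        index.put("on",    new int[][] {{1, 2}, {4, 1}, {7, 3}});
        index.put("going", new int[][] {{4, 2}, {9, 1}});
        index.put("for",   new int[][] {{1, 1}, {7, 1}});
        int[] totals = freqsFor(index, new String[] {"on", "going", "for"},
                                new int[] {1, 4, 8});
        System.out.println(java.util.Arrays.toString(totals));
    }
}
```

The point of the pattern is that each term's postings are walked once, in
docid order, for the whole result set, rather than loading a term vector per
result document.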

: It works well and fast, the only problem I have is that I did not find a
: way to compute the frequency in both ConstantScoreQuery.java and
: MatchAllDocsQuery.java internal scorers.

neither of those queries involve any terms, so i'm not sure what freq()
would even make sense ... "1" or "0" i would imagine.




-Hoss


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


