Kent, Erik,

On Saturday 29 November 2003 17:20, Erik Hatcher wrote:
> I enjoy at least attempting to answer questions here, even if I'm half
> wrong, so by all means correct me if I misspeak....

Me too, :)

> On Saturday, November 29, 2003, at 06:37 PM, Kent Gibson wrote:
> > All I would like to know is how many times a query was
> > found in a particular document. I have no problems
> > getting the score from hits.score(). hits.length is
> > the number of times in total that the query was found,
> > however I want the number of times the query was
> > found on a document by document basis. is this
> > possible?

Could you be a bit more precise about what you mean by 'the number of times
the query was found'? For a single query term it is straightforward, but
what about, e.g., a query for three optional terms?

> The 'coord' factor used in computing the score is exactly this. See
> the javadoc for it:
>
> http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/Similarity.html#coord(int,%20int)

AFAIK, this overlap is the number of terms the document and the query have
in common. For a query consisting of a single term, the overlap is always
one, and the number of times the query occurs in a document is the term
frequency in the document.

> You could implement a custom Similarity to capture the "overlap" or
> adjust the factor depending on what you're trying to accomplish.
>
> > The only idea I have is to rerun the search,
> > but I can't even see how to run a search on only one
> > document!
>
> You could always rerun a search with a Filter with only one bit enabled
> and see if zero or one document is returned - that would be quite
> trivial and fast.

You could also implement a Similarity that ignores the total number of
terms in the searched document field, see lengthNorm() in
http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/Similarity.html
As lengthNorm() is applied at indexing time, you will have to reindex for
this to work for you.

At query time you can then use a tf() implementation that is linear,
instead of the default square root in DefaultSimilarity, and a constant
idf(), instead of the default log of the inverse document frequency.
You should then get a document score that is proportional to the number
of query terms in the document.

Kind regards,
Ype
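
To make the arithmetic above a bit more concrete, here is a rough standalone sketch in plain Java. It does not call Lucene at all; the class LinearScoreSketch and its methods are hypothetical names made up for illustration. It assumes a linear tf, an idf fixed at 1, a length norm fixed at 1, and (an extra assumption not spelled out above) a coord factor held at 1, with distinct query terms. Under those assumptions the per-document sum is just the total number of query term occurrences, while the "overlap" only counts how many distinct query terms matched. Real Lucene scores carry further factors such as queryNorm, which this model leaves out.

```java
import java.util.Arrays;
import java.util.List;

// Standalone model of the scoring arithmetic only; it does not use Lucene.
final class LinearScoreSketch {

    // Number of times `term` occurs in the tokenized field (the term frequency).
    static int termFrequency(List<String> fieldTokens, String term) {
        int count = 0;
        for (String token : fieldTokens) {
            if (token.equals(term)) {
                count++;
            }
        }
        return count;
    }

    // Number of distinct query terms present in the document: the "overlap"
    // that a coord(overlap, maxOverlap) factor would receive.
    static int overlap(List<String> fieldTokens, List<String> queryTerms) {
        int matched = 0;
        for (String term : queryTerms) {
            if (termFrequency(fieldTokens, term) > 0) {
                matched++;
            }
        }
        return matched;
    }

    // Score under the stated assumptions: tf(freq) = freq, idf = 1,
    // length norm = 1, coord = 1 (all assumptions of this sketch).
    static float linearScore(List<String> fieldTokens, List<String> queryTerms) {
        float score = 0f;
        for (String term : queryTerms) {
            float tf = termFrequency(fieldTokens, term); // linear, not sqrt
            float idf = 1f;                              // constant idf
            float norm = 1f;                             // field length ignored
            score += tf * idf * norm;
        }
        return score;
    }

    public static void main(String[] args) {
        List<String> doc = Arrays.asList("lucene", "scores", "lucene", "hits", "lucene");
        List<String> query = Arrays.asList("lucene", "hits", "filter");
        System.out.println("overlap = " + overlap(doc, query));
        System.out.println("linear score = " + linearScore(doc, query));
    }
}
```

For a single-term query the overlap is at most one and the linear score equals the term frequency, which is the distinction drawn above between the coord overlap and the number of occurrences.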
