I'm not 100% sure I understand your question, but...

: order to compute the TF I count the occurrences of terms which are
: similar to the term. But I've got problems to compute the IDF, because I
: must know the number of documents in which the term appears before
: searching for the documents (in the method sumOfSquaredWeights() in my

...to get the number of docs that contain a specific term, you can use
IndexReader.docFreq(Term)

: Date: Mon, 13 Jun 2005 21:30:21 +0200
: From: Barbara Krausz <[EMAIL PROTECTED]>
: Reply-To: java-user@lucene.apache.org, java-user@lucene.apache.org
: To: java-user@lucene.apache.org
: Subject: Determining the IDF while searching for documents
:
: Hi all,
:
: is it possible to determine the IDF (the documents in which a term
: appears) while searching for documents? I implemented an index based on
: trigrams, i.e. the index terms are now Strings of 3 characters so that my
: search engine finds documents with OCR errors. When I'm searching for
: the term "rainstorm" for example I split it up into the trigrams __r,
: _ra, rai, ain, ins...
: First I look for documents which contain at least 8 of the 11 trigrams
: of "rainstorm" (the misspelled "ranstorm" contains 8 of the 11
: trigrams), then I check if the trigrams form a term like "rainstorm". In
: order to compute the TF I count the occurrences of terms which are
: similar to the term. But I've got problems to compute the IDF, because I
: must know the number of documents in which the term appears before
: searching for the documents (in the method sumOfSquaredWeights() in my
: weight). I used hsqldb during indexing and saved the number of documents
: for each term. But it's really slow.
: My question is the following: When I'm searching for documents which
: contain terms similar to the search term I actually get the number of
: documents that contain the term. But I need the IDF before searching
: these documents for example for BooleanQueries which need the IDF to
: normalize the queryvector. Can I solve this problem, i.e. can I
: determine the IDF later and normalize the BooleanQuery?
:
: Thanks
: Barbara
:
: ---------------------------------------------------------------------
: To unsubscribe, e-mail: [EMAIL PROTECTED]
: For additional commands, e-mail: [EMAIL PROTECTED]

-Hoss
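
For what it's worth, here is a minimal sketch of how the numbers in this thread fit together, written in plain Java with no Lucene dependency. The class and method names are hypothetical. The trailing "__" padding is an assumption (it is what yields the 11 trigrams mentioned for "rainstorm"), and the idf formula is the log(numDocs / (docFreq + 1)) + 1 form used by Lucene's DefaultSimilarity, which is assumed here and not taken from the thread. Against a real index, docFreq would come from IndexReader.docFreq(Term) and numDocs from IndexReader.numDocs().

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper class, for illustration only.
public class TrigramIdfSketch {

    // Split a word into character trigrams. Padding both ends with "__"
    // is an assumption that reproduces the 11 trigrams of "rainstorm".
    public static Set<String> trigrams(String word) {
        String padded = "__" + word + "__";
        Set<String> result = new LinkedHashSet<>();
        for (int i = 0; i + 3 <= padded.length(); i++) {
            result.add(padded.substring(i, i + 3));
        }
        return result;
    }

    // Count the distinct trigrams that two words have in common.
    public static int sharedTrigrams(String a, String b) {
        Set<String> common = new LinkedHashSet<>(trigrams(a));
        common.retainAll(trigrams(b));
        return common.size();
    }

    // idf in the style of Lucene's DefaultSimilarity (assumed formula).
    // docFreq: number of documents containing the term,
    //          e.g. the value returned by IndexReader.docFreq(Term).
    // numDocs: total number of documents, e.g. IndexReader.numDocs().
    public static double idf(int docFreq, int numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    public static void main(String[] args) {
        System.out.println("trigrams of rainstorm: " + trigrams("rainstorm"));
        System.out.println("shared with ranstorm:  "
                + sharedTrigrams("rainstorm", "ranstorm"));
        // Made-up counts, only to show the call.
        System.out.println("idf(docFreq=9, numDocs=100): " + idf(9, 100));
    }
}
```

For "rainstorm" and "ranstorm" this reports 11 trigrams and 8 shared ones, matching the counts in the quoted message. Because docFreq is a cheap lookup on the index, the idf can be computed inside the weight before any documents are scored, with no separate database needed.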