Hi. I was wondering if anyone else has seen this before. I'm using lucene 1.4.3 and have indexed about 3000 text documents using the statement:
doc.add(Field.Text("contents", new FileReader(f), true)); When I go and retrieve the term frequency vectors, for any document under about 90k, everything looks as expected. However for larger documents (I haven't found the exact point, but I know that those over 128k qualify) the sum of the term frequencies in the vector seems to max out at 10001. Here's the code snippet that I'm using when I see this: int vecSize = vector.size(); for (int j = 0; j < vecSize; j++) { currentTermFreq = vector.getTermFrequencies()[j]; sumTermFreq = currentTermFreq + sumTermFreq; if ( currentTermFreq > maxTermFreq) { maxTermFreq = currentTermFreq; } } The results in sumTermFreq winds up being 10001 for large documents. The vector.size() varies from document to document, the term with the highest freqency (and that frequency) varies from document to document, but not the sum. Any thougths/suggestions would be appreciated. Thanks --MG __________________________________ Yahoo! FareChase: Search multiple travel sites in one click. http://farechase.yahoo.com --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]