The solution you proposed is still a derivative of creating a dummy document stream. Taking the same example, java (5), lucene (6), VectorTokenStream would create a total of 11 Tokens whereas only 2 is neccessary.
That's easy to fix. We just need to reuse the token:
public class VectorTokenStream extends TokenStream {
private int term = -1;
private int freq = 0;
private Token token;
public VectorTokenStream(String[] terms, int[] freqs) {
this.terms = terms;
this.freqs = freqs;
}
public Token next() {
if (freq == 0) {
term++;
if (term >= terms.length)
return null;
token = new Token(terms[term], 0, 0);
freq = freqs[term];
}
freq--;
return token;
}
}Then only two tokens are created, as you desire.
If you for some reason don't want to create a dummy document stream, then you could instead implement an IndexReader that delivers a synthetic index for a single document. Then use IndexWriter.addIndexes() to turn this into a real, FSDirectory-based index. However that would be a lot more work and only very marginally faster. So I'd stick with the approach I've outlined above. (Note: this code has not been compiled or run. It may have bugs.)
Doug
--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
