Re: indexing help

Doug Cutting Thu, 08 Jul 2004 10:02:15 -0700

John Wang wrote:

     The solution you proposed is still a derivative of creating a
dummy document stream. Taking the same example, java (5), lucene (6),
VectorTokenStream would create a total of 11 Tokens whereas only 2 are
necessary.


That's easy to fix.  We just need to reuse the token:

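import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;

public class VectorTokenStream extends TokenStream {
  private final String[] terms;  // distinct terms of the document
  private final int[] freqs;     // freqs[i] = occurrences of terms[i]
  private int term = -1;         // index of the current term
  private int freq = 0;          // occurrences of the current term left to emit
  private Token token;           // reused for every occurrence of the current term
  public VectorTokenStream(String[] terms, int[] freqs) {
    this.terms = terms;
    this.freqs = freqs;
  }
  public Token next() {
    // Advance to the next term once the current one is exhausted;
    // the loop also skips terms whose frequency is zero.
    while (freq == 0) {
      term++;
      if (term >= terms.length)
        return null;
      token = new Token(terms[term], 0, 0);
      freq = freqs[term];
    }
    freq--;
    return token;
  }
}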

Then only two tokens are created, as you desire.
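
To check the "only two tokens" claim without a Lucene jar on the classpath, here is a minimal sketch that follows the same reuse idea. The names TokenReuseDemo, ReusingStream, and count are made up for illustration, and the nested Token class is a hypothetical stand-in for Lucene's Token, not the real one.

```java
import java.util.IdentityHashMap;
import java.util.Map;

public class TokenReuseDemo {
  // Hypothetical stand-in for Lucene's Token, so the sketch runs without Lucene.
  static final class Token {
    final String text;
    Token(String text) { this.text = text; }
  }

  // Same reuse idea as VectorTokenStream, minus the TokenStream superclass.
  static final class ReusingStream {
    private final String[] terms;
    private final int[] freqs;
    private int term = -1;  // index of the current term
    private int freq = 0;   // occurrences of the current term left to emit
    private Token token;    // reused for every occurrence of the current term

    ReusingStream(String[] terms, int[] freqs) {
      this.terms = terms;
      this.freqs = freqs;
    }

    Token next() {
      // Allocate a new Token only when moving on to the next term.
      while (freq == 0) {
        term++;
        if (term >= terms.length) return null;
        token = new Token(terms[term]);
        freq = freqs[term];
      }
      freq--;
      return token;
    }
  }

  // Returns {tokens emitted, distinct Token objects among them}.
  public static int[] count(String[] terms, int[] freqs) {
    ReusingStream stream = new ReusingStream(terms, freqs);
    // IdentityHashMap compares keys by reference, so it counts objects, not texts.
    Map<Token, Boolean> distinct = new IdentityHashMap<>();
    int emitted = 0;
    for (Token t = stream.next(); t != null; t = stream.next()) {
      emitted++;
      distinct.put(t, Boolean.TRUE);
    }
    return new int[] { emitted, distinct.size() };
  }

  public static void main(String[] args) {
    int[] c = count(new String[] {"java", "lucene"}, new int[] {5, 6});
    // prints "11 tokens emitted, 2 distinct Token objects"
    System.out.println(c[0] + " tokens emitted, " + c[1] + " distinct Token objects");
  }
}
```

For java (5), lucene (6) the stream still hands back 11 tokens, one per occurrence, but only 2 objects are ever allocated.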

If for some reason you don't want to create a dummy document stream, you could instead implement an IndexReader that delivers a synthetic index for a single document, then use IndexWriter.addIndexes() to turn this into a real, FSDirectory-based index. However, that would be a lot more work and only very marginally faster, so I'd stick with the approach I've outlined above.

(Note: this code has not been compiled or run. It may have bugs.)

Doug
