Thanks Doug. I will do just that.
Just for my education, can you maybe elaborate on using the
"implement an IndexReader that delivers a
synthetic index" approach?
Thanks in advance
-John
On Thu, 08 Jul 2004 10:01:59 -0700, Doug Cutting <[EMAIL PROTECTED]> wrote:
> John Wang wrote:
> > The solution you proposed is still a derivative of creating a
> > dummy document stream. Taking the same example, java (5), lucene (6),
> > VectorTokenStream would create a total of 11 Tokens whereas only 2 is
> > neccessary.
>
> That's easy to fix. We just need to reuse the token:
>
> public class VectorTokenStream extends TokenStream {
> private int term = -1;
> private int freq = 0;
> private Token token;
> public VectorTokenStream(String[] terms, int[] freqs) {
> this.terms = terms;
> this.freqs = freqs;
> }
> public Token next() {
> if (freq == 0) {
> term++;
> if (term >= terms.length)
> return null;
> token = new Token(terms[term], 0, 0);
> freq = freqs[term];
> }
> freq--;
> return token;
> }
> }
>
> Then only two tokens are created, as you desire.
>
> If you for some reason don't want to create a dummy document stream,
> then you could instead implement an IndexReader that delivers a
> synthetic index for a single document. Then use
> IndexWriter.addIndexes() to turn this into a real, FSDirectory-based
> index. However that would be a lot more work and only very marginally
> faster. So I'd stick with the approach I've outlined above. (Note:
> this code has not been compiled or run. It may have bugs.)
>
>
>
> Doug
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
>
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]