positional token info

Erik Hatcher Mon, 20 Oct 2003 18:15:04 -0700

Is anyone doing anything interesting with Token.setPositionIncrement during analysis?

Just for fun, I've written a simple stop filter that bumps the position increments to account for the stop words removed:

  public final Token next() throws IOException {
    int increment = 0;
    for (Token token = input.next(); token != null; token = input.next()) {
      if (table.get(token.termText()) == null) {
        token.setPositionIncrement(token.getPositionIncrement() + increment);
        return token;
      }

      increment++;
    }

    return null;
  }
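
To make the effect concrete, here is a minimal stand-alone sketch of the same bookkeeping. It uses no Lucene classes: the class name, the positions() helper, and the sample stop words are all made up for illustration, and it assumes that a token's position is simply the running sum of the increments, with the first token at position 0 when its increment is 1.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not part of Lucene.
public class PositionIncrementSketch {

    /** Returns "term@position" for each token that survives stop-word removal. */
    static List<String> positions(String[] tokens, Set<String> stopWords) {
        List<String> out = new ArrayList<>();
        int position = -1; // assumption: an increment of 1 puts the first token at 0
        int skipped = 0;   // stop words dropped since the last emitted token

        for (String term : tokens) {
            if (stopWords.contains(term)) {
                skipped++; // remember the gap instead of closing it
                continue;
            }
            int increment = 1 + skipped; // default increment of 1 plus the gap
            position += increment;
            out.add(term + "@" + position);
            skipped = 0;
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> stops = new HashSet<>(Arrays.asList("the", "of"));
        String[] tokens = {"the", "history", "of", "the", "search", "engine"};
        // Surviving terms keep the positions they had in the original text.
        System.out.println(positions(tokens, stops));
    }
}
```

With increments preserved this way, "history" and "search" end up three positions apart rather than adjacent, which is exactly the gap an exact phrase match would need to know about.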

But it's practically impossible to formulate a Query that can take advantage of this. Because Terms don't carry positional info (only the transient tokens do), a PhraseQuery only works with a slop factor, which doesn't guarantee the exact match I'm after. A PhrasePrefixQuery won't work any better, as there is no way to add a "blank" term to indicate a missing position.

I certainly see the benefit of putting tokens into zero-increment positions, but are increments of 2 or more at all useful? If so, how are folks using them?

Thanks,
        Erik


