Hi,

Erik Hatcher wrote:

Is anyone doing anything interesting with the Token.setPositionIncrement during analysis?

I think so :-) Well... my Arabic analyzer is based on this functionality.


The basic idea is to have several tokens at the same position (i.e. setPositionIncrement(0)) which are different possible stems for the same word.
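A minimal sketch of the idea, assuming a toy token model rather than the real Lucene `Token` class: the hypothetical `PosToken` only carries term text and a position increment, and the hypothetical stem table stands in for whatever the Arabic analyzer actually computes. The first candidate stem advances the position by 1, and every further candidate uses increment 0 so that it shares that position.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an analysis token (NOT the Lucene Token class):
// term text plus the increment relative to the previous token's position.
final class PosToken {
    final String text;
    final int positionIncrement;

    PosToken(String text, int positionIncrement) {
        this.text = text;
        this.positionIncrement = positionIncrement;
    }

    @Override
    public String toString() {
        return text + "(+" + positionIncrement + ")";
    }
}

final class StemAlternatives {
    // Emits every candidate stem of each word. The first stem moves to the
    // next position (increment 1); the others use increment 0, so they are
    // stacked on the same position as the first stem.
    static List<PosToken> expand(List<String> words, Map<String, List<String>> stems) {
        List<PosToken> out = new ArrayList<>();
        for (String word : words) {
            // Words without an entry are emitted unchanged.
            List<String> candidates = stems.getOrDefault(word, List.of(word));
            for (int i = 0; i < candidates.size(); i++) {
                out.add(new PosToken(candidates.get(i), i == 0 ? 1 : 0));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical, English-only stem table: "leaves" is ambiguous.
        Map<String, List<String>> stems = Map.of("leaves", List.of("leaf", "leave"));
        System.out.println(expand(List.of("green", "leaves"), stems));
    }
}
```

Here "leaf" and "leave" end up at the same position, which is what lets a later phrase or proximity search match either reading of the word.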

But it's practically impossible to formulate a Query that can take advantage of this. A PhraseQuery can't, because Terms don't have positional info (only the transient tokens do).

Correct!


I've made a dirty patch for the QueryParser which is able to handle tokens with positionIncrement equal to 0 or 1 (see bug #23307). It still needs some work, but it fits my needs :-)
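The patch itself isn't shown here, so the following is only a sketch of the general step such a QueryParser change needs, not the actual patch: turn relative increments into absolute positions and collect the terms found at each position. The `Tok` record and `PositionGrouper` class are hypothetical. Terms with increment 0 fall into the same group as the preceding term, so the result describes a phrase with several alternative terms at some positions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

final class PositionGrouper {
    // Hypothetical minimal token: term text plus position increment.
    record Tok(String text, int increment) {}

    // Converts relative increments into absolute positions and groups the
    // terms by position. A token with increment 0 joins the previous group.
    static TreeMap<Integer, List<String>> group(List<Tok> tokens) {
        TreeMap<Integer, List<String>> byPosition = new TreeMap<>();
        int position = -1; // so that a first token with increment 1 lands at 0
        for (Tok t : tokens) {
            position += t.increment();
            byPosition.computeIfAbsent(position, p -> new ArrayList<>()).add(t.text());
        }
        return byPosition;
    }

    public static void main(String[] args) {
        List<Tok> tokens = List.of(
                new Tok("green", 1), new Tok("leaf", 1), new Tok("leave", 0));
        System.out.println(group(tokens)); // {0=[green], 1=[leaf, leave]}
    }
}
```

Keeping absolute positions (rather than just counting groups) means the same code would also preserve gaps if increments larger than 1 ever occur.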

I certainly see the benefit of putting tokens into zero-increment positions, but are increments of 2 or more at all useful?

Who knows? It may be interesting to keep track of the *presence* of "empty words" (stop words), e.g. "[the] sky [is] blue", "[the] sky [is] [really] blue", "[the] sky [is] [that] [really] blue". The traditional reduction to "sky blue" may be oversimplistic in some cases...
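As a sketch of what increments of 2 or more could record, here is a hypothetical stop-word filter (not an existing Lucene class) that drops the bracketed words but adds each dropped word to the next kept token's increment. The stop set is just the bracketed words from the examples above. "sky blue" then keeps a different distance in each of the three sentences instead of collapsing to the same two adjacent tokens.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class GapPreservingStopFilter {
    // Hypothetical minimal token: term text plus position increment.
    record Tok(String text, int increment) {}

    // Removes stop words, but counts them so that the next kept token's
    // increment is 1 + (number of stop words skipped just before it).
    static List<Tok> filter(List<String> words, Set<String> stopWords) {
        List<Tok> out = new ArrayList<>();
        int pending = 0; // stop words skipped since the last kept token
        for (String w : words) {
            if (stopWords.contains(w)) {
                pending++;
            } else {
                out.add(new Tok(w, 1 + pending));
                pending = 0;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical stop set taken from the bracketed words in the examples.
        Set<String> stop = Set.of("the", "is", "that", "really");
        System.out.println(filter(List.of("the", "sky", "is", "blue"), stop));
        System.out.println(filter(List.of("the", "sky", "is", "really", "blue"), stop));
        System.out.println(filter(List.of("the", "sky", "is", "that", "really", "blue"), stop));
    }
}
```

For the three sentences, "blue" gets increments 2, 3 and 4 respectively, so a query could in principle tell them apart, whereas plain removal gives "sky blue" with increment 1 every time.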


Well, just an idea.

Cheers,

--
Pierrick Brihaye, IT specialist
Service régional de l'Inventaire
DRAC Bretagne
mailto:[EMAIL PROTECTED]

