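What you are looking for is OffsetAttribute, if character offsets are enough: 
it gives each token's start and end offsets in the original text. For the 
position counted in tokens there is no ready-made attribute as far as I know; 
you get it by summing the PositionIncrementAttribute values as you consume 
the stream. Also consider the possibility of using the stock ShingleFilter: 
where the position increment is > 1 it inserts a filler token "_" 
(underscore), so you can then filter out the shingles containing "_". This 
will be easier, I guess.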
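
To make both ideas concrete, here is a rough, Lucene-free sketch in plain 
Java. Everything in it is an assumption made for illustration: the Tok class 
is a hypothetical stand-in for a token's term text plus the value that 
PositionIncrementAttribute.getPositionIncrement() would return, and the comma 
in Igal's example is modelled as a position gap (increment 2 on "word"), 
which some upstream filter would have to produce. It only imitates the 
filler-token behaviour; it is not ShingleFilter's actual code.

```java
import java.util.ArrayList;
import java.util.List;

final class ShingleGapSketch {
    // Filler used for skipped positions, mirroring the "_" mentioned above.
    static final String FILLER = "_";

    // Hypothetical stand-in for a token: term text plus position increment.
    static final class Tok {
        final String term;
        final int posInc;
        Tok(String term, int posInc) { this.term = term; this.posInc = posInc; }
    }

    // Absolute position = running sum of increments. Starting at -1 means
    // a first token with increment 1 lands on position 0.
    static List<Integer> positions(List<Tok> tokens) {
        List<Integer> out = new ArrayList<>();
        int pos = -1;
        for (Tok t : tokens) {
            pos += t.posInc;
            out.add(pos);
        }
        return out;
    }

    // Put a filler into every skipped position, form bigrams over the
    // resulting slots, and drop any bigram that touches a filler.
    static List<String> bigramsWithoutGaps(List<Tok> tokens) {
        List<String> slots = new ArrayList<>();
        for (Tok t : tokens) {
            for (int i = 1; i < t.posInc; i++) {
                slots.add(FILLER);
            }
            slots.add(t.term);
        }
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 1 < slots.size(); i++) {
            String a = slots.get(i);
            String b = slots.get(i + 1);
            if (a.equals(FILLER) || b.equals(FILLER)) {
                continue; // this shingle would span a position gap
            }
            out.add(a + " " + b);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Tok> tokens = List.of(
            new Tok("token", 1), new Tok("number", 1), new Tok("one", 1),
            new Tok("word", 2),  // assumed gap where the comma was
            new Tok("two", 1));
        System.out.println(positions(tokens));          // [0, 1, 2, 4, 5]
        System.out.println(bigramsWithoutGaps(tokens)); // [token number, number one, word two]
    }
}
```

One caveat with the filtering approach: the sketch compares whole slots 
against the filler, whereas a filter that drops every shingle containing "_" 
would also drop shingles built from real terms that contain an underscore.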

On Jan 11, 2013, at 7:14 AM, Igal @ getRailo.org <[email protected]> wrote:

> hi all,
> 
> how can I get the Token's Position from the TokenStream / Tokenizer / 
> Analyzer ?  I know that there's a PositionIncrementAttribute and a 
> PositionLengthAttribute, but is there an easy way to get the token 
> position or do I need to implement my own attribute by adding one of the 
> attributes mentioned above?
> 
> the reason I need it is that I wrote an implementation of a ShingleFilter 
> which breaks shingles at punctuation so the tokens [token number one, word 
> two] will create the shingles [ "token number", "number one", "word two" ] -- 
> but not [ "one word" ] because of the comma.  I want it to break shingles at 
> position increment gaps as well.
> 
> thanks,
> 
> 
> Igal
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
> 

---
Denis Bazhenov <[email protected]>





