Hi Dawid,

If you don't want to create your own "marker filter", you can use 
KeywordMarkerFilter (http://goo.gl/OOgf4) instead of StopFilter. This works 
without affecting other filters, as long as you don't have stemming in your 
analysis chain (stemmers skip tokens that are marked as keywords). The trick 
is to pass the stop set to KeywordMarkerFilter instead of to StopFilter. It 
then marks those tokens as keywords instead of removing them.

If you also have stemming, the easiest approach is to copy the source code of 
KeywordMarkerFilter and have the copy populate a different attribute (a custom 
one, such as a StopAttribute) with the same information. That keeps the 
keyword flag free for the stemmer.
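
To make the "mark instead of remove" idea concrete, here is a minimal, 
library-independent sketch. It is not Lucene code: MarkedToken and StopMarker 
are hypothetical names made up for illustration, and the boolean field only 
stands in for what a custom attribute would carry in a real TokenFilter. 
Lower-casing before the lookup is also an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical token type (not a Lucene class): term text plus a stop-word flag. */
final class MarkedToken {
    final String term;
    final boolean stopWord;

    MarkedToken(String term, boolean stopWord) {
        this.term = term;
        this.stopWord = stopWord;
    }

    @Override
    public String toString() {
        return stopWord ? term + "[stop]" : term;
    }
}

/** Hypothetical helper (not a Lucene class): flags stop words instead of dropping them. */
final class StopMarker {
    private final Set<String> stopSet;

    StopMarker(Set<String> stopSet) {
        this.stopSet = stopSet;
    }

    /** Returns one MarkedToken per input term; no term is removed. */
    List<MarkedToken> mark(List<String> terms) {
        List<MarkedToken> out = new ArrayList<>();
        for (String term : terms) {
            // Assumption: the stop set holds lower-case entries.
            boolean isStop = stopSet.contains(term.toLowerCase(Locale.ROOT));
            out.add(new MarkedToken(term, isStop));
        }
        return out;
    }
}
```

With a stop set of "the" and "is", marking the terms The, fox, is, quick 
keeps all four tokens and flags only the first and third. A downstream 
consumer can then read the flag, while a stemmer that only looks at the 
keyword flag is unaffected.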

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: [email protected]


> -----Original Message-----
> From: Dawid Weiss [mailto:[email protected]]
> Sent: Tuesday, August 21, 2012 10:34 PM
> To: [email protected]
> Subject: Looking for a code pattern to pass stop words as an attribute
> 
> Seeking advice.
> 
> I have an application where I need to know which tokens are stop words. Most
> analyzers construct the token stream in a way that those tokens are filtered
> out -- this isn't what I need, I want them in, but marked somehow. The
> question is how to do it nicely and in a simple way, possibly reusing
> existing token filters? I had a few ideas but they all seem awkward -- let me
> know if I'm missing something obvious.
> 
> Dawid
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected] For additional
> commands, e-mail: [email protected]

