Re: Looking for a code pattern to pass stop words as an attribute

Dawid Weiss Tue, 21 Aug 2012 23:52:05 -0700

Thanks for the replies, Steve and Uwe.

> if you don't want to create your own "marker filter", you can use
> KeywordMarkerFilter (http://goo.gl/OOgf4) instead


This is pretty much what I had come up with, although I used a custom
filter class (with a similar attribute). The problem, however, is that
stop words may be determined not only by the term image (the term text)
but also by other attributes. In particular, the Japanese pipeline uses
_two_ term suppression classes:

    stream = new JapanesePartOfSpeechStopFilter(true, stream, stoptags);
    ...
    stream = new StopFilter(matchVersion, stream, stopwords);
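
Suppressing and marking can be contrasted in plain Java, outside Lucene. In this sketch the MarkVsRemove class and its Token record are hypothetical: the stopword field stands in for a flag attribute (the role KeywordAttribute plays for KeywordMarkerFilter), and the stop criterion is passed in as a Predicate, so it can look at the term text or at another attribute such as a part-of-speech tag.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class MarkVsRemove {
    // Hypothetical token model: term text, part-of-speech tag and a
    // boolean marker standing in for a flag attribute.
    record Token(String term, String partOfSpeech, boolean stopword) {
        Token(String term, String partOfSpeech) {
            this(term, partOfSpeech, false);
        }
    }

    // Suppression: tokens matching isStop are dropped from the stream.
    static List<Token> remove(List<Token> in, Predicate<Token> isStop) {
        return in.stream()
                .filter(isStop.negate())
                .collect(Collectors.toList());
    }

    // Marking: every token survives; matches only get their flag set.
    static List<Token> mark(List<Token> in, Predicate<Token> isStop) {
        return in.stream()
                .map(t -> new Token(t.term(), t.partOfSpeech(),
                        t.stopword() || isStop.test(t)))
                .collect(Collectors.toList());
    }
}
```

For example, with a predicate that matches the tag "particle", remove() returns only the non-particle tokens, while mark() returns all of them with the particle tokens flagged.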

Of course I could just copy and paste the source of these classes and
build my own keyword marker; that much is clear to me. But I'd rather
build a filter that delegates to the original classes and aggregates
their output, so that I don't have to rebuild things on every upgrade.
This is where I'm stuck. Something like:

    if (!japanesePOS.accept() || !stopfilter.accept()) {
      // mark the current token as a stopword.
    }
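
The aggregation in the snippet above amounts to an OR over rejections: a token is marked as soon as any delegate would not accept it. Lucene's FilteringTokenFilter declares accept() as protected, so a wrapper cannot simply call it on foreign filter instances; the plain-Java sketch below therefore assumes each delegate's test is available as a Predicate. AnyRejectsMarker and Token are hypothetical names, not Lucene classes.

```java
import java.util.List;
import java.util.function.Predicate;

class AnyRejectsMarker {
    // Hypothetical token model: term text plus part-of-speech tag.
    record Token(String term, String partOfSpeech) {}

    // Each delegate answers "would you keep this token?" (accept semantics).
    private final List<Predicate<Token>> delegates;

    AnyRejectsMarker(List<Predicate<Token>> delegates) {
        this.delegates = List.copyOf(delegates);
    }

    // A token is a stopword as soon as one delegate rejects it;
    // anyMatch short-circuits, so later delegates are not consulted.
    boolean isStopword(Token t) {
        return delegates.stream().anyMatch(accept -> !accept.test(t));
    }
}
```

Adding a third criterion is then just one more predicate in the list, without touching the marker itself.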

I'm just not sure whether I can create such a non-linear filter
pipeline, or whether it would confuse the attribute management code.
Note that the above filters (japanesePOS, stopfilter) would _not_ be
part of the token stream; they would be attached to one of the filters.
I don't know if I'm making myself clear.
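
On the attribute question: in Lucene a TokenFilter built over an input shares that input's attribute instances, which is how each filter sees the token its input just produced. A delegate kept off to the side only sees the current token if it reads those same instances. The plain-Java model below, with a hypothetical TokenState holder standing in for an AttributeSource, shows the two cases: a check bound to the shared holder follows each new token, while a check bound to its own copy does not.

```java
import java.util.function.BooleanSupplier;

class SharedStateDemo {
    // Hypothetical stand-in for an AttributeSource: a mutable holder for
    // the attributes of whatever token the stream is positioned on.
    static final class TokenState {
        String term = "";
        String partOfSpeech = "";
    }

    // A side check that reads whatever state object it was given;
    // returns true when the current token should be marked.
    static BooleanSupplier rejectsTag(TokenState s, String stopTag) {
        return () -> stopTag.equals(s.partOfSpeech);
    }

    // Simulates the main filter advancing to the next token by
    // overwriting the holder in place, as incrementToken() does.
    static void advance(TokenState s, String term, String partOfSpeech) {
        s.term = term;
        s.partOfSpeech = partOfSpeech;
    }
}
```

Only advance() moves to a new token; the side check never consumes anything itself, which is the property an off-chain delegate would need.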

Dawid
