Thanks for the replies, Steve, Uwe.
> if you dont want to create your own "marker filter", you can use
> KeywordMarkerFilter (http://goo.gl/OOgf4) instead
This is pretty much what I had come up with, although I used a custom
filter class (with a similar attribute). The thing I have trouble with,
however, is that stop words may be based not only on term images but
also on other attributes. In particular, the Japanese pipeline uses _two_ term
suppression classes:
stream = new JapanesePartOfSpeechStopFilter(true, stream, stoptags);
...
stream = new StopFilter(matchVersion, stream, stopwords);
Of course I can just copy/paste the source of these and build my own
keyword marker, this is clear to me. But I'd rather build a filter
that delegates to these original classes and aggregates their output
so that I don't have to rebuild things on every upgrade, and this is
where I'm kind of stuck. Something like:
if (!japanesePOS.accept() || !stopfilter.accept()) {
// mark the current token as a stopword.
}
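
To make the idea concrete, here is a minimal sketch in plain Java 8 with
no Lucene classes involved. Token, StopwordMarker and the two predicates
are hypothetical stand-ins I made up for illustration; they are not the
Lucene API. It only shows the aggregation logic (mark instead of drop),
not how attributes would be shared with the delegate filters, which is
the part I'm unsure about.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

/** Hypothetical stand-in for a token: a term image plus a part-of-speech tag. */
final class Token {
    final String term;
    final String partOfSpeech;
    boolean stopword; // plays the role of the marker attribute

    Token(String term, String partOfSpeech) {
        this.term = term;
        this.partOfSpeech = partOfSpeech;
    }
}

/** Marks tokens that any delegate would have removed, instead of removing them. */
final class StopwordMarker {
    private final List<Predicate<Token>> delegates;

    StopwordMarker(List<Predicate<Token>> delegates) {
        this.delegates = new ArrayList<>(delegates);
    }

    /** Builds a marker from a tag-based and a term-based delegate (both assumed). */
    static StopwordMarker of(Set<String> stoptags, Set<String> stopwords) {
        Predicate<Token> posAccept = t -> !stoptags.contains(t.partOfSpeech);
        Predicate<Token> termAccept = t -> !stopwords.contains(t.term);
        return new StopwordMarker(Arrays.asList(posAccept, termAccept));
    }

    /** Flags each rejected token in place; the list keeps its length and order. */
    List<Token> mark(List<Token> tokens) {
        for (Token token : tokens) {
            for (Predicate<Token> accept : delegates) {
                if (!accept.test(token)) {
                    token.stopword = true; // mark the current token as a stopword
                    break;
                }
            }
        }
        return tokens;
    }
}
```

The point of the sketch is that the delegates only answer "would you
accept this token?" and never consume or drop anything themselves.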
I'm just not sure if I can create such a non-linear filter pipeline
-- isn't this going to confuse the attribute management code? Note
that the above filters (japanesePOS, blah) would _not_ be part of the
token stream; they would be attached to one of the filters. Don't know
if I'm clear.
Dawid