All filters must be final, and they are:

  public final class StopFilter extends FilteringTokenFilter
  public final class JapanesePartOfSpeechStopFilter extends FilteringTokenFilter

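Since a final class cannot be subclassed locally to open up accept(), the remaining route is package access: in Java, a protected member is also accessible to any class in the same package as the declaring class. Below is a minimal sketch of that rule only. FilteringBase, StopLikeFilter and StopwordMarker are hypothetical stand-ins invented for illustration, not Lucene classes, and the one-element String array merely imitates the shared attribute state.

```java
// Hypothetical, simplified stand-in for FilteringTokenFilter: accept() is protected.
abstract class FilteringBase {
    // Imitates shared attribute state: every object holding this array
    // sees the same "current term".
    protected final String[] sharedTerm;

    FilteringBase(String[] sharedTerm) {
        this.sharedTerm = sharedTerm;
    }

    protected abstract boolean accept();
}

// Final, like the real stop filters, so no subclass can widen accept().
final class StopLikeFilter extends FilteringBase {
    private final java.util.Set<String> stopwords;

    StopLikeFilter(String[] sharedTerm, java.util.Set<String> stopwords) {
        super(sharedTerm);
        this.stopwords = stopwords;
    }

    @Override
    protected boolean accept() {
        // Accept the current term only if it is not a stopword.
        return !stopwords.contains(sharedTerm[0]);
    }
}

// Declared in the same package as FilteringBase, so it may call the
// protected accept() through a reference of the base type.
final class StopwordMarker {
    private final String[] term = new String[1];
    private final FilteringBase stopFilter;

    StopwordMarker(java.util.Set<String> stopwords) {
        this.stopFilter = new StopLikeFilter(term, stopwords);
    }

    boolean isStopword(String token) {
        term[0] = token;              // populate the shared state first
        return !stopFilter.accept();  // then ask the helper, without advancing it
    }
}
```

Note that the helper field is typed as the base class: the package-access rule applies to the class that declares the protected method, so the call has to go through that type.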
In all cases you can move your special filter into the package of FilteringTokenFilter...

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: [email protected]

> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Dawid Weiss
> Sent: Wednesday, August 22, 2012 10:11 AM
> To: [email protected]
> Subject: Re: Looking for a code pattern to pass stop words as an attribute
>
> Yeah, this is exactly what I was thinking about and it even worked
> (accept being protected is not a huge problem because these classes are
> not final, so I can open it up using a local subclass). I just wasn't
> sure if this isn't too hacky. Thanks, Uwe.
>
> Dawid
>
> On Wed, Aug 22, 2012 at 10:03 AM, Uwe Schindler <[email protected]> wrote:
> > You could misuse the attributes API:
> >
> > All filters in a chain have the same attributes. This is achieved by
> > the chaining (new TokenFilter(other TS) shares the attributes). What
> > you could do to be non-linear in chaining:
> >
> > Create the "helpers" that are not part of the chain, by linking them
> > to the input TokenStream, but never call incrementToken() on them.
> > Their internals will always see the same attributes and attribute
> > contents, so you could call accept() - if it were not protected.
> > The stream is controlled by our TokenFilter, so we call
> > incrementToken() only on ours; we just misuse the accept() method
> > (because it operates on the attributes we already populated by our
> > own call to incrementToken()):
> >
> > stopwordMarkFilter = new TokenFilter(....) {
> >   private final markerAtt = addAttribute(...);
> >   private final FilteringTokenFilter japanesePOS =
> >       new JapanesePartOfSpeechStopFilter(true, input, stoptags);
> >   private final FilteringTokenFilter stopfilter =
> >       new StopFilter(matchVersion, input, stopwords);
> >
> >   public boolean incrementToken() throws IOException {
> >     if (!input.incrementToken()) return false;
> >     if (!japanesePOS.accept() || !stopfilter.accept()) {
> >       // mark the current token as a stopword.
> >       markerAtt.setIsStopword(true);
> >     }
> >     return true;
> >   }
> > }
> >
> > The only problem: as accept() is not intended to be called from the
> > outside, it is of course protected...
> >
> >> -----Original Message-----
> >> From: [email protected] [mailto:[email protected]] On Behalf Of Dawid Weiss
> >> Sent: Wednesday, August 22, 2012 8:51 AM
> >> To: [email protected]
> >> Subject: Re: Looking for a code pattern to pass stop words as an attribute
> >>
> >> Thanks for replies Steve, Uwe.
> >>
> >> > if you don't want to create your own "marker filter", you can use
> >> > KeywordMarkerFilter (http://goo.gl/OOgf4) instead
> >>
> >> This is pretty much what I had come up with, although I used a custom
> >> filter class (with a similar attribute). The thing I have trouble
> >> with is, however, that stop words may be based not only on images but
> >> also on other attributes. In particular, the Japanese pipeline uses
> >> _two_ term suppression classes:
> >>
> >>   stream = new JapanesePartOfSpeechStopFilter(true, stream, stoptags);
> >>   ...
> >>   stream = new StopFilter(matchVersion, stream, stopwords);
> >>
> >> Of course I can just copy/paste the source of these and build my own
> >> keyword marker, this is clear to me. But I'd rather build a filter
> >> that delegates to these original classes and aggregates their output
> >> so that I don't have to rebuild things on every upgrade, and this is
> >> where I'm kind of stuck. Something like:
> >>
> >>   if (!japanesePOS.accept() || !stopfilter.accept()) {
> >>     // mark the current token as a stopword.
> >>   }
> >>
> >> I'm just not sure if I can create such a non-linear filter pipeline
> >> -- if this isn't going to confuse the attribute management code? Note
> >> that the above filters (japanesePOS, blah) would _not_ be part of the
> >> token stream, they would be attached to one of the filters. Don't
> >> know if I'm clear.
> >>
> >> Dawid

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
