Yeah, this is exactly what I was thinking about, and it even worked (accept() being protected is not a huge problem because these classes are not final, so I can open it up using a local subclass). I just wasn't sure whether this is too hacky. Thanks, Uwe.

Dawid

On Wed, Aug 22, 2012 at 10:03 AM, Uwe Schindler <[email protected]> wrote:
> You could misuse the attributes API:
>
> All filters in a chain have the same attributes. This is achieved by the
> chaining (new TokenFilter(other TS) shares the attributes). What you could
> do to be non-linear in chaining:
>
> Create the "helpers" that are not part of the chain, by linking them to the
> input TokenStream, but never call incrementToken() on them. Their internals
> will always see the same attributes and attribute contents, so you could
> call accept() - if it were not protected. The stream is controlled by
> our TokenFilter, so we incrementToken() only on ours, we just misuse the
> accept method (because it operates on the attributes we already populated by
> our own call to incrementToken()):
>
> stopwordMarkFilter = new TokenFilter(....) {
>   private final markerAtt = addAttribute(...);
>   private final FilteringTokenFilter japanesePOS =
>       new JapanesePartOfSpeechStopFilter(true, input, stoptags);
>   private final FilteringTokenFilter stopfilter =
>       new StopFilter(matchVersion, input, stopwords);
>
>   public boolean incrementToken() {
>     if (!input.incrementToken()) return false;
>     if (!japanesePOS.accept() || !stopfilter.accept()) {
>       // mark the current token as a stopword.
>       markerAtt.setIsStopword(true);
>     }
>     return true;
>   }
> }
>
> The only problem, as accept is not intended to be called from the outside,
> it is of course protected...
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: [email protected]
>
> >> -----Original Message-----
> >> From: [email protected] [mailto:[email protected]] On Behalf Of
> >> Dawid Weiss
> >> Sent: Wednesday, August 22, 2012 8:51 AM
> >> To: [email protected]
> >> Subject: Re: Looking for a code pattern to pass stop words as an attribute
> >>
> >> Thanks for replies Steve, Uwe.
> >>
> >> > if you don't want to create your own "marker filter", you can use
> >> > KeywordMarkerFilter (http://goo.gl/OOgf4) instead
> >>
> >> This is pretty much what I had come up with, although I used a custom
> >> filter class (with a similar attribute). The thing I have trouble with
> >> is, however, that stop words may not be based only on images but also on
> >> other attributes. In particular, the Japanese pipeline uses _two_ term
> >> suppression classes:
> >>
> >> stream = new JapanesePartOfSpeechStopFilter(true, stream, stoptags);
> >> ...
> >> stream = new StopFilter(matchVersion, stream, stopwords);
> >>
> >> Of course I can just copy/paste the source of these and build my own
> >> keyword marker, this is clear to me. But I'd rather build a filter that
> >> delegates to these original classes and aggregates their output so that
> >> I don't have to rebuild things on every upgrade and this is where I'm
> >> kind of stuck. Something like:
> >>
> >> if (!japanesePOS.accept() || !stopfilter.accept()) {
> >>   // mark the current token as a stopword.
> >> }
> >>
> >> I'm just not sure if I can create such a non-linear filters pipeline
> >> -- if this isn't going to confuse the attribute management code? Note
> >> that the above filters (japanesePOS, blah) would _not_ be part of the
> >> token stream, they would be attached to one of the filters. Don't know
> >> if I'm clear.
> >>
> >> Dawid
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: [email protected]
> >> For additional commands, e-mail: [email protected]
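
A minimal sketch of the local-subclass workaround mentioned at the top of this message, written with plain Java stand-ins so it compiles without Lucene on the classpath. Every name here (StopwordMarkSketch, SharedAttributes, FilteringBase, StopSetFilter, MinLengthFilter, the Open* subclasses, mark) is hypothetical and only imitates the shape of the classes discussed in the thread; the real Lucene signatures differ and vary between versions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Toy model only: none of these types is a real Lucene class.
final class StopwordMarkSketch {

    /** Stand-in for the attribute state that all filters in a chain share. */
    static final class SharedAttributes {
        String term;         // plays the role of the term text
        boolean isStopword;  // plays the role of the custom marker attribute
    }

    /** Stand-in for a filtering base class whose accept() is protected. */
    static abstract class FilteringBase {
        protected final SharedAttributes atts;
        FilteringBase(SharedAttributes atts) { this.atts = atts; }
        protected abstract boolean accept();
    }

    /** Stand-in for a stop-word filter: rejects terms found in the stop set. */
    static class StopSetFilter extends FilteringBase {
        private final Set<String> stopwords;
        StopSetFilter(SharedAttributes atts, Set<String> stopwords) {
            super(atts);
            this.stopwords = stopwords;
        }
        @Override protected boolean accept() { return !stopwords.contains(atts.term); }
    }

    /** Stand-in for a second suppression filter: rejects terms that are too short. */
    static class MinLengthFilter extends FilteringBase {
        private final int minLength;
        MinLengthFilter(SharedAttributes atts, int minLength) {
            super(atts);
            this.minLength = minLength;
        }
        @Override protected boolean accept() { return atts.term.length() >= minLength; }
    }

    // The trick itself: a subclass may widen protected to public. In this
    // single-file toy everything shares a package, so protected would be
    // reachable anyway; the override matters when the base class lives in
    // another package.
    static final class OpenStopSetFilter extends StopSetFilter {
        OpenStopSetFilter(SharedAttributes atts, Set<String> stopwords) { super(atts, stopwords); }
        @Override public boolean accept() { return super.accept(); }
    }

    static final class OpenMinLengthFilter extends MinLengthFilter {
        OpenMinLengthFilter(SharedAttributes atts, int minLength) { super(atts, minLength); }
        @Override public boolean accept() { return super.accept(); }
    }

    /** Marks each token instead of dropping it; returns one flag per token. */
    static List<Boolean> mark(List<String> tokens, Set<String> stopwords, int minLength) {
        SharedAttributes atts = new SharedAttributes();
        // Helpers see the shared state but are never advanced themselves.
        OpenStopSetFilter stop = new OpenStopSetFilter(atts, stopwords);
        OpenMinLengthFilter length = new OpenMinLengthFilter(atts, minLength);

        List<Boolean> flags = new ArrayList<>();
        for (String token : tokens) {
            atts.term = token; // stands in for our own incrementToken() call
            atts.isStopword = !stop.accept() || !length.accept();
            flags.add(atts.isStopword);
        }
        return flags;
    }
}
```

Calling StopwordMarkSketch.mark with the tokens "the", "quick", "a", "fox", the stop set {"the"} and a minimum length of 2 flags the first and third tokens and leaves the other two unmarked. The flag is assigned on every token rather than only set to true, so a value left over from a previous token cannot leak through.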
