What I've done is:

  State state = in.captureState();
  ...
  // Upon new call to incrementToken().
  State tmp = in.captureState();
  in.restoreState(state);
  // check if termAttribute is an abbreviation. If not:
  in.restoreState(tmp);

But that seems like a lot of capturing/restoring to me ... how expensive is that?

Shai

On Sun, Nov 22, 2009 at 3:57 PM, Shai Erera <ser...@gmail.com> wrote:

> Perhaps I misunderstand something. The current use case I'm trying to solve
> is - I have an abbreviations TokenFilter which reads a token and stores it.
> If the next token is end-of-sentence, it checks whether the previous one is
> in the abbreviations list, and discards the end-of-sentence token. I need to
> store the first token somewhere so I can reference it.
>
> Example: "hello mr. shai"
> First token = hello -> store it and return
> Second token = mr -> store it and return
> Third token = "." -> check if "mr" is an abbreviation, if so don't return
> ".".
> Fourth token = "shai" -> store it and return.
> ...
>
> How do I store "mr" (or any of the others)? It was easy w/ copyTo. If I
> captureState, I get a State, but I can't query it for a TermAttribute. Any
> ideas?
>
> Shai
>
>
> On Sun, Nov 22, 2009 at 3:33 PM, Uwe Schindler <u...@thetaphi.de> wrote:
>
>> Use captureState and save the state somewhere. You can restore the state
>> with restoreState to the TokenStream. CachingTokenFilter does this.
>>
>> So the new API uses the State object to put away tokens for later
>> reference.
>>
>> -----
>> Uwe Schindler
>> H.-H.-Meier-Allee 63, D-28213 Bremen
>> http://www.thetaphi.de
>> eMail: u...@thetaphi.de
>>
>> > -----Original Message-----
>> > From: Shai Erera [mailto:ser...@gmail.com]
>> > Sent: Sunday, November 22, 2009 2:29 PM
>> > To: java-user@lucene.apache.org
>> > Subject: Re: How to deal with Token in the new TS API
>> >
>> > ok so from what I understand, I should stop working w/ Token, and move to
>> > working w/ the Attributes.
>> >
>> > addAttribute indeed does not work. Even though it does not throw an
>> > exception, if I call in.addAttribute(Token.class), I get a new instance of
>> > Token and not the one that was added by in. So this is even more severe
>> > than just not blocking this option.
>> >
>> > I thought I could move to using addAttributeImpl, but that won't help me,
>> > because I won't be able to call getAttribute(Token.class).
>> >
>> > So this leaves me w/ just working w/ the interfaces.
>> >
>> > What do I need to do in order to clone an attribute? Previously I used
>> > token.copyTo(target). How can I do it now if I don't have copyTo on the
>> > interfaces, and/or clone?
>> >
>> > Shai
>> >
>> > On Sun, Nov 22, 2009 at 2:58 PM, Uwe Schindler <u...@thetaphi.de> wrote:
>> >
>> > > > But I do use addAttribute(Token.class), so I don't understand why you say
>> > > > it's not possible. And I completely don't understand why the new API allows
>> > > > me to just work w/ interfaces and not impls ... A while ago I got the
>> > > > impression that we're trying to get rid of interfaces because they're not
>> > > > easy to maintain back-compat with ...
>> > >
>> > > addAttribute(Token.class) should throw an Exception, but it doesn't (it's a
>> > > bug in 3.0). addAttribute should only affect interfaces, it also accepts
>> > > Token, because the AttributeFactory accepts it - bang.
>> > >
>> > > Sorry, but you can only pass attribute class literals to
>> > > addAttribute/getAttribute/hasAttribute and so on.
>> > >
>> > > Sorry.
>> > >
>> > > Uwe
>> > >
>> > >
>> > > ---------------------------------------------------------------------
>> > > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> > > For additional commands, e-mail: java-user-h...@lucene.apache.org
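
One possibly cheaper alternative for the abbreviation use case quoted above: the filter only ever needs the text of the previous token, so it could copy just that text into a field instead of capturing and restoring whole states. Below is a minimal sketch of that logic in plain Java, deliberately without Lucene so it runs on its own. The class AbbreviationFilterSketch, its filter method, and the list-of-strings token model are hypothetical stand-ins and not part of the Lucene API; in a real TokenFilter the copy would presumably come from the term attribute on each incrementToken() call.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, Lucene-free model of the abbreviation filter from this thread.
// Tokens are plain strings; a real TokenFilter would read them from an attribute.
class AbbreviationFilterSketch {

    // Returns the tokens, dropping every "." that directly follows
    // a token contained in the abbreviations set.
    static List<String> filter(List<String> tokens, Set<String> abbreviations) {
        List<String> out = new ArrayList<String>();
        String previousTerm = null; // copy of the previous token's text only
        for (String term : tokens) {
            boolean endOfSentence = ".".equals(term);
            boolean afterAbbreviation =
                    previousTerm != null && abbreviations.contains(previousTerm);
            if (!(endOfSentence && afterAbbreviation)) {
                out.add(term); // keep everything except the abbreviation's period
            }
            previousTerm = term; // remember the text, not a full captured state
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> abbreviations = new HashSet<String>(Arrays.asList("mr"));
        List<String> tokens = Arrays.asList("hello", "mr", ".", "shai");
        // prints [hello, mr, shai]
        System.out.println(filter(tokens, abbreviations));
    }
}
```

This only remembers one string per token, so there is no second captureState/restoreState pair on each call. Whether that is measurably faster than the State-based version in a real analyzer chain is not something this sketch shows.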