I think I had better build you a test case for this situation, and attach it to a JIRA.
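
For reference while reading the quoted thread: the simpler alternative suggested in it (a filter that enumerates and saves all input tokens, does its processing, then returns them one by one) can be sketched in plain Java. This is only a rough illustration and deliberately uses no Lucene classes. SentenceBufferingFilter, next(), fillSentence() and run() are hypothetical names, the "." end-of-sentence marker is an assumption, and tagging each token with the sentence length is just a stand-in for real analysis that needs the whole sentence.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;

// Plain-Java stand-in for a buffering token filter; no Lucene classes are used.
class SentenceBufferSketch {

    /** Hypothetical filter: buffers one sentence, rewrites it, then replays it. */
    static final class SentenceBufferingFilter {
        private final Iterator<String> input;                     // upstream token source
        private final Deque<String> pending = new ArrayDeque<>(); // processed, not yet emitted

        SentenceBufferingFilter(Iterator<String> input) {
            this.input = input;
        }

        /** Analogue of incrementToken(): returns the next token, or null at end of stream. */
        String next() {
            if (pending.isEmpty()) {
                fillSentence();
            }
            return pending.pollFirst(); // null when nothing is left
        }

        // Read ahead to the end-of-sentence marker (assumed to be "."), then process.
        private void fillSentence() {
            List<String> sentence = new ArrayList<>();
            while (input.hasNext()) {
                String token = input.next();
                sentence.add(token);
                if (".".equals(token)) {
                    break;
                }
            }
            // Stand-in for the real analysis: it needs the whole sentence,
            // so each token is tagged with the sentence length.
            for (String token : sentence) {
                pending.addLast(token.toLowerCase(Locale.ROOT) + "/" + sentence.size());
            }
        }
    }

    /** Drains the filter into a list, the way a consumer would loop on incrementToken(). */
    static List<String> run(List<String> tokens) {
        SentenceBufferingFilter filter = new SentenceBufferingFilter(tokens.iterator());
        List<String> out = new ArrayList<>();
        for (String t = filter.next(); t != null; t = filter.next()) {
            out.add(t);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("The", "Cats", "sat", ".", "Dogs", "bark", ".")));
    }
}
```

This only covers the "tweak attributes of tokens passing through" case. It does not model positions or token insertion, which is what afterPosition() is for in the discussion below.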

On Sat, Sep 7, 2013 at 3:33 PM, Michael McCandless <luc...@mikemccandless.com> wrote:
> Something is wrong; I'm not sure what offhand, but calling peekToken
> 10 times should not stack all tokens @ position 0; it should stack the
> tokens at the positions where they occurred. Are you sure the posIncr
> att is sometimes 1 (i.e., the position is in fact moving forward for
> some tokens)?
>
> nextToken() only calls peekToken() once the lookahead buffer is exhausted.
>
> afterPosition() should be called within nextToken(), for each
> position, once all tokens leaving that position are done.
>
> Your use case *should* be working: inside your incrementToken() you
> call peekToken() over and over until you've seen the full sentence
> (saving away any state in your subclass of Position), then nextToken()
> to emit the buffered tokens, and to insert your own tokens when
> afterPosition() is called ...
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Sat, Sep 7, 2013 at 1:10 PM, Benson Margulies <ben...@basistech.com> wrote:
>> nextToken() calls peekToken(). That seems to prevent my lookahead
>> processing from seeing that item later. Am I missing something?
>>
>>
>> On Fri, Sep 6, 2013 at 9:15 PM, Benson Margulies <ben...@basistech.com>
>> wrote:
>>> I think that the penny just dropped, and I should not be using this class.
>>>
>>> If I call peekToken 10 times while sitting at token 0, this class will
>>> stack up all 10 of these _at token position 0_. That's not really very
>>> helpful for what I'm doing. I need to borrow code from this class and
>>> not use it.
>>>
>>> On Fri, Sep 6, 2013 at 9:10 PM, Benson Margulies <ben...@basistech.com>
>>> wrote:
>>>> Michael,
>>>>
>>>> I'm apparently not fully deconfused yet.
>>>>
>>>> I've got a very simple incrementToken function. It calls peekToken to
>>>> stack up the tokens.
>>>>
>>>> afterPosition is never called; I expected it to be called as each of
>>>> the peeked tokens gets next-ed back out.
>>>>
>>>> I assume that I'm missing something simple.
>>>>
>>>>
>>>> public boolean incrementToken() throws IOException {
>>>>   if (positions.getMaxPos() < 0) {
>>>>     peekSentence();
>>>>   }
>>>>   return nextToken();
>>>> }
>>>>
>>>>
>>>>
>>>> On Fri, Sep 6, 2013 at 8:13 AM, Benson Margulies <ben...@basistech.com>
>>>> wrote:
>>>>> On Fri, Sep 6, 2013 at 7:31 AM, Michael McCandless
>>>>> <luc...@mikemccandless.com> wrote:
>>>>>>
>>>>>> On Thu, Sep 5, 2013 at 8:44 PM, Benson Margulies <ben...@basistech.com>
>>>>>> wrote:
>>>>>> > I'm trying to work through the logic of reading ahead until I've seen
>>>>>> > a marker for the end of a sentence, then applying some analysis to all
>>>>>> > of the tokens of the sentence, and then changing some attributes of
>>>>>> > each token to reflect the results.
>>>>>> >
>>>>>> > The queue of tokens for a position is just a State, so there isn't an
>>>>>> > API there to set any values.
>>>>>> >
>>>>>> > So do I need to subclass Position for myself, store the additional
>>>>>> > information in there, and set the attributes as each token comes by on
>>>>>> > the output side?
>>>>>>
>>>>>> Yes, that sounds right. Either that or, on emitting the eventual
>>>>>> Tokens, apply your logic there (because at that point, after
>>>>>> restoreState, you have access to all the attr values for that token).
>>>>>>
>>>>>> > I would be grateful for a bit more explanation of afterPosition versus
>>>>>> > incrementToken; some of the mock classes call peek from afterPosition,
>>>>>> > and I expected to see peek called in incrementToken based on the
>>>>>> > javadoc.
>>>>>>
>>>>>> afterPosition is where your subclass can "insert" new tokens.
>>>>>>
>>>>>> I think (it's been a while here...) you are allowed to call peekToken
>>>>>> in afterPosition; this is necessary if your logic about inserting
>>>>>> additional tokens leaving a given position depends on future tokens.
>>>>>>
>>>>>> But: are you doing any new token insertion? Or are you just tweaking
>>>>>> the attributes of the tokens that pass through the filter? If it's
>>>>>> the latter then this class may be overkill ... you could make a simple
>>>>>> TokenFilter.incrementToken that just enumerates & saves all input
>>>>>> tokens, does its processing, then returns those tokens one by one,
>>>>>> instead.
>>>>>
>>>>> I'm not adding tokens yet, but I will be soon, so all of this isn't
>>>>> entirely crazy. The underlying capability here includes decompounding.
>>>>> (I have mixed feelings about just adding all the fragments to the
>>>>> token stream, as it can reduce precision, but there isn't an obvious
>>>>> alternative (except perhaps to suppress the super-common ones)).
>>>>>
>>>>> So, to summarize, logic might be:
>>>>>
>>>>> in incrementToken:
>>>>>
>>>>> If positions.getMaxPos() > -1, just return nextToken(). If not, loop
>>>>> calling peekToken to acquire a sentence, process the sentence, and
>>>>> attach the lemmas and compound-pieces to the Position subclass
>>>>> objects.
>>>>>
>>>>> in afterPosition, as each token comes 'into focus', splat the lemma
>>>>> from the Position into the char term attribute, and insert new tokens
>>>>> as needed for the compound components.
>>>>>
>>>>> Thanks,
>>>>> benson
>>>>>
>>>>>>
>>>>>>
>>>>>> Mike McCandless
>>>>>>
>>>>>> http://blog.mikemccandless.com
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>>>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>>>>>