Thanks Benson, I'll have a look.

Mike McCandless

http://blog.mikemccandless.com

On Sat, Sep 7, 2013 at 4:33 PM, Benson Margulies <ben...@basistech.com> wrote:
> LUCENE-5202. It seems to show the problem of the extra peek. I'm still
> struggling to make sense of the 'problem' of not always calling
> afterPosition(); that may be entirely my own confusion.
>
> On Sat, Sep 7, 2013 at 4:21 PM, Michael McCandless
> <luc...@mikemccandless.com> wrote:
>> That would be awesome, thanks!
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>>
>> On Sat, Sep 7, 2013 at 3:40 PM, Benson Margulies <ben...@basistech.com>
>> wrote:
>>> I think I had better build you a test case for this situation, and
>>> attach it to a JIRA.
>>>
>>> On Sat, Sep 7, 2013 at 3:33 PM, Michael McCandless
>>> <luc...@mikemccandless.com> wrote:
>>>> Something is wrong; I'm not sure what offhand, but calling peekToken
>>>> 10 times should not stack all tokens @ position 0; it should stack the
>>>> tokens at the positions where they occurred. Are you sure the posIncr
>>>> att is sometimes 1 (i.e., the position is in fact moving forward for
>>>> some tokens)?
>>>>
>>>> nextToken() only calls peekToken() once the lookahead buffer is exhausted.
>>>>
>>>> afterPosition() should be called within nextToken(), for each
>>>> position, once all tokens leaving that position are done.
>>>>
>>>> Your use case *should* be working: inside your incrementToken() you
>>>> call peekToken() over and over until you've seen the full sentence
>>>> (saving away any state in your subclass of Position), then nextToken()
>>>> to emit the buffered tokens, and to insert your own tokens when
>>>> afterPosition() is called ...
>>>>
>>>> Mike McCandless
>>>>
>>>> http://blog.mikemccandless.com
>>>>
>>>>
>>>> On Sat, Sep 7, 2013 at 1:10 PM, Benson Margulies <ben...@basistech.com>
>>>> wrote:
>>>>> nextToken() calls peekToken(). That seems to prevent my lookahead
>>>>> processing from seeing that item later. Am I missing something?
>>>>>
>>>>>
>>>>> On Fri, Sep 6, 2013 at 9:15 PM, Benson Margulies <ben...@basistech.com>
>>>>> wrote:
>>>>>> I think that the penny just dropped, and I should not be using this
>>>>>> class.
>>>>>>
>>>>>> If I call peekToken 10 times while sitting at token 0, this class will
>>>>>> stack up all 10 of these _at token position 0_. That's not really very
>>>>>> helpful for what I'm doing. I need to borrow code from this class and
>>>>>> not use it.
>>>>>>
>>>>>> On Fri, Sep 6, 2013 at 9:10 PM, Benson Margulies <ben...@basistech.com>
>>>>>> wrote:
>>>>>>> Michael,
>>>>>>>
>>>>>>> I'm apparently not fully deconfused yet.
>>>>>>>
>>>>>>> I've got a very simple incrementToken function. It calls peekToken to
>>>>>>> stack up the tokens.
>>>>>>>
>>>>>>> afterPosition is never called; I expected it to be called as each of
>>>>>>> the peeked tokens gets next-ed back out.
>>>>>>>
>>>>>>> I assume that I'm missing something simple.
>>>>>>>
>>>>>>>
>>>>>>>     public boolean incrementToken() throws IOException {
>>>>>>>         if (positions.getMaxPos() < 0) {
>>>>>>>             peekSentence();
>>>>>>>         }
>>>>>>>         return nextToken();
>>>>>>>     }
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Sep 6, 2013 at 8:13 AM, Benson Margulies <ben...@basistech.com>
>>>>>>> wrote:
>>>>>>>> On Fri, Sep 6, 2013 at 7:31 AM, Michael McCandless
>>>>>>>> <luc...@mikemccandless.com> wrote:
>>>>>>>>>
>>>>>>>>> On Thu, Sep 5, 2013 at 8:44 PM, Benson Margulies
>>>>>>>>> <ben...@basistech.com> wrote:
>>>>>>>>> > I'm trying to work through the logic of reading ahead until I've
>>>>>>>>> > seen a marker for the end of a sentence, then applying some
>>>>>>>>> > analysis to all of the tokens of the sentence, and then changing
>>>>>>>>> > some attributes of each token to reflect the results.
>>>>>>>>> >
>>>>>>>>> > The queue of tokens for a position is just a State, so there isn't
>>>>>>>>> > an API there to set any values.
>>>>>>>>> >
>>>>>>>>> > So do I need to subclass Position for myself, store the additional
>>>>>>>>> > information in there, and set the attributes as each token comes
>>>>>>>>> > by on the output side?
>>>>>>>>>
>>>>>>>>> Yes, that sounds right. Either that or, on emitting the eventual
>>>>>>>>> Tokens, apply your logic there (because at that point, after
>>>>>>>>> restoreState, you have access to all the attr values for that token).
>>>>>>>>>
>>>>>>>>> > I would be grateful for a bit more explanation of afterPosition
>>>>>>>>> > versus incrementToken; some of the mock classes call peek from
>>>>>>>>> > afterPosition, and I expected to see peek called in incrementToken
>>>>>>>>> > based on the javadoc.
>>>>>>>>>
>>>>>>>>> afterPosition is where your subclass can "insert" new tokens.
>>>>>>>>>
>>>>>>>>> I think (it's been a while here...) you are allowed to call peekToken
>>>>>>>>> in afterPosition; this is necessary if your logic about inserting
>>>>>>>>> additional tokens leaving a given position depends on future tokens.
>>>>>>>>>
>>>>>>>>> But: are you doing any new token insertion? Or are you just tweaking
>>>>>>>>> the attributes of the tokens that pass through the filter? If it's
>>>>>>>>> the latter then this class may be overkill ... you could make a simple
>>>>>>>>> TokenFilter.incrementToken that just enumerates & saves all input
>>>>>>>>> tokens, does its processing, then returns those tokens one by one,
>>>>>>>>> instead.
>>>>>>>>
>>>>>>>> I'm not adding tokens yet, but I will be soon, so all of this isn't
>>>>>>>> entirely crazy. The underlying capability here includes decompounding.
>>>>>>>> (I have mixed feelings about just adding all the fragments to the
>>>>>>>> token stream, as it can reduce precision, but there isn't an obvious
>>>>>>>> alternative (except perhaps to suppress the super-common ones)).
>>>>>>>>
>>>>>>>> So, to summarize, logic might be:
>>>>>>>>
>>>>>>>> In incrementToken:
>>>>>>>>
>>>>>>>> If positions.getMaxPos() > -1, just return nextToken(). If not, loop
>>>>>>>> calling peekToken to acquire a sentence, process the sentence, and
>>>>>>>> attach the lemmas and compound-pieces to the Position subclass
>>>>>>>> objects.
>>>>>>>>
>>>>>>>> In afterPosition, as each token comes 'into focus', splat the lemma
>>>>>>>> from the Position into the char term attribute, and insert new tokens
>>>>>>>> as needed for the compound components.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> benson
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Mike McCandless
>>>>>>>>>
>>>>>>>>> http://blog.mikemccandless.com
>>>>>>>>>
>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>>>>>>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
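Mike's description of the lookahead contract (tokens stacked at the positions their position increments imply, nextToken() reading input only once the buffer is exhausted, afterPosition() called once every token leaving a position has been emitted) can be illustrated with a small dependency-free model. This is a hypothetical sketch, not Lucene's LookaheadTokenFilter: the Tok class, the buffer map and the events log are invented for illustration, and there are no attributes or States. In this model, deciding that the newest buffered position is finished requires reading one token past it.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.TreeMap;

// Hypothetical, dependency-free model of the lookahead contract described in the thread.
// NOT Lucene's LookaheadTokenFilter: Tok, buffer, inputPos and events are illustrative names.
final class LookaheadModel {
    /** Hypothetical input token: a term plus its position increment. */
    static final class Tok {
        final String term;
        final int posIncr;

        Tok(String term, int posIncr) {
            this.term = term;
            this.posIncr = posIncr;
        }
    }

    private final Iterator<Tok> input;
    // Buffered terms grouped by the position they occupy, in position order.
    final TreeMap<Integer, List<String>> buffer = new TreeMap<>();
    // Log of emitted terms and afterPosition() calls, for inspection.
    final List<String> events = new ArrayList<>();
    private int inputPos = -1; // position of the most recently read token

    LookaheadModel(Iterator<Tok> input) {
        this.input = input;
    }

    /** Reads one input token into the buffer at the position its posIncr implies. */
    boolean peekToken() {
        if (!input.hasNext()) {
            return false;
        }
        Tok t = input.next();
        inputPos += t.posIncr; // posIncr 0 stacks on the current position; 1 moves forward
        buffer.computeIfAbsent(inputPos, p -> new ArrayList<>()).add(t.term);
        return true;
    }

    /** Emits one buffered term, reading from input only when the buffer is empty. */
    boolean nextToken() {
        if (buffer.isEmpty() && !peekToken()) {
            return false;
        }
        int pos = buffer.firstKey();
        List<String> atPos = buffer.get(pos);
        events.add(atPos.remove(0));
        if (atPos.isEmpty() && pos == inputPos) {
            // Newest position drained: read one more token to learn whether
            // another token still lands on this position (posIncr 0).
            peekToken();
        }
        if (atPos.isEmpty()) {
            buffer.remove(pos);
            afterPosition(pos); // every token leaving pos has now been emitted
        }
        return true;
    }

    /** Hook where a subclass could insert tokens; this model only logs the call. */
    void afterPosition(int pos) {
        events.add("after(" + pos + ")");
    }

    public static void main(String[] args) {
        List<Tok> toks = List.of(new Tok("a", 1), new Tok("b", 1), new Tok("c", 0));
        LookaheadModel m = new LookaheadModel(toks.iterator());
        while (m.peekToken()) {
            // look ahead over the whole (tiny) input first
        }
        System.out.println(m.buffer); // {0=[a], 1=[b, c]}
        while (m.nextToken()) {
            // drain the buffer
        }
        System.out.println(m.events); // [a, after(0), b, c, after(1)]
    }
}
```

With a posIncr of 1, repeated peekToken() calls spread tokens across successive positions; only a posIncr of 0 stacks a token onto the previous position, which matches Mike's expectation that ten peeks should not all pile up at position 0.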
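For the case where no new tokens are inserted, Mike suggests a plain TokenFilter whose incrementToken enumerates and saves all input tokens, processes them, and then returns them one by one. A minimal sketch of that buffering pattern follows, assuming tokens are plain strings and that "." marks the end of a sentence (both hypothetical simplifications; a real filter would capture and restore attribute State rather than pass strings around). The class name and the processSentence hook are invented for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

// Hypothetical, dependency-free model of the simpler approach: read a whole sentence,
// process it, then hand the tokens back one at a time.
// Not a Lucene TokenFilter; the "." end marker and the processing hook are assumptions.
final class SentenceBufferModel {
    private final Iterator<String> input;
    private final UnaryOperator<List<String>> processSentence; // stand-in for sentence-level analysis
    private final Deque<String> pending = new ArrayDeque<>();

    SentenceBufferModel(Iterator<String> input, UnaryOperator<List<String>> processSentence) {
        this.input = input;
        this.processSentence = processSentence;
    }

    /** Returns the next processed token, or null once the input is exhausted. */
    String incrementToken() {
        // Refill only after every token of the previous sentence has been returned.
        while (pending.isEmpty() && input.hasNext()) {
            List<String> sentence = new ArrayList<>();
            while (input.hasNext()) {
                String tok = input.next();
                sentence.add(tok);
                if (tok.equals(".")) {
                    break; // hypothetical end-of-sentence marker
                }
            }
            pending.addAll(processSentence.apply(sentence));
        }
        return pending.poll(); // null when nothing is left
    }

    public static void main(String[] args) {
        // Example processing that needs the whole sentence: tag each token with the sentence length.
        UnaryOperator<List<String>> tagWithLength = s -> s.stream()
                .map(t -> t.toLowerCase(Locale.ROOT) + "/" + s.size())
                .collect(Collectors.toList());
        SentenceBufferModel f = new SentenceBufferModel(
                List.of("Dogs", "bark", ".", "Cats", "sleep", "soundly", ".").iterator(), tagWithLength);
        List<String> out = new ArrayList<>();
        for (String t = f.incrementToken(); t != null; t = f.incrementToken()) {
            out.add(t);
        }
        System.out.println(out); // [dogs/3, bark/3, ./3, cats/4, sleep/4, soundly/4, ./4]
    }
}
```

Because the filter only rewrites tokens it has already read, no position bookkeeping is needed, which is why Mike calls the lookahead class overkill for attribute-only changes.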
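Benson's summarized plan (buffer a sentence, attach lemmas and compound pieces to per-position objects, then replace each term with its lemma and insert compound components as the position comes into focus) can be modeled the same way. This is again a hypothetical plain-Java sketch rather than Lucene code: the Pos class plays the role of his Position subclass, the two dictionaries stand in for real sentence analysis, and emitting compound parts with a position increment of 0 (stacked on the original token's position) is a design assumption the thread does not specify.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical, dependency-free model of the plan: buffer a sentence, store analysis results
// per position, emit the lemma, then queue compound parts once the position is finished.
// Not Lucene's API; all names here are illustrative.
final class LemmaDecompoundModel {
    /** Stand-in for a Position subclass carrying per-position analysis results. */
    static final class Pos {
        final String surface;
        String lemma;
        List<String> parts = List.of();
        Pos(String surface) { this.surface = surface; }
    }

    /** Hypothetical output token: a term plus its position increment. */
    static final class Out {
        final String term;
        final int posIncr;
        Out(String term, int posIncr) { this.term = term; this.posIncr = posIncr; }
        @Override public String toString() { return term + "+" + posIncr; }
    }

    private final Iterator<String> input;
    private final Map<String, String> lemmas;          // hypothetical lemma dictionary
    private final Map<String, List<String>> compounds; // hypothetical decompounding dictionary
    private final Deque<Pos> positions = new ArrayDeque<>();
    private final Deque<Out> inserted = new ArrayDeque<>();

    LemmaDecompoundModel(Iterator<String> input, Map<String, String> lemmas,
                         Map<String, List<String>> compounds) {
        this.input = input;
        this.lemmas = lemmas;
        this.compounds = compounds;
    }

    /** Returns the next output token, or null at end of input. */
    Out incrementToken() {
        if (!inserted.isEmpty()) {
            return inserted.poll(); // tokens queued by afterPosition come next
        }
        if (positions.isEmpty()) {
            peekSentence();
        }
        Pos p = positions.poll();
        if (p == null) {
            return null;
        }
        Out out = new Out(p.lemma, 1); // "splat" the lemma in place of the surface form
        afterPosition(p);
        return out;
    }

    /** Reads up to and including a "." marker, then analyzes the whole sentence. */
    private void peekSentence() {
        while (input.hasNext()) {
            Pos p = new Pos(input.next());
            positions.add(p);
            if (p.surface.equals(".")) {
                break;
            }
        }
        for (Pos p : positions) {
            p.lemma = lemmas.getOrDefault(p.surface, p.surface);
            p.parts = compounds.getOrDefault(p.surface, List.of());
        }
    }

    /** Queues compound parts on the same position (posIncr 0 is a design assumption). */
    private void afterPosition(Pos p) {
        for (String part : p.parts) {
            inserted.add(new Out(part, 0));
        }
    }

    public static void main(String[] args) {
        LemmaDecompoundModel f = new LemmaDecompoundModel(
                List.of("doghouses", "are", "small", ".").iterator(),
                Map.of("doghouses", "doghouse", "are", "be"),
                Map.of("doghouses", List.of("dog", "house")));
        List<Out> out = new ArrayList<>();
        for (Out t = f.incrementToken(); t != null; t = f.incrementToken()) {
            out.add(t);
        }
        System.out.println(out); // [doghouse+1, dog+0, house+0, be+1, small+1, .+1]
    }
}
```

Keeping the analysis results on the per-position object means the emitting side never has to re-run sentence analysis; it only reads what peekSentence stored, which mirrors the division of labour Benson describes between incrementToken and afterPosition.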