Thanks. I am aware of this thread. Indeed it will change the way TokenStreams are handled, and so copying a Token may no longer be necessary. However, I can't tell yet whether copying will still be needed - I guess I'll just have to wait until it's out and I start using it :-)
Anyway, I've implemented it for myself, and thought this might be a nice
contribution. I can live without it in Lucene :-)

Thanks,
Shai

On Wed, Nov 12, 2008 at 10:02 PM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:

> Are you aware of LUCENE-1422? There is likely going to be a new way of
> dealing w/ TokenStreams altogether, so you might want to have a look
> there before continuing.
>
> On Nov 12, 2008, at 1:51 PM, Shai Erera wrote:
>
>> Hi,
>>
>> I was thinking about adding a copyInto method to Token. The only way to
>> clone a token is by using its clone() or clone(char[], int, int, int, int)
>> methods. Both do the job, but allocate a new Token instance. While in 2.4
>> a Token constructor may actually take a char[] as input (thus saving a
>> char[] allocation), it still allocates an instance.
>>
>> Even though the instance allocation is not that expensive, it does
>> allocate additional things, like a String for the type, a Payload and a
>> String for the text (even though the latter will be removed in 3.0).
>> If an application wishes to keep one Token instance around and copy
>> other Tokens into it, it can call various methods to achieve that, like
>> setTermBuffer, setOffset etc. A copyInto is just a convenience method
>> for doing that.
>>
>> If you wonder about the use case, here it is: I know that it's advised
>> to reuse the same Token instance in the TokenStream API (basically, make
>> sure to call next(Token)). But there might be TokenFilters which need to
>> save a certain occurrence of a token, do some processing and return it
>> later. A good example is a stemming filter. One can think of such a
>> filter returning the original token in addition to the stemmed token
>> (for example, for the word "tokens" in English, it would return "tokens"
>> [original] and "token" [stem]).
>> In that case, the filter has to save the word "tokens" so that it
>> returns "tokens" first (or the stem; the order does not matter), and the
>> next time its next(Token) is called, it should return the stem (or the
>> original) before consuming the next token from the TokenStream.
>>
>> Anyway, I hope it's clear enough, but if not I can elaborate.
>> If you think a copyInto() is worth the effort, I can quickly create a
>> patch for it.
>>
>> Shai
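To make the proposal concrete, here is a rough sketch of what a copyInto could look like, together with the stemming use case described above. This is not Lucene code: the class SimpleToken, the copyInto signature and the field names are illustrative stand-ins for Token's actual termBuffer/offset/type state, assuming the copy reuses the destination's char[] when it is large enough.

```java
// Hypothetical sketch of the proposed copyInto: copy one token's state
// into an existing instance instead of allocating a new Token via clone().
// SimpleToken is a minimal stand-in, not the real org.apache.lucene Token.
class SimpleToken {
    char[] termBuffer = new char[16];
    int termLength;
    int startOffset, endOffset;
    String type = "word";

    void setTerm(String s) {
        if (termBuffer.length < s.length()) {
            termBuffer = new char[s.length()];
        }
        s.getChars(0, s.length(), termBuffer, 0);
        termLength = s.length();
    }

    String term() {
        return new String(termBuffer, 0, termLength);
    }

    // The proposed convenience method: copy this token into dest,
    // growing dest's buffer only when it is too small, so no new
    // Token instance (and usually no new char[]) is allocated.
    void copyInto(SimpleToken dest) {
        if (dest.termBuffer.length < termLength) {
            dest.termBuffer = new char[termLength];
        }
        System.arraycopy(termBuffer, 0, dest.termBuffer, 0, termLength);
        dest.termLength = termLength;
        dest.startOffset = startOffset;
        dest.endOffset = endOffset;
        dest.type = type;
    }
}

public class CopyIntoDemo {
    public static void main(String[] args) {
        // The filter's reusable cache instance, allocated once.
        SimpleToken cached = new SimpleToken();

        // Incoming token "tokens" from the stream.
        SimpleToken current = new SimpleToken();
        current.setTerm("tokens");
        current.startOffset = 0;
        current.endOffset = 6;

        // Save the original before stemming, so both can be emitted.
        current.copyInto(cached);

        // Stem in place; the cached copy is unaffected.
        current.setTerm("token");

        System.out.println(cached.term());   // original
        System.out.println(current.term());  // stem
    }
}
```

A filter using this pattern would return the cached original on one call to next(Token) and the stemmed form on the following call, before advancing the underlying stream.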