Don't want to discourage you from contributing it; just suggesting you
may want to make sure the people working on that patch are aware of
the issue, so that maybe it can be addressed there.
On Nov 13, 2008, at 10:14 AM, Shai Erera wrote:
Thanks. I am aware of this thread. Indeed it will change the way
TokenStreams are handled, and so copying a Token may not be
necessary. However, I can't tell yet whether copying will still be
needed - I guess I'll just have to wait until it's out and I start
using it :-)
Anyway, I've implemented it for myself, and thought this might be a
nice contribution. I can live without it in Lucene :-)
Thanks
Shai
On Wed, Nov 12, 2008 at 10:02 PM, Grant Ingersoll
<[EMAIL PROTECTED]> wrote:
Are you aware of LUCENE-1422? There is likely going to be a new way
of dealing w/ TokenStreams altogether, so you might want to have a
look there before continuing.
On Nov 12, 2008, at 1:51 PM, Shai Erera wrote:
Hi,
I was thinking about adding a copyInto method to Token. The only way
to clone a token is by using its clone() or clone(char[], int, int,
int, int) methods. Both do the job, but allocate a Token instance.
In 2.4 a Token constructor can take a char[] as input (thus saving a
char[] allocation), but it still allocates an instance.
Even though the instance allocation itself is not that expensive, it
also allocates additional objects, like a String for the type, a
Payload and a String for the text (even though the latter will be
removed in 3.0).
If an application wishes to keep one instance of Token around, and
copy other Tokens into it, it can call various methods to achieve
that, like setTermBuffer, setOffset etc. A copyInto() would just be a
convenience method for doing that.
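To make it concrete, here is a rough sketch of what I have in mind,
written as if it were a member of Token and against the 2.4-era
setters/getters as I remember them (the exact names and the payload
handling are illustrative, not a final signature):

    /** Copies the state of this Token into dest, reusing dest's buffers where possible. */
    public void copyInto(Token dest) {
      // setTermBuffer copies the characters into dest's own buffer,
      // growing it if needed, so no new char[] is handed out per call.
      dest.setTermBuffer(termBuffer(), 0, termLength());
      dest.setStartOffset(startOffset());
      dest.setEndOffset(endOffset());
      dest.setType(type());
      dest.setPositionIncrement(getPositionIncrement());
      dest.setFlags(getFlags());
      // This shares the Payload reference; a deep copy (payload.clone())
      // might be safer, depending on how the consumer uses it.
      dest.setPayload(getPayload());
    }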
If you wonder about the use case, here it is: I know it's advised to
reuse the same Token instance in the TokenStream API (basically, make
sure to call next(Token)). But there might be TokenFilters which need
to save a certain occurrence of a token, do some processing and
return it later. A good example is a StemmingFilter. One can imagine
such a filter returning the original token in addition to the stemmed
token (for example, for the English word "tokens", it would return
"tokens" [original] and "token" [stem]). In that case, the filter has
to save the word "tokens" so that it returns "tokens" first (or the
stem, the order does not matter), and the next time its next(Token)
is called it returns the stem (or the original), before consuming the
next token from the TokenStream.
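Roughly, such a filter could look like the sketch below. It uses the
proposed copyInto() plus a placeholder stem() helper; the class and
helper names are made up, and it's written against the old
next(Token) reuse API:

    import java.io.IOException;
    import org.apache.lucene.analysis.Token;
    import org.apache.lucene.analysis.TokenFilter;
    import org.apache.lucene.analysis.TokenStream;

    /** Sketch: emits the original token, then its stem, reusing one saved Token. */
    public final class KeepOriginalStemFilter extends TokenFilter {
      private final Token saved = new Token(); // the one instance we keep around
      private boolean hasSaved = false;

      public KeepOriginalStemFilter(TokenStream input) {
        super(input);
      }

      public Token next(Token reusableToken) throws IOException {
        if (hasSaved) {
          // Second call for the previous word: return the stem we kept aside.
          hasSaved = false;
          saved.copyInto(reusableToken); // the proposed method
          return reusableToken;
        }
        Token t = input.next(reusableToken);
        if (t == null) {
          return null;
        }
        String original = new String(t.termBuffer(), 0, t.termLength());
        String stem = stem(original);
        if (!stem.equals(original)) {
          // Save the stem for the next call and return the original now
          // (the order could just as well be reversed).
          t.copyInto(saved);
          saved.setTermBuffer(stem.toCharArray(), 0, stem.length());
          saved.setPositionIncrement(0); // stem sits at the same position
          hasSaved = true;
        }
        return t;
      }

      // Placeholder "stemmer" just so the sketch compiles; a real filter
      // would delegate to an actual stemmer.
      private static String stem(String word) {
        return word.endsWith("s") ? word.substring(0, word.length() - 1) : word;
      }
    }

The point is that the filter only ever needs the single "saved"
instance; copyInto() just makes the bookkeeping less verbose.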
Anyway, I hope it's clear enough, but if not I can elaborate.
If you think a copyInto() is worth the effort, I can quickly create
a patch for it.
Shai