Are you aware of LUCENE-1422? There is likely going to be a new way
of dealing with TokenStreams altogether, so you might want to have a
look there before continuing.
On Nov 12, 2008, at 1:51 PM, Shai Erera wrote:
Hi,
I was thinking about adding a copyInto method to Token. The only way
to clone a token is by using its clone() or clone(char[], int, int,
int, int) methods. Both do the job, but each allocates a Token instance.
In 2.4 a Token constructor can take a char[] as input (thus saving a
char[] allocation), but it still allocates an instance.
Even though the instance allocation is not that expensive, it does
allocate additional things, like String for the type, Payload and
String (for the text, even though that will be removed in 3.0).
If an application wishes to keep one instance of Token around, and
copy into it other Tokens, it can call various methods to achieve
that, like setTermBuffer, setOffset etc. A copyInto is just a
convenient method for doing that.
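
A minimal sketch of the idea, assuming a hypothetical SimpleToken class
in place of Lucene's Token. The fields and method names below are
illustrative assumptions, not the actual Token API:

```java
/** Hypothetical stand-in for Lucene's Token; not the real class. */
final class SimpleToken {
    char[] termBuffer = new char[16];
    int termLength;
    int startOffset;
    int endOffset;
    String type = "word";

    /** Copies the given chars into this token, growing the buffer only if needed. */
    void setTermBuffer(char[] buffer, int offset, int length) {
        if (termBuffer.length < length) {
            termBuffer = new char[length];
        }
        System.arraycopy(buffer, offset, termBuffer, 0, length);
        termLength = length;
    }

    void setOffsets(int start, int end) {
        startOffset = start;
        endOffset = end;
    }

    String term() {
        return new String(termBuffer, 0, termLength);
    }

    /**
     * Copies this token's state into an existing target instance.
     * No new token is allocated; the target's buffer is reused when
     * it is already large enough.
     */
    void copyInto(SimpleToken target) {
        target.setTermBuffer(termBuffer, 0, termLength);
        target.setOffsets(startOffset, endOffset);
        target.type = type;
    }
}
```

The point of the sketch is that copyInto only wraps the existing
setters, so a caller can keep one long-lived instance and overwrite it
instead of cloning.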
If you wonder about the use case, then here it is: I know that it's
advised to reuse the same Token instance in the TokenStream API
(basically, make sure to call next(Token)). But there might be
TokenFilters which will need to save a certain occurrence of a
token, do some processing and return it later. A good example is
StemmingFilter. One can imagine such a filter returning the
original token in addition to the stemmed token (for example, for the
word "tokens" in English, it will return "tokens" [original] and
"token" [stem]). In that case, the filter has to save the word
"tokens" so that it returns "tokens" first (or the stem, the order
does not matter) and next time its next(Token) is called, it should
return the stem (or original), before consuming the next token from
the TokenStream.
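
The filter described above can be sketched roughly as follows. This is
a self-contained illustration only: ReusableToken, ReusableTokenStream,
ListStream and OriginalAndStemFilter are hypothetical names, not Lucene
classes, and the "stemmer" is a toy that strips a trailing 's'.

```java
import java.util.Iterator;
import java.util.List;

/** Hypothetical reusable token; stands in for Lucene's Token. */
final class ReusableToken {
    char[] buf = new char[16];
    int len;

    /** Copies the first 'length' chars of src, growing the buffer only if needed. */
    void set(char[] src, int length) {
        if (buf.length < length) {
            buf = new char[length];
        }
        System.arraycopy(src, 0, buf, 0, length);
        len = length;
    }

    String term() {
        return new String(buf, 0, len);
    }
}

/** Hypothetical stream interface, loosely modeled on next(Token). */
interface ReusableTokenStream {
    /** Returns the next token, or null when the stream is exhausted. */
    ReusableToken next(ReusableToken reusable);
}

/** Simple source that feeds words from a list into the reusable token. */
final class ListStream implements ReusableTokenStream {
    private final Iterator<String> words;

    ListStream(List<String> words) {
        this.words = words.iterator();
    }

    public ReusableToken next(ReusableToken reusable) {
        if (!words.hasNext()) {
            return null;
        }
        char[] chars = words.next().toCharArray();
        reusable.set(chars, chars.length);
        return reusable;
    }
}

/** Emits each original token, then its stem if the stem differs. */
final class OriginalAndStemFilter implements ReusableTokenStream {
    private final ReusableTokenStream input;
    private final ReusableToken saved = new ReusableToken(); // allocated once
    private boolean stemPending;

    OriginalAndStemFilter(ReusableTokenStream input) {
        this.input = input;
    }

    public ReusableToken next(ReusableToken reusable) {
        if (stemPending) {
            // Return the stem of the saved token before consuming more input.
            stemPending = false;
            reusable.set(saved.buf, stemLength(saved.buf, saved.len));
            return reusable;
        }
        ReusableToken t = input.next(reusable);
        if (t == null) {
            return null;
        }
        // Copy into the long-lived instance instead of cloning the token.
        saved.set(t.buf, t.len);
        stemPending = stemLength(t.buf, t.len) != t.len;
        return t;
    }

    /** Toy stemmer: drops a trailing 's' from words longer than 3 chars. */
    private static int stemLength(char[] buf, int len) {
        return (len > 3 && buf[len - 1] == 's') ? len - 1 : len;
    }
}
```

The saved instance is created once per filter and overwritten for every
token, which is the allocation pattern a copyInto method would make
convenient.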
Anyway, I hope it's clear enough, but if not I can elaborate.
If you think a copyInto() is worth the effort, I can quickly create
a patch for it.
Shai