On 7/19/07, Michael McCandless <[EMAIL PROTECTED]> wrote:
"Yonik Seeley" <[EMAIL PROTECTED]> wrote:
> I had previously missed the changes to Token that add support for
> using an array (termBuffer):
>
> + // For better indexing speed, use termBuffer (and
> + // termBufferOffset/termBufferLength) instead of termText
> + // to save new'ing a String per token
> + char[] termBuffer;
> + int termBufferOffset;
> + int termBufferLength;
>
> While I think this approach would have been the best one to start
> with, rather than String, I'm concerned that at this point it will do
> little more than add overhead, resulting in slower code, not faster.
>
> - If any tokenizer or token filter sets the termBuffer, every
> downstream component would need to check for both the String and the
> char[]. It could be made backward compatible by constructing a String
> on demand, but that will really slow things down unless the whole
> chain is somehow converted to use only the char[].
Good point: if your analyzer/tokenizer produces char[] tokens then
your downstream filters would have to accept char[] tokens.
I think constructing a String on demand (and saving it as termText)
would be an OK solution? Why would that be slower than having to make
a String in the first place (if we didn't have the char[] API)? It's
at least graceful degradation.
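
To make the on-demand idea concrete, here is a minimal sketch. It assumes a hypothetical stand-in class (LazyTextToken is not Lucene's Token, and the method names are invented for illustration); only the three buffer fields come from the patch quoted above. The String is built the first time a String-based consumer asks for it and is reused after that:

```java
// Hypothetical stand-in for a token; NOT Lucene's actual Token class.
// Only termBuffer/termBufferOffset/termBufferLength come from the thread.
final class LazyTextToken {
    private String termText;      // cached String form, null until requested
    private char[] termBuffer;    // char[] form, null if never set
    private int termBufferOffset;
    private int termBufferLength;

    // char[]-aware producers call this; no String is allocated here.
    void setTermBuffer(char[] buffer, int offset, int length) {
        termBuffer = buffer;
        termBufferOffset = offset;
        termBufferLength = length;
        termText = null; // any previously cached String is now stale
    }

    // String-based consumers call this. The String is built at most once
    // per setTermBuffer call and then saved, as suggested in the thread.
    // Caveat: if the buffer is mutated in place afterwards, the cached
    // String will not reflect that.
    String termText() {
        if (termText == null && termBuffer != null) {
            termText = new String(termBuffer, termBufferOffset, termBufferLength);
        }
        return termText;
    }
}
```

A chain that only ever touches the char[] never pays for the String; a chain with even one String-based filter pays for one allocation per token, which is roughly what it cost before the char[] API existed.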
That degraded path would be the rule rather than the exception,
though. Pretty much everything is based on String.
> - It doesn't look like the indexing code currently pays any attention
> to the char[], right?
It does, in DocumentsWriter.addPosition().
Ah, thanks.
> - What if both the String and char[] are set? A filter that doesn't
> know better sets the String... this doesn't clear the char[]
> currently, should it?
Currently the char[] wins, but good point: seems like each setter
should null out the other one?
Certainly the String setter should null the char[] (that's the only
way to keep backward compatibility), and probably vice versa.
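
A minimal sketch of setters that clear each other, so that at most one representation is live at any time. ExclusiveToken, its method names, and the on-demand fallback in termText() are assumptions made for illustration; this is not the actual Token code:

```java
// Hypothetical illustration; NOT Lucene's actual Token class.
final class ExclusiveToken {
    private String termText;
    private char[] termBuffer;
    private int termBufferOffset;
    private int termBufferLength;

    // A filter that only knows about Strings calls this. Clearing the
    // char[] ensures a stale buffer cannot "win" over the new text.
    void setTermText(String text) {
        termText = text;
        termBuffer = null;
        termBufferOffset = 0;
        termBufferLength = 0;
    }

    // The vice-versa case: setting the buffer drops the old String.
    void setTermBuffer(char[] buffer, int offset, int length) {
        termBuffer = buffer;
        termBufferOffset = offset;
        termBufferLength = length;
        termText = null;
    }

    boolean hasTermBuffer() {
        return termBuffer != null;
    }

    // Returns whichever representation is live, converting if needed.
    String termText() {
        if (termText == null && termBuffer != null) {
            return new String(termBuffer, termBufferOffset, termBufferLength);
        }
        return termText;
    }
}
```

With this shape a consumer can check hasTermBuffer() once and never has to decide which of two conflicting values is current.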
Note that there are many existing filters that directly access and
manipulate the package-protected String termText. These will need to
be changed.
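
A small sketch of why direct field access is a problem once a char[] can also be present. All names here (FieldAccessToken, effectiveTerm, the two filter methods) are hypothetical; effectiveTerm() only mimics the "char[] wins" behavior described earlier in the thread and is not the real indexing code:

```java
// Hypothetical illustration; NOT Lucene's actual Token or filter classes.
final class FieldAccessToken {
    String termText;       // package-visible, so filters can write it directly
    char[] termBuffer;
    int termBufferOffset;
    int termBufferLength;

    // Mimics a consumer for which the char[] wins when both are set.
    String effectiveTerm() {
        if (termBuffer != null) {
            return new String(termBuffer, termBufferOffset, termBufferLength);
        }
        return termText;
    }

    // Setter that keeps the two representations from disagreeing.
    void setTermText(String text) {
        termText = text;
        termBuffer = null;
    }
}

final class DirectFieldDemo {
    // Old-style filter: writes the field directly. If a char[] is also
    // set, it is left untouched and still wins, so the change is lost.
    static void lowerCaseViaField(FieldAccessToken t) {
        t.termText = t.effectiveTerm().toLowerCase(java.util.Locale.ROOT);
    }

    // Converted filter: goes through the setter, which clears the char[].
    static void lowerCaseViaSetter(FieldAccessToken t) {
        t.setTermText(t.effectiveTerm().toLowerCase(java.util.Locale.ROOT));
    }
}
```

In this sketch the field-writing filter silently has no effect on a char[]-backed token, which is the kind of breakage that converting those filters to use a setter would avoid.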
-Yonik