I had previously missed the changes to Token that add support for using an array (termBuffer):
+  // For better indexing speed, use termBuffer (and
+  // termBufferOffset/termBufferLength) instead of termText
+  // to save new'ing a String per token
+  char[] termBuffer;
+  int termBufferOffset;
+  int termBufferLength;

While I think this approach would have been better to start with than String, I'm concerned that at this point it will do little more than add overhead, resulting in slower code, not faster.

- If any tokenizer or token filter sets the termBuffer, every downstream component would need to check for both. It could be made backward compatible by constructing a String on demand, but that will really slow things down, unless the whole chain is somehow converted to use only the char[].

- It doesn't look like the indexing code currently pays any attention to the char[], right?

- What if both the String and the char[] are set? A filter that doesn't know better sets the String... this doesn't currently clear the char[]; should it?

Thoughts?

-Yonik
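For illustration, here is a minimal sketch of what the "construct a String on demand" compatibility shim and the mutual-invalidation of the two representations could look like. This is a simplified stand-in, not Lucene's actual Token class; the caching/clearing logic is my assumption about how the two fields could be kept consistent:

```java
// Simplified Token holding the term as either a char[] slice or a String.
public class Token {
    private String termText;          // legacy representation
    private char[] termBuffer;        // preferred representation
    private int termBufferOffset;
    private int termBufferLength;

    // New-style setter: clears the cached String so the two forms
    // can never disagree (the problem raised in the last bullet).
    public void setTermBuffer(char[] buffer, int offset, int length) {
        this.termBuffer = buffer;
        this.termBufferOffset = offset;
        this.termBufferLength = length;
        this.termText = null;         // invalidate the String form
    }

    // Legacy setter: clears the char[] for the same reason.
    public void setTermText(String text) {
        this.termText = text;
        this.termBuffer = null;
    }

    // Legacy getter: materializes a String on demand. This allocation
    // is exactly the per-token overhead the mail worries about.
    public String termText() {
        if (termText == null && termBuffer != null) {
            termText = new String(termBuffer, termBufferOffset, termBufferLength);
        }
        return termText;
    }
}
```

A downstream filter that only knows about termText() keeps working, but pays for a new String per token, so the speed win only materializes once the whole chain reads the char[] directly.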