1) Does anyone mind? Will it break anything?
2) Are there unit tests for this (particularly for PorterStemFilter)? The changes are obviously not spectacular, but I'd prefer not to screw everyone up...
3) I've checked out the latest version of Lucene. Is there anything special I need to do if I get the go-ahead to check my stuff in (like a dev list review)?
Cheers,
Stephane
Brian Goetz wrote:
I've read rapidly through the analyser's code, but I'm in no way a Lucene master. If I understood your statement correctly, you are saying that we would multiply the number of tokens by 1.5 per tokeniser it uses. A potential "optimisation" would be that sometimes the string could be reused, since it's immutable as well.
Actually, I was saying that's the absolute worst case. It wouldn't surprise me to see that the actual effect is that it results in only a 10 or 15% increase in object creation during tokenization, not only for the reason you state, but also because there might well be other object creations on a per-token basis that we're not seeing.
Personally, I believe it would be cleaner to make it immutable (I think that's why this thread started), so +1.
Yup.
Immutability -- good. Mutability just to save a few cycles -- bad.
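To make the trade-off concrete, here is a rough sketch of what an immutable token could look like, and how a filter stage might avoid allocating when it has nothing to change. This is not Lucene's actual Token class: the names ImmutableTokenSketch, Token, withTerm and lowerCase are all hypothetical and only illustrate the idea discussed above.

```java
import java.util.Locale;

public final class ImmutableTokenSketch {

    /** Hypothetical immutable token: every field is final and set exactly once. */
    static final class Token {
        private final String term;
        private final int start;
        private final int end;

        Token(String term, int start, int end) {
            this.term = term;
            this.start = start;
            this.end = end;
        }

        String term() { return term; }
        int start() { return start; }
        int end() { return end; }

        /** Returns this same token if the term is unchanged, otherwise a new token. */
        Token withTerm(String newTerm) {
            return newTerm.equals(term) ? this : new Token(newTerm, start, end);
        }
    }

    /** Stand-in for a filter stage (assumption: it only lower-cases the term). */
    static Token lowerCase(Token in) {
        return in.withTerm(in.term().toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        Token a = new Token("Lucene", 0, 6);
        Token b = lowerCase(a); // term changes, so one new Token is created
        Token c = lowerCase(b); // term unchanged, so the same instance comes back
        System.out.println(b.term() + " " + (a != b) + " " + (b == c));
    }
}
```

In this sketch a filter only pays for a new object when it actually changes the term, which is one reason the real overhead may land well below the worst case.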
--
Brian Goetz
Quiotix Corporation
[EMAIL PROTECTED] Tel: 650-843-1300 Fax: 650-324-8032
http://www.quiotix.com
--
To unsubscribe, e-mail: <mailto:[EMAIL PROTECTED]>
For additional commands, e-mail: <mailto:[EMAIL PROTECTED]>