--- "Marvin Humphrey (JIRA)" <[EMAIL PROTECTED]> wrote:
...
> It also slows Lucene down -- indexing takes around a
> 20% speed hit. It would be possible to submit a
> patch which had a smaller impact on performance, but
> this one is already over 700 lines long, and its
> goal is to achieve standard UTF-8 compliance and
> modify the definition of Lucene strings as simply
> and reliably as possible. Optimization patches can
> now be submitted which build upon this one.

I'm fairly sure the UTF-8 decoding loop can be improved quite a bit once the patch is merged, so the eventual performance hit will probably be lower (assuming decoding is a hot spot). Using a tighter inner loop for single-byte values can give a significant boost (up to a 50% speedup compared to the default UTF-8 decoder that JDK 1.5 ships with).

In this case it's probably best to isolate the hot spot and measure the impact of each change while working on it, since otherwise the direct effect may be hard to see. Then measure the total effect when integrating the change.

That is to say, I wouldn't worry too much about the initial hit; much, if not most, of it can be optimized away quite soon, just as you suggested.

-+ Tatu +-
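
To illustrate the "tighter inner loop for single-byte values" idea mentioned above, here is a minimal sketch in Java. It is not code from the patch or from Lucene: the class name Utf8FastPath and the method decode are made up for this example, and it assumes well-formed standard UTF-8 input (it rejects truncated sequences and stray lead bytes, but does not check continuation bytes or overlong forms). No speedup is claimed for it; that would have to be measured.

```java
// Hypothetical illustration only; not taken from the Lucene patch.
final class Utf8FastPath {

    /**
     * Decodes UTF-8 bytes in buf[off .. off+len) into a String.
     * Assumes well-formed input; only truncation and invalid lead bytes are detected.
     */
    static String decode(byte[] buf, int off, int len) {
        // A UTF-8 sequence never produces more UTF-16 units than it has bytes.
        char[] out = new char[len];
        int end = off + len;
        int i = off;
        int o = 0;

        while (i < end) {
            // Tight inner loop: bytes 0x00..0x7F are non-negative as Java bytes
            // and map directly to chars, so runs of ASCII are copied with no
            // further branching.
            while (i < end && buf[i] >= 0) {
                out[o++] = (char) buf[i++];
            }
            if (i >= end) {
                break;
            }

            // Slow path: one multi-byte sequence, then back to the fast loop.
            int b = buf[i++] & 0xFF;
            if ((b & 0xE0) == 0xC0 && i < end) {
                // 2-byte sequence: 110xxxxx 10xxxxxx
                out[o++] = (char) (((b & 0x1F) << 6) | (buf[i++] & 0x3F));
            } else if ((b & 0xF0) == 0xE0 && i + 1 < end) {
                // 3-byte sequence: 1110xxxx 10xxxxxx 10xxxxxx
                out[o++] = (char) (((b & 0x0F) << 12)
                        | ((buf[i++] & 0x3F) << 6)
                        | (buf[i++] & 0x3F));
            } else if ((b & 0xF8) == 0xF0 && i + 2 < end) {
                // 4-byte sequence: becomes a UTF-16 surrogate pair.
                int cp = ((b & 0x07) << 18)
                        | ((buf[i++] & 0x3F) << 12)
                        | ((buf[i++] & 0x3F) << 6)
                        | (buf[i++] & 0x3F);
                cp -= 0x10000;
                out[o++] = (char) (0xD800 + (cp >> 10));
                out[o++] = (char) (0xDC00 + (cp & 0x3FF));
            } else {
                throw new IllegalArgumentException(
                        "malformed or truncated UTF-8 at byte " + (i - 1));
            }
        }
        return new String(out, 0, o);
    }
}
```

Calling Utf8FastPath.decode(bytes, 0, bytes.length) on the UTF-8 encoding of a string should give back that string. Following the advice above, a loop like this is best timed in isolation against the stock decoder first, and only then measured as part of a full indexing run.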