Michael McCandless resolved LUCENE-510.
Congratulations. :)
When I wrote my initial patch, I saw a performance degradation of
roughly 30% in my indexing benchmarks. Repeated reallocation was
presumably one culprit: when the length in Java chars is stored in the
index, you only need to allocate once, whereas when reading UTF-8, you
can't know how much memory you need until the read completes.
Furthermore, at write time, you can't look at something composed of
16-bit chars and know what the byte length of its UTF-8 representation
will be without pre-scanning.
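
To make the pre-scanning cost concrete, here is a minimal sketch of what
such a scan might look like in Java. It is purely illustrative: the class
and method names (Utf8Length, utf8Length) are hypothetical and are not
taken from Lucene or from either patch. It walks the chars once and adds
up the bytes each one would need in standard UTF-8, assuming well-formed
surrogate pairs.

```java
final class Utf8Length {
    /**
     * Pre-scan: count the bytes needed to encode s as standard UTF-8
     * without actually encoding it. Hypothetical helper for illustration.
     */
    static int utf8Length(CharSequence s) {
        int bytes = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                bytes += 1;                       // ASCII
            } else if (c < 0x800) {
                bytes += 2;                       // U+0080 .. U+07FF
            } else if (Character.isHighSurrogate(c)
                    && i + 1 < s.length()
                    && Character.isLowSurrogate(s.charAt(i + 1))) {
                bytes += 4;                       // supplementary code point
                i++;                              // skip the low surrogate
            } else {
                // Rest of the BMP. Assumption: a lone surrogate is also
                // counted as 3 bytes here, which differs from
                // String.getBytes, which substitutes '?' for it.
                bytes += 3;
            }
        }
        return bytes;
    }
}
```

With a count like this in hand, the writer can emit a byte length up
front and the reader can allocate exactly once, at the price of touching
every char twice on the write side.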
How did you solve those problems? Are the string diffs and
comparisons now performed against raw bytes, so that fewer conversions
are needed?
Marvin Humphrey
Rectangular Research
http://www.rectangular.com/