Michael McCandless resolved LUCENE-510.

Congratulations.  :)

When I wrote my initial patch, I saw a performance degradation of c. 30% in my indexing benchmarks. Repeated reallocation was presumably one culprit: when the length in Java chars is stored in the index, you only need to allocate once, whereas when reading UTF-8, you can't know exactly how much memory you need until the read completes. Furthermore, at write time, you can't look at a sequence of 16-bit chars and know the byte length of its UTF-8 representation without pre-scanning it.
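To make the write-time pre-scan concrete, here is a minimal Java sketch (the class and method names are hypothetical, not Lucene API, and it makes no claim about how LUCENE-510 actually handles this). It walks a UTF-16 string once just to count how many UTF-8 bytes it will need, since BMP chars map to 1, 2 or 3 bytes and a surrogate pair maps to 4.

```java
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical sketch: the UTF-8 byte length of a Java (UTF-16) string
 * cannot be read off its char count; it takes a full pass over the chars.
 */
public class Utf8Length {

    // Returns the number of bytes the UTF-8 encoding of s would occupy.
    // Assumption: an unpaired surrogate is counted as 3 bytes (as if encoded
    // directly); the JDK encoder would instead substitute '?', i.e. 1 byte.
    static int utf8Length(CharSequence s) {
        int bytes = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                bytes += 1;          // U+0000..U+007F: 1 byte
            } else if (c < 0x800) {
                bytes += 2;          // U+0080..U+07FF: 2 bytes
            } else if (Character.isHighSurrogate(c)
                    && i + 1 < s.length()
                    && Character.isLowSurrogate(s.charAt(i + 1))) {
                bytes += 4;          // surrogate pair (supplementary code point): 4 bytes
                i++;                 // skip the low surrogate we just consumed
            } else {
                bytes += 3;          // remaining BMP chars: 3 bytes
            }
        }
        return bytes;
    }

    public static void main(String[] args) {
        String s = "caf\u00e9 \u20ac \uD83D\uDE00";
        System.out.println(s.length() + " chars -> " + utf8Length(s) + " bytes");
        // Cross-check the pre-scan against the JDK's own encoder.
        System.out.println(s.getBytes(StandardCharsets.UTF_8).length);
    }
}
```

The read side has the mirror-image problem: a prefix stored as a UTF-8 byte count says nothing about how many Java chars the decoded string will occupy, so a reader either over-allocates (at most one char per byte) or grows its buffer as it goes.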

How did you solve those problems? Are the string diffs and comparisons now performed against raw bytes, so that fewer conversions are needed?

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/


