Re: [jira] Resolved: (LUCENE-510) IndexOutput.writeString() should write length in bytes

Marvin Humphrey Wed, 26 Mar 2008 13:57:03 -0700

Michael McCandless resolved LUCENE-510.


Congratulations.  :)

When I wrote my initial patch, I saw a performance degradation of c. 30% in my indexing benchmarks. Repeated reallocation was presumably one culprit: when length in Java chars is stored in the index, you only need to allocate once, whereas when reading in UTF-8, you can't know just how much memory you need until the read completes. Furthermore, at write-time, you can't look at something composed of 16-bit chars and know what the byte-length of its UTF-8 representation will be without pre-scanning.
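
To make the write-time pre-scan concrete, here is a minimal sketch of what I mean. It is not code from the LUCENE-510 patch; the class and method names are hypothetical, and it assumes standard UTF-8 with well-formed surrogate pairs. It walks the 16-bit chars once just to count the bytes, so a length prefix could be written before any encoded data.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only; not part of Lucene.
final class Utf8Length {

    /** Counts the bytes needed to encode s as standard UTF-8, without encoding it. */
    static int utf8Length(CharSequence s) {
        int bytes = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                bytes += 1;                       // ASCII
            } else if (c < 0x800) {
                bytes += 2;                       // U+0080..U+07FF
            } else if (Character.isHighSurrogate(c)
                    && i + 1 < s.length()
                    && Character.isLowSurrogate(s.charAt(i + 1))) {
                bytes += 4;                       // surrogate pair: one supplementary code point
                i++;                              // skip the low surrogate
            } else {
                // Rest of the BMP. Assumption: an unpaired surrogate is also
                // counted as 3 bytes here; a real encoder may substitute a
                // replacement character instead.
                bytes += 3;
            }
        }
        return bytes;
    }

    public static void main(String[] args) {
        String s = "na\u00EFve";
        // Compare the pre-scan count with what the JDK encoder actually produces.
        System.out.println(utf8Length(s) + " vs " + s.getBytes(StandardCharsets.UTF_8).length);
    }
}
```

The point is that this is a second pass over the string on top of the encoding pass itself, which is the extra write-time cost I was worried about.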

How did you solve those problems? Are the string diffs and comparisons now performed against raw bytes, so that fewer conversions are needed?


Marvin Humphrey
Rectangular Research
http://www.rectangular.com/


