That method could easily be changed to:

    public final String readString() throws IOException {
      int length = readVInt();
      return new String(readBytes(length), "UTF-8");
    }

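For what it's worth, here is a rough standalone sketch of that idea, not actual Lucene code. It assumes the VInt prefix counts UTF-8 bytes rather than chars. The class name Utf8StringSketch, the scratch field, and the encode() helper are made up for illustration, and it uses StandardCharsets.UTF_8 (Java 7+) in place of the "UTF-8" charset name so no UnsupportedEncodingException is involved.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical reader, for illustration only (not Lucene's InputStream).
final class Utf8StringSketch {
    private final DataInputStream in;
    // Scratch byte buffer, reused across calls and grown only when needed.
    private byte[] scratch = new byte[16];

    Utf8StringSketch(InputStream in) {
        this.in = new DataInputStream(in);
    }

    // Variable-length int: 7 payload bits per byte, low-order group first,
    // high bit set on every byte except the last.
    int readVInt() throws IOException {
        byte b = in.readByte();
        int value = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = in.readByte();
            value |= (b & 0x7F) << shift;
        }
        return value;
    }

    // Assumption: the length prefix is the number of UTF-8 bytes.
    String readString() throws IOException {
        int length = readVInt();
        if (length > scratch.length) {
            scratch = new byte[length];
        }
        in.readFully(scratch, 0, length);
        // The String constructor decodes the bytes into its own char storage.
        return new String(scratch, 0, length, StandardCharsets.UTF_8);
    }

    // Hypothetical writer side: VInt byte length followed by the UTF-8 bytes.
    static byte[] encode(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int n = utf8.length;
        while ((n & ~0x7F) != 0) {
            out.write((n & 0x7F) | 0x80);
            n >>>= 7;
        }
        out.write(n);
        out.write(utf8, 0, utf8.length);
        return out.toByteArray();
    }
}
```

Note that encode() has to build the whole byte array before it can write the length prefix, so the writing side still cannot be streamed.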
readBytes() could reuse the same array if it was large enough. Then the only allocation per call is the single char[] created inside the String code.

-----Original Message-----
From: Yonik Seeley [mailto:[EMAIL PROTECTED]
Sent: Tuesday, August 30, 2005 11:28 AM
To: java-dev@lucene.apache.org
Subject: Re: Lucene does NOT use UTF-8.

> How will the difference impact String memory allocations? Looking at the
> String code, I can't see where it would make an impact.

This is from Lucene InputStream:

    public final String readString() throws IOException {
      int length = readVInt();
      if (chars == null || length > chars.length)
        chars = new char[length];
      readChars(chars, 0, length);
      return new String(chars, 0, length);
    }

If you know the length in bytes, you still have to allocate that many chars (even though the number of chars may be less than the number of bytes). Not a big deal IMHO.

A bigger pain is on the writing side, where you can't stream things because you don't know what the length is going to be (in either bytes *or* UTF-8 chars).

So it turns out that Java's 16 bit chars were just a waste... it's still a multibyte format *and* it takes up more space. UTF-8 would have been nice - no conversions necessary.

-Yonik
Now hiring -- http://tinyurl.com/7m67g