On 8/30/05, Ken Krugler <[EMAIL PROTECTED]> wrote:
> 
> >Daniel Naber wrote:
> >
> >>On Monday 29 August 2005 19:56, Ken Krugler wrote:
> >>
> >>>"Lucene writes strings as a VInt representing the length of the
> >>>string in Java chars (UTF-16 code units), followed by the character
> >>>data."
> >>>
> >>>
> >>But wouldn't UTF-16 mean 2 bytes per character? That doesn't seem
> >>to be the case.
> >>
> >UTF-16 is a fixed 2 byte/char representation.
> 
> I hate to keep beating this horse, but I want to emphasize that it's
> 2 bytes per Java char (or UTF-16 code unit), not Unicode character
> (code point).
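A rough Java sketch of the format described in the quoted thread: a VInt holding the string length in Java chars, followed by the character data. The class and method names are hypothetical, not Lucene's API. The per-char byte encoding (1 to 3 bytes per char, modeled on Java's modified UTF-8) is an assumption, so check it against Lucene's own writer code before relying on it. The sketch also illustrates Ken's point. A supplementary character such as U+1D11E counts as two chars in the length prefix, and each surrogate half is encoded on its own.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical sketch of a Lucene-style string encoder; names are illustrative only.
public class LuceneStringSketch {

    // VInt: 7 data bits per byte, low-order group first; a set high bit
    // means another byte follows.
    static void writeVInt(ByteArrayOutputStream out, int i) {
        while ((i & ~0x7F) != 0) {
            out.write((i & 0x7F) | 0x80);
            i >>>= 7;
        }
        out.write(i);
    }

    // Length prefix counts Java chars (UTF-16 code units), not code points.
    // Assumed per-char encoding (modified-UTF-8 style): U+0001..U+007F in 1 byte,
    // U+0000 and U+0080..U+07FF in 2 bytes, everything else (including each
    // surrogate half) in 3 bytes.
    static byte[] writeString(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVInt(out, s.length());
        for (int k = 0; k < s.length(); k++) {
            int code = s.charAt(k);
            if (code >= 0x01 && code <= 0x7F) {
                out.write(code);
            } else if (code <= 0x7FF) {
                out.write(0xC0 | (code >> 6));
                out.write(0x80 | (code & 0x3F));
            } else {
                out.write(0xE0 | (code >>> 12));
                out.write(0x80 | ((code >> 6) & 0x3F));
                out.write(0x80 | (code & 0x3F));
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        String clef = "\uD834\uDD1E"; // U+1D11E MUSICAL SYMBOL G CLEF, one code point
        byte[] b = writeString(clef);
        System.out.println(clef.length()); // 2: the prefix counts chars
        System.out.println(b.length);      // 7: 1 VInt byte + 2 surrogates * 3 bytes
    }
}
```

Because each char takes 1 to 3 bytes under this assumed encoding, the stored data is generally not 2 bytes per char, which is what Daniel observed.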


There's more horse beating on Java and Unicode 4 in this blog entry: 
http://weblogs.java.net/blog/joconner/archive/2005/08/how_long_is_you.html.
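On the "how long is your string" question, here is a small Java sketch contrasting String.length(), which counts UTF-16 code units, with codePointCount(), and walking a string by code point. The class and helper names are hypothetical. The code point methods need Java 5 or later, and the StandardCharsets constant needs Java 7 or later. It also shows that UTF-16 is not strictly 2 bytes per Unicode character, since the single supplementary character here takes 4 bytes.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; names are illustrative only.
public class StringLengthDemo {

    // Collects code points by stepping over surrogate pairs as single units.
    static int[] toCodePoints(String s) {
        int[] out = new int[s.codePointCount(0, s.length())];
        int n = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            out[n++] = cp;
            i += Character.charCount(cp); // 2 for supplementary characters
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "a\uD834\uDD1Eb"; // 'a', U+1D11E (surrogate pair), 'b'
        int units = s.length();                                        // 4
        int points = s.codePointCount(0, s.length());                  // 3
        int utf16Bytes = s.getBytes(StandardCharsets.UTF_16BE).length; // 8
        System.out.println(units + " " + points + " " + utf16Bytes);
        for (int cp : toCodePoints(s)) {
            System.out.printf("U+%04X%n", cp);
        }
    }
}
```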
