Re: [Firebird-devel] Unicode UTF-16 etc

Mark Rotteveel Sat, 31 Aug 2013 01:59:00 -0700

On 29-8-2013 17:41, Jim Starkey wrote:
> Paradoxically, Japanese strings tend to be shorter in UTF-8 than in 16 bit
> Unicode.  The reason is simple: enough characters -- punctuation, control
> characters, and digits -- stay as single bytes, double byte characters
> are a wash, and the single byte characters generally balance the number
> of three byte characters.
>
> UTF-16 is a mess with nasty problems of endians, multi-word characters,
> and illegal codepoints to worry about.
>
> I bit the bullet long ago with Netfrastructure and NuoDB to support only
> UTF-8 internally.  The servers support collations but not character
> sets, which are relegated to the client.
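
The size trade-off described in the quoted text can be checked with a short Python sketch. The helper name encoded_sizes and the sample strings are illustrative assumptions, not taken from any Firebird or NuoDB code; the result depends entirely on the mix of characters in the text.

```python
from typing import Tuple


def encoded_sizes(text: str) -> Tuple[int, int]:
    """Return (UTF-8 byte count, UTF-16 byte count) for text.

    Hypothetical helper for illustration. UTF-16 is encoded without a
    byte order mark so that only the character data is counted.
    """
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


# Pure kanji: three bytes each in UTF-8, two bytes each in UTF-16.
kanji_only = encoded_sizes("日本語")

# One three byte character balanced by one single byte character.
balanced = encoded_sizes("日1")

# A two byte UTF-8 character is a wash against UTF-16.
two_byte = encoded_sizes("é")

# Mixed text with ASCII letters, digits, punctuation and spaces.
mixed = encoded_sizes("Firebird 2.5 日本語")

for label, sizes in [("kanji only", kanji_only), ("balanced", balanced),
                     ("two byte", two_byte), ("mixed", mixed)]:
    print(f"{label}: utf-8={sizes[0]} bytes, utf-16={sizes[1]} bytes")
```

Text made only of kanji or kana comes out larger in UTF-8, so the advantage claimed above applies only when enough single byte characters are mixed in.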


Unfortunately the implementation of UTF-8 in Firebird is annoying
because it reduces the maximum allowed number of characters to 1/4 of
that for single byte character sets, making it necessary to switch to
blobs sooner.

I'd prefer to have an option to use UTF-16 (treated as a 2-byte 
character set with surrogate pairs) as that will only halve the maximum 
allowed number of characters.
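
The arithmetic behind the 1/4 and 1/2 figures can be sketched in Python. This assumes a VARCHAR byte limit of 32,765 and that the declared length is budgeted at the widest encoding of one character; both the constant and the helper name max_declared_chars are assumptions for illustration, not Firebird code.

```python
# Assumption: a VARCHAR column is capped at 32,765 bytes in total.
MAX_VARCHAR_BYTES = 32765


def max_declared_chars(max_bytes_per_char: int,
                       byte_limit: int = MAX_VARCHAR_BYTES) -> int:
    """Largest declarable length when every character is budgeted at
    the worst-case width of the character set (hypothetical helper)."""
    return byte_limit // max_bytes_per_char


single_byte = max_declared_chars(1)  # single byte character sets
utf8 = max_declared_chars(4)         # UTF-8 budgeted at 4 bytes per character
utf16_units = max_declared_chars(2)  # UTF-16 counted in 2-byte code units

print(single_byte, utf8, utf16_units)
```

With UTF-16 treated as a 2-byte character set, the length counts 16-bit code units, so a character outside the Basic Multilingual Plane uses two units of that budget as a surrogate pair.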

Mark
-- 
Mark Rotteveel

Firebird-Devel mailing list, web interface at 
https://lists.sourceforge.net/lists/listinfo/firebird-devel
