On 29-8-2013 17:41, Jim Starkey wrote:
> Paradoxically, Japanese strings tend to be shorter in UTF-8 than 16 bit
> Unicode. The reason is simple: There are enough single byte characters
> -- punctuation, control characters, and digits -- that stay as single bytes,
> double byte characters are a wash, and the single byte characters
> generally balance the number of three byte characters.
>
> UTF-16 is a mess with nasty problems of endians, multi-word characters,
> and illegal codepoints to worry about.
>
> I bit the bullet long ago with Netfrastructure and NuoDB to support only
> UTF-8 internally. The servers support collations but not character
> sets, which are relegated to the client.
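Jim's size argument can be checked by counting encoded bytes. Below is a minimal Python sketch; the function name `encoded_sizes` and the sample strings are illustrative and not from the thread. UTF-16 is measured with the `utf-16-le` codec so that no byte-order mark is counted.

```python
def encoded_sizes(text: str) -> dict:
    """Return the byte length of `text` in UTF-8 and in UTF-16 (no BOM)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        # utf-16-le: fixed little-endian byte order, so no BOM is emitted
        "utf-16": len(text.encode("utf-16-le")),
    }


# A Japanese date: 7 ASCII digits plus 3 kanji (年, 月, 日).
print(encoded_sizes("2013年8月29日"))  # {'utf-8': 16, 'utf-16': 20}

# A character outside the BMP needs a surrogate pair in UTF-16.
print(encoded_sizes("\U0001F600"))  # {'utf-8': 4, 'utf-16': 4}
```

In the date string the digits cost 1 byte each in UTF-8 but 2 in UTF-16, while the kanji cost 3 bytes versus 2, so the single-byte characters more than pay for the three-byte ones. The second call shows the "multi-word characters" Jim mentions: a supplementary character takes two 16-bit code units.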
Unfortunately, the implementation of UTF-8 in Firebird is annoying because it reduces the maximum allowed number of characters to 1/4 of that for single-byte character sets, making it necessary to switch to blobs sooner. I'd prefer to have an option to use UTF-16 (treated as a 2-byte character set with surrogate pairs), as that would only halve the maximum allowed number of characters.

Mark
-- 
Mark Rotteveel

Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel
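Mark's 1/4 versus 1/2 arithmetic follows from sizing each column for its worst-case character width. The sketch below assumes a 32,765-byte limit for a `VARCHAR` column and a maximum of 4 bytes per character for Firebird's UTF8; the names `VARCHAR_MAX_BYTES` and `max_chars` are hypothetical, and the 2-byte UTF-16 case is the proposal from this mail, not an existing Firebird character set.

```python
# Assumed byte limit for a single VARCHAR column (hypothetical constant name).
VARCHAR_MAX_BYTES = 32765


def max_chars(bytes_per_char: int, limit: int = VARCHAR_MAX_BYTES) -> int:
    """Largest declarable length when every character may use the
    character set's maximum byte width (worst-case sizing)."""
    return limit // bytes_per_char


for name, width in [("single-byte", 1), ("proposed UTF-16", 2), ("UTF8", 4)]:
    print(name, max_chars(width))
# single-byte 32765
# proposed UTF-16 16382
# UTF8 8191
```

Note that under the proposed scheme the 16382 figure would count 16-bit code units, so a string containing supplementary characters (surrogate pairs) would hold fewer actual characters.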