Paradoxically, Japanese strings tend to be shorter in UTF-8 than in 16-bit Unicode. The reason is simple: there are enough single-byte characters -- punctuation, control characters, and digits -- that stay as single bytes. Each of those saves a byte relative to UTF-16, each three-byte character costs an extra byte, and double-byte characters are a wash, so the single-byte characters generally balance out the three-byte characters.
UTF-16 is a mess, with nasty problems of endianness, multi-word characters, and illegal code points to worry about. I bit the bullet long ago with Netfrastructure and NuoDB to support only UTF-8 internally. The servers support collations but not character sets, which are relegated to the client.

On 8/9/2013 8:28 PM, Paul Vinkenoog wrote:
> Adriano wrote:
>
>>> Looking in the source of intl_builtin.cpp I noticed that there is
>>> support for UTF16, UTF32 and UNICODE_UCS2, for UNICODE_UCS2 there is
>>> also a constant (=8) defined in charsets.h
>>>
>>> These definitions are missing from RDB$CHARACTER_SETS. Can these be used
>>> as a connection or column character set? If not, what are they for?
>> These are for internal usage only.
>>
>> I doubt someone can make UTF16/32 works as connection charset, it's too
>> much work.
>>
>> For columns, with some work may be possible. But why? UTF-8 uses 1-4
>> bytes per char, UTF-16 is also multibyte, using 2-4, and UTF-32 always 4
>> bytes per char.
>>
>> I don't see how they might be preferred over UTF-8.
> UTF-16 is much preferred for e.g. Far Eastern languages, because it
> will use 2 bytes for every code point, whereas UTF-8 needs at least 3.
>
> I can imagine that Japanese Firebird users would appreciate UTF-16
> support.
>
>
> Paul Vinkenoog
>
> Firebird-Devel mailing list, web interface at
> https://lists.sourceforge.net/lists/listinfo/firebird-devel