Jesse Pelton <[EMAIL PROTECTED]> writes:

> That assumes wchar_t holds UTF-16 (as XMLCh does).  It might not.  See
> http://www.losingfight.com/blog/2006/07/28/wchar_t-unsafe-at-any-size/
> for a wchar_t story that would be amusing if it were fiction.

What most people fail to realize is that wchar_t holds whatever you
put into it. If you want portable UTF-16 in wchar_t then put UTF-16
into it, even on platforms where wchar_t is 4-bytes long and can
hold UTF-32.
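A minimal sketch of this first approach. It assumes C++11, uses char16_t/std::u16string as a stand-in for XMLCh, and the function names are hypothetical. Each UTF-16 code unit is copied into its own wchar_t, so a character outside the BMP stays a surrogate pair even where wchar_t is 4 bytes:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: copies UTF-16 code units one-to-one into a
// std::wstring (char16_t stands in for XMLCh). No surrogate pairs are
// decoded, so the result holds UTF-16 even when wchar_t is 32 bits wide.
std::wstring utf16_units_to_wstring(const std::u16string& in) {
    std::wstring out;
    out.reserve(in.size());
    for (char16_t u : in)
        out.push_back(static_cast<wchar_t>(u));
    return out;
}

// Hypothetical reverse helper: narrows each wchar_t back to one UTF-16
// code unit. It expects the string to contain UTF-16 code units only.
std::u16string wstring_units_to_utf16(const std::wstring& in) {
    std::u16string out;
    out.reserve(in.size());
    for (wchar_t w : in) {
        // Every element must fit in a single 16-bit code unit.
        assert(static_cast<unsigned long>(w) <= 0xFFFF);
        out.push_back(static_cast<char16_t>(w));
    }
    return out;
}
```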

Alternatively, it is possible to use UTF-16 on platforms where
wchar_t is 2-bytes long and UTF-32 on the rest. The only parts
that will need to know about this arrangement are those that
are responsible for converting to/from wchar_t strings (e.g.,
XMLCh to/from wchar_t). If the application does not need to do
anything special with (e.g., search for) characters that are
outside the BMP (Basic Multilingual Plane), then it can use
wchar_t that contains either UTF-16 or UTF-32 without actually
caring which one it is. And I am pretty sure this describes 99.9%
of applications. We use this approach in our XML data binding tool
when the user requests the underlying character type to be wchar_t.
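A rough sketch of this second arrangement. It again assumes C++11, uses char16_t as a stand-in for XMLCh, and the function names are hypothetical. Only these two conversion functions look at sizeof(wchar_t): where wchar_t is 2 bytes they copy UTF-16 code units as-is, and where it is 4 bytes they combine or split surrogate pairs so the wstring holds UTF-32:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical UTF-16 (XMLCh stand-in) -> wchar_t conversion.
// 2-byte wchar_t: code units are copied unchanged (UTF-16).
// 4-byte wchar_t: surrogate pairs are combined into code points (UTF-32).
std::wstring utf16_to_wchar(const std::u16string& in) {
    std::wstring out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t c = in[i];
        if (sizeof(wchar_t) == 4 && c >= 0xD800 && c <= 0xDBFF &&
            i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // High surrogate followed by low surrogate: one code point.
            c = 0x10000 + ((c - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        }
        out.push_back(static_cast<wchar_t>(c));
    }
    return out;
}

// Hypothetical wchar_t -> UTF-16 conversion, the inverse of the above.
// 4-byte wchar_t: code points above U+FFFF are split into surrogate pairs.
std::u16string wchar_to_utf16(const std::wstring& in) {
    std::u16string out;
    out.reserve(in.size());
    for (wchar_t w : in) {
        char32_t c = static_cast<char32_t>(w);
        if (sizeof(wchar_t) == 4 && c > 0xFFFF) {
            assert(c <= 0x10FFFF); // must be a valid Unicode code point
            c -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (c >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (c & 0x3FF)));
        } else {
            out.push_back(static_cast<char16_t>(c));
        }
    }
    return out;
}
```

Code outside these two helpers can treat the resulting std::wstring as an ordinary character sequence, as long as it does nothing special with characters outside the BMP.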


Boris


-- 
Boris Kolpackov
Code Synthesis Tools CC
http://www.codesynthesis.com
Open-Source, Cross-Platform C++ XML Data Binding
