On Fri, Feb 12, 2016 at 5:32 PM, Igor Tandetnik <igor at tandetnik.org> wrote:
> On 2/12/2016 7:24 PM, J Decker wrote:
>>
> What character in what ANSI codepage ends up converted by mbstowcs to an
> unpaired surrogate?
>
> What character in what ANSI codepage requires a surrogate pair to represent
> (that is, corresponds to a Unicode character outside of BMP), and triggers
> failure when passed to mbstowcs?
>
> With all due respect, I find your claims difficult to believe.
>
> In any case, MultiByteToWideChar and WideCharToMultiByte are perfectly
> capable of converting between UTF-8 and UTF-16.
> --
> Igor Tandetnik
>

Okay; I'd forgotten. It does worse than I expected...

//--------------
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main( void )
{
    char utf8[5] = "\xf0\x90\x80\x81";    // U+10001 encoded as UTF-8
    char utf82[5] = "\xed\xa0\x81";       // lone surrogate U+D801 in 3-byte form
    wchar_t out[5];
    wchar_t out2[5];
    wchar_t utf16[5] = L"\xd800\xdc01";   // U+10001 as a surrogate pair
    char chout[5];
    int n;

    memset( out, 0, sizeof( out ) );
    memset( out2, 0, sizeof( out2 ) );
    memset( chout, 0, sizeof( chout ) );

    // No setlocale() call, so the conversions run in the default "C" locale.
    mbstowcs( out, utf8, 5 );
    mbstowcs( out2, utf82, 5 );
    wcstombs( chout, utf16, 5 );

    for( n = 0; n < 5; n++ )
        printf( "%04x ", (unsigned)out[n] );
    // output is 00f0 0090 0080 0081; expect d800 dc01
    printf( "\n" );

    for( n = 0; n < 5; n++ )
        printf( "%04x ", (unsigned)out2[n] );
    // output is 00ed 00a0 0081; expect d801
    printf( "\n" );

    for( n = 0; n < 5; n++ )
        printf( "%02x ", (unsigned char)chout[n] );
    // output is 00 00 00 00
    printf( "\n" );

    return 0;
}
//--------------

So it does no useful conversion either way :)
(but at least I ended up fixing a boundary issue while testing)

>
> _______________________________________________
> sqlite-users mailing list
> sqlite-users at mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users