On Fri, Feb 12, 2016 at 5:32 PM, Igor Tandetnik <igor at tandetnik.org> wrote:
> On 2/12/2016 7:24 PM, J Decker wrote:
>>

> What character in what ANSI codepage ends up converted by mbstowcs to an
> unpaired surrogate?
>
> What character in what ANSI codepage requires a surrogate pair to represent
> (that is, corresponds to a Unicode character outside of BMP), and triggers
> failure when passed to mbstowcs?
>
> With all due respect, I find your claims difficult to believe.
>
> In any case, MultiByteToWideChar and WideCharToMultiByte are perfectly
> capable of converting between UTF-8 and UTF-16.
> --
> Igor Tandetnik
>

Okay; I'd forgotten.  It does worse than I expected...

//--------------

#include <stdio.h>
#include <stdlib.h>   /* mbstowcs, wcstombs */
#include <string.h>   /* memset */
#include <wchar.h>

int main( void )
{
   /* No setlocale() call here, so the default "C" locale is in effect. */
   char utf8[5] = "\xf0\x90\x80\x81";    /* U+10001 encoded as UTF-8 */
   char utf82[5] = "\xed\xa0\x81";       /* lone surrogate U+D801 in UTF-8 form */
   wchar_t out[5];
   wchar_t out2[5];
   wchar_t utf16[5] = L"\xd800\xdc01";   /* U+10001 as a surrogate pair (16-bit wchar_t) */
   char chout[5];
   int n;

   memset( out, 0, sizeof( out ) );
   memset( out2, 0, sizeof( out2 ) );
   memset( chout, 0, sizeof( chout ) );

   mbstowcs( out, utf8, 5 );
   mbstowcs( out2, utf82, 5 );
   wcstombs( chout, utf16, 5 );

   for( n = 0; n < 5; n++ )
      printf( "%04x ", (unsigned)out[n] );  // output is 00f0 0090 0080 0081; expect d800 dc01
   printf( "\n" );
   for( n = 0; n < 5; n++ )
      printf( "%04x ", (unsigned)out2[n] ); // output is 00ed 00a0 0081; expect d801
   printf( "\n" );
   for( n = 0; n < 5; n++ )
      printf( "%02x ", (unsigned char)chout[n] );  // output is 00 00 00 00
   printf( "\n" );
   return 0;
}

//--------------

So it does no useful conversion in either direction :)  (But at least I
ended up fixing a boundary issue while testing.)
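
For reference, here is a minimal portable sketch of the arithmetic behind
the "expect d800 dc01" comment above. It is not what mbstowcs or the
Windows API does internally; the names utf8_decode and utf16_encode are
made up for this illustration. Note that it assumes strict UTF-8 rules,
under which ed a0 81 (a surrogate encoded directly) is ill-formed, so this
sketch rejects that input instead of producing d801.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode one UTF-8 sequence from a NUL-terminated
 * buffer. Stores the code point in *cp and returns the number of bytes
 * consumed, or 0 if the sequence is malformed, overlong, encodes a
 * surrogate, or exceeds U+10FFFF. */
size_t utf8_decode(const unsigned char *s, uint32_t *cp)
{
    /* Smallest code point that legitimately needs each sequence length. */
    static const uint32_t min_for_len[5] = { 0, 0, 0x80, 0x800, 0x10000 };
    size_t len, i;
    uint32_t c;

    if (s[0] < 0x80)                { c = s[0];        len = 1; }
    else if ((s[0] & 0xE0) == 0xC0) { c = s[0] & 0x1F; len = 2; }
    else if ((s[0] & 0xF0) == 0xE0) { c = s[0] & 0x0F; len = 3; }
    else if ((s[0] & 0xF8) == 0xF0) { c = s[0] & 0x07; len = 4; }
    else return 0;                  /* stray continuation or invalid lead byte */

    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)  /* also stops at an early NUL */
            return 0;
        c = (c << 6) | (s[i] & 0x3F);
    }
    if (c < min_for_len[len]) return 0;        /* overlong encoding */
    if (c >= 0xD800 && c <= 0xDFFF) return 0;  /* surrogates are not valid UTF-8 */
    if (c > 0x10FFFF) return 0;                /* beyond the Unicode range */

    *cp = c;
    return len;
}

/* Hypothetical helper: encode a code point as UTF-16.
 * Returns the number of 16-bit units written to out (1 or 2). */
size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;
        return 1;
    }
    cp -= 0x10000;                               /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 | (cp >> 10));    /* high surrogate: top 10 bits */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));  /* low surrogate: bottom 10 bits */
    return 2;
}
```

Feeding f0 90 80 81 through utf8_decode gives U+10001, and utf16_encode
turns that into d800 dc01, which is the pair the test program was hoping
to see from mbstowcs.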

