As best I can tell from C# docs a string is a sequence of char, and a char is a 16-bit Unicode character. So strings are in UCS-2 encoding. Trying to figure out then how to marshal/unmarshal UTF-8 via PInvoke.
In truth, strings are stored in memory as UTF-16, not UCS-2. I tested this myself on .NET more than a year ago; I'll have to test with the current mcs/Mono to see whether it handles this properly.
But, yes, the documentation is misleading.
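Here is a minimal Python sketch of the UTF-16 versus UCS-2 distinction. Python is used only as a neutral illustration, and the names `utf16_code_units` and `fits_ucs2` are hypothetical. A character inside the Basic Multilingual Plane takes one 16-bit unit in both encodings. A character above U+FFFF, such as U+1D11E MUSICAL SYMBOL G CLEF, needs a surrogate pair in UTF-16 and cannot be represented in UCS-2 at all. This is why, on .NET, such a character occupies two `char` values in a string.

```python
def utf16_code_units(cp):
    """Encode one Unicode code point as a list of UTF-16 code units (16-bit ints)."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: U+{cp:04X}")
    if cp <= 0xFFFF:
        # BMP character: a single code unit, identical to UCS-2
        return [cp]
    v = cp - 0x10000  # leaves a 20-bit value
    # High surrogate carries the top 10 bits, low surrogate the bottom 10
    return [0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)]


def fits_ucs2(cp):
    # UCS-2 has exactly one 16-bit unit per character, so it stops at U+FFFF
    return cp <= 0xFFFF


clef = 0x1D11E  # MUSICAL SYMBOL G CLEF, outside the BMP
print([f"{u:04X}" for u in utf16_code_units(clef)])  # ['D834', 'DD1E']
print(fits_ucs2(clef))  # False
```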
Best regards,
Rafael Teixeira
Brazilian Polymath
Mono Hacker since 16 Jul 2001
From: Havoc Pennington <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Subject: [Mono-list] string encoding
Date: Sun, 22 Jun 2003 00:33:38 -0400
Hi,
As best I can tell from C# docs a string is a sequence of char, and a char is a 16-bit Unicode character. So strings are in UCS-2 encoding. Trying to figure out then how to marshal/unmarshal UTF-8 via PInvoke.
I looked at GTK# for an example. However, GTK# seems to use "string" as the type it passes into and out of GTK, and GTK expects UTF-8, not UCS-2.
DllImport has this CharSet parameter that's used to convert native strings to UCS-2, but it doesn't have UTF-8 as a possible value, and anyway GTK# doesn't specify CharSet.
So is GTK# broken? If not, why not? If so, how do I do it properly? Basically, how is string encoding handled?
The clean solution to me seems to be that CharSet would contain UTF-8 as a value and CharSet=Auto would imply UTF-8 on UNIX, but I imagine this would be an unacceptable extension of standard APIs.
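Whatever mechanism a binding uses, passing a managed string to a C library that expects UTF-8 (as GTK does) means transcoding the string's UTF-16 code units into a NUL-terminated UTF-8 byte buffer, and doing the reverse on the way back. The sketch below shows only that transcoding step, written in Python for illustration. `utf16_to_utf8` is a hypothetical name, and this is not how Mono's marshaler or GTK# actually implement the conversion. In C# itself, `System.Text.Encoding.UTF8.GetBytes` and `GetString` perform the same conversion.

```python
def utf16_to_utf8(units):
    """Transcode a sequence of UTF-16 code units to NUL-terminated UTF-8 bytes.

    Illustrative only: mimics the conversion a UTF-8 string marshaler would
    need to perform; it is not Mono's or GTK#'s actual code.
    """
    out = bytearray()
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # High surrogate: must be followed by a low surrogate
            if i + 1 >= len(units) or not 0xDC00 <= units[i + 1] <= 0xDFFF:
                raise ValueError(f"unpaired high surrogate at index {i}")
            cp = 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            i += 2
        elif 0xDC00 <= u <= 0xDFFF:
            raise ValueError(f"unpaired low surrogate at index {i}")
        else:
            cp = u
            i += 1
        # Emit the code point as 1 to 4 UTF-8 bytes
        if cp < 0x80:
            out.append(cp)
        elif cp < 0x800:
            out += bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
        elif cp < 0x10000:
            out += bytes([0xE0 | (cp >> 12),
                          0x80 | ((cp >> 6) & 0x3F),
                          0x80 | (cp & 0x3F)])
        else:
            out += bytes([0xF0 | (cp >> 18),
                          0x80 | ((cp >> 12) & 0x3F),
                          0x80 | ((cp >> 6) & 0x3F),
                          0x80 | (cp & 0x3F)])
    out.append(0)  # C strings are NUL-terminated
    return bytes(out)


# Usage: get the UTF-16 code units a managed string would hold
text = "h\u00e9 \U0001D11E"
raw = text.encode("utf-16-le")
units = [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]
print(len(units))  # 5: 'h', 'é', ' ', plus a surrogate pair for the clef
print(utf16_to_utf8(units))  # b'h\xc3\xa9 \xf0\x9d\x84\x9e\x00'
```

The sketch combines each surrogate pair into one code point before encoding, so a character above U+FFFF becomes a single 4-byte UTF-8 sequence. Encoding the two surrogates separately as 3-byte sequences would produce ill-formed UTF-8. Note also that an embedded U+0000 would silently truncate the string on the C side.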
Havoc
_______________________________________________
Mono-list maillist - [EMAIL PROTECTED]
http://lists.ximian.com/mailman/listinfo/mono-list
