As best I can tell from C# docs a string is a sequence of char, and a
char is a 16-bit Unicode character. So strings are in UCS-2
encoding. Trying to figure out then how to marshal/unmarshal UTF-8 via
PInvoke.

In truth, strings are UTF-16 in memory, not UCS-2. I tested this myself on .NET more than a year ago; I'll have to test with the current mcs/Mono to see whether they handle it properly.


But, yes, the documentation is misleading.
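
A minimal sketch of the UTF-16 point, and of one way to get UTF-8 out of a string by hand. The class name StringEncodingDemo and the helper ToNullTerminatedUtf8 are made up for illustration; they are not part of GTK# or the class library. It assumes a runtime that provides char.IsHighSurrogate and char.IsLowSurrogate.

```csharp
using System;
using System.Text;

public static class StringEncodingDemo
{
    // Hypothetical helper: returns the string's UTF-8 bytes followed by a
    // terminating NUL, the layout a C function taking "const char *" expects.
    public static byte[] ToNullTerminatedUtf8(string s)
    {
        byte[] utf8 = Encoding.UTF8.GetBytes(s);
        byte[] result = new byte[utf8.Length + 1]; // final byte stays 0
        Array.Copy(utf8, result, utf8.Length);
        return result;
    }

    public static void Main()
    {
        // U+1D11E (MUSICAL SYMBOL G CLEF) lies outside the Basic
        // Multilingual Plane, so UCS-2 cannot represent it at all.
        string clef = "\U0001D11E";

        // In UTF-16 it is stored as a surrogate pair: two chars.
        Console.WriteLine(clef.Length);
        Console.WriteLine(char.IsHighSurrogate(clef[0]));
        Console.WriteLine(char.IsLowSurrogate(clef[1]));

        // In UTF-8 the same character takes four bytes, plus the NUL.
        Console.WriteLine(ToNullTerminatedUtf8(clef).Length);
    }
}
```

On a runtime that handles UTF-16 correctly I would expect this to print 2, True, True and 5. A byte[] built this way can be handed to a P/Invoke declaration whose parameter is byte[] instead of string, which sidesteps CharSet entirely; whether GTK# should do that is a separate question.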

Best regards,

Rafael Teixeira
Brazilian Polymath
Mono Hacker since 16 Jul 2001



From: Havoc Pennington <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Subject: [Mono-list] string encoding
Date: Sun, 22 Jun 2003 00:33:38 -0400

Hi,

As best I can tell from C# docs a string is a sequence of char, and a
char is a 16-bit Unicode character. So strings are in UCS-2
encoding. Trying to figure out then how to marshal/unmarshal UTF-8 via
PInvoke.

I looked at GTK# for an example.  However, GTK# seems to use "string"
for the type to pass in and out of GTK, and GTK is wanting UTF-8, not
UCS-2.

DllImport has this CharSet parameter that's used to convert native
strings to UCS-2, but it doesn't have UTF-8 as a possible value, and
anyway GTK# doesn't specify CharSet.

So is GTK# broken, if not why not, if yes how do I do it properly?
Basically, how is string encoding handled?

The clean solution to me seems to be that CharSet would contain UTF-8
as a value and CharSet=Auto would imply UTF-8 on UNIX, but I imagine
this would be an unacceptable extension of standard APIs.

Havoc
_______________________________________________
Mono-list maillist  -  [EMAIL PROTECTED]
http://lists.ximian.com/mailman/listinfo/mono-list


