https://bugzilla.novell.com/show_bug.cgi?id=464128
https://bugzilla.novell.com/show_bug.cgi?id=464128#c5

Kornél Pál <[email protected]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Status    |NEEDINFO                    |NEW
   InfoProvider    |[email protected]            |

--- Comment #5 from Kornél Pál <[email protected]> 2010-10-14 10:38:59 UTC ---
Created an attachment (id=394922)
 --> (http://bugzilla.novell.com/attachment.cgi?id=394922)
Utf8AnsiConflictTest.cs

Although Jon is on the right track, the current bug report refers to native C code rather than marshaling in managed code. (For Jon's example, I think the solution is to set UnixEncoding.Instance to Encoding.Default on Windows.)

A very important difference between Linux (and Unix) and Windows is that Linux uses char* to represent strings while Windows uses wchar_t*. Windows interprets wchar_t* as UTF-16 (older Windows NT versions predate UTF-16 and used UCS-2). What char* contains on Linux may vary by system, but most recent distros and installations use UTF-8. (File names, for example, may use different encodings, which can cause problems, but that's another story.)

Windows has a system setting referred to as the ANSI code page that specifies what charset char* is encoded in. It is important to note that the ANSI code page is never UTF-8; it is always a legacy, non-standard MS code page such as Windows-1252. (TextInfo.ANSICodePage has a nice DB of the ANSI code pages of locales.) Furthermore, nothing on Windows is actually stored as char* (except the content of text files). When you call an API that takes char*, the string gets converted from ANSI to UTF-16 wchar_t*. Even file names are stored in Unicode on NTFS and VFAT.

Mono's native C parts mostly use char* containing UTF-8, which is a very good and portable design. The only problem is that they sometimes call C runtime functions. The char* type is the same, but Mono passes UTF-8 that the C runtime interprets as ANSI and converts to UTF-16.
As long as you use ASCII you will not notice this problem, since ANSI code pages as well as UTF-8 are usually ASCII-compatible, so the result is the same. If you use non-ASCII characters, however, the conversion will corrupt strings for sure. This may even lead to security problems, although I am not aware of any specific security issue.

The attached Utf8AnsiConflictTest.cs shows that the external resource file hash is generated incorrectly by SRE of Mono on Windows because of this encoding mismatch. The same test works fine on Linux. Note that this particular bug is in mono_sha1_get_digest_from_file: fopen is called, which expects ANSI, but UTF-8 is passed. Because of another bug, no exception is generated; the error is simply ignored and an invalid hash is written to the module.

This is a general problem (although most likely not a critical one) that is specific neither to fopen nor to SRE. The solution is not to call any Windows API or CRT function that takes char*. Instead, UTF-8 should be converted to UTF-16, and the Windows API and CRT functions that take wchar_t* should be called.
