On Tue, 2005-10-18 at 09:58 +0200, Florian Weimer wrote:
> * Jonathan Pryor:
> > On Mon, 2005-10-17 at 19:03 +0200, Florian Weimer wrote:
> >> Why are UTF-16 strings used in Mono.Unix?  Doesn't this mean that some
> >> resources are inaccessible to programs running under Mono in a
> >> multibyte locale (such as one using UTF-8)?
> >
> > Care to elaborate?  System.String is always used to represent strings in
> > Mono.Unix and Mono.Unix.Native, but Mono's marshaler will convert the
> > strings to UTF-8 for the P/Invoke call.
>
> UNIX systems do not have a system-wide locale.  Some user might run
> under a single-byte locale and create a file named "Ärger.txt" (whose
> name consists of exactly nine bytes in his locale).  Another user who
> uses UTF-8 cannot access this file using any name that is valid UTF-8.
> For applications written in C, this is typically not a problem because
> you can pass the necessary byte string on the command line (entering
> ?rger.txt in the shell, which performs expansion), but this won't work
> with Mono applications.

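To make the byte counts in the quoted example concrete, here is a small Python sketch (Python only for illustration; nothing here is Mono code). It assumes the single-byte locale is ISO-8859-1, which the original message does not specify, and the helper name is_valid_utf8 is made up for this example.

```python
# Illustration only: the same file name under two locale encodings.
# Assumption: the single-byte locale uses ISO-8859-1 (not stated in the thread).
name = "\u00c4rger.txt"  # "Ärger.txt"

latin1_bytes = name.encode("iso-8859-1")  # bytes a single-byte locale stores on disk
utf8_bytes = name.encode("utf-8")         # bytes a UTF-8 locale would store


def is_valid_utf8(raw: bytes) -> bool:
    """Hypothetical helper: True if raw decodes as UTF-8 without errors."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(len(latin1_bytes), is_valid_utf8(latin1_bytes))  # 9 False
print(len(utf8_bytes), is_valid_utf8(utf8_bytes))      # 10 True
```

The nine-byte name starts with 0xC4, which UTF-8 treats as the lead byte of a two-byte sequence; the following "r" is not a continuation byte, so no valid UTF-8 string can encode to that name.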
This won't work with a great deal more than just Mono applications.
This will likely also "break" for every app that uses a runtime (Java,
Perl, Python), and certainly won't work with GTK+/Gnome applications
unless the user explicitly sets the G_FILENAME_ENCODING environment
variable to contain the character set name that should instead be used
(and how many users will know about G_FILENAME_ENCODING, much less set
it?), or the user sets G_BROKEN_FILENAMES=1.

A "fix" might be for Mono's string marshaler and
Marshal.StringToHGlobalAnsi() to follow G_FILENAME_ENCODING instead of
always converting to UTF-8 (something I considered a few months ago but
never got around to writing a patch for), in which case things would
work properly for you...if you remembered G_FILENAME_ENCODING, anyway.

> A first step in a direction to fix that would be to use native strings
> (multibyte strings) for accessing native APIs.

What does that mean, exactly?  Mono is already generating multibyte
strings for the Native APIs -- UTF-8 strings, yes, but UTF-8 is a
multibyte encoding -- so your statement is effectively meaningless.

It sounds like what you *really* want is for Mono's string marshaler to
marshal to the user's preferred character set/encoding instead of UTF-8.
This can be done, though I'm not sure what all it would impact, and
determining what the user's preferred encoding is would likely fall to
using G_FILENAME_ENCODING, in which case few may benefit anyway.

 - Jon

_______________________________________________
Mono-list maillist  -  [email protected]
http://lists.ximian.com/mailman/listinfo/mono-list
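As a rough illustration of what "follow G_FILENAME_ENCODING instead of always converting to UTF-8" could look like, here is a Python sketch of the selection logic. It is not Mono's marshaler and not a proposed patch. It assumes the rules GLib documents for these variables: the first entry of the comma-separated G_FILENAME_ENCODING list wins, the special token "@locale" means the locale's charset, G_BROKEN_FILENAMES falls back to the locale's charset, and UTF-8 is used otherwise. The function names filename_encoding and marshal_filename are hypothetical.

```python
import locale
import os


def filename_encoding(environ=None):
    """Hypothetical helper: choose the charset to use for file names.

    Assumed rules (taken from GLib's documented behaviour, not from Mono):
    G_FILENAME_ENCODING takes priority, then G_BROKEN_FILENAMES, then UTF-8.
    """
    env = os.environ if environ is None else environ
    locale_charset = locale.getpreferredencoding(False)

    configured = env.get("G_FILENAME_ENCODING", "")
    entries = [e.strip() for e in configured.split(",") if e.strip()]
    if entries:
        first = entries[0]
        # "@locale" stands for the charset of the current locale.
        return locale_charset if first == "@locale" else first

    if env.get("G_BROKEN_FILENAMES"):
        return locale_charset

    return "UTF-8"


def marshal_filename(name, environ=None):
    """Hypothetical helper: encode a string the way such a marshaler might."""
    return name.encode(filename_encoding(environ))


# With G_FILENAME_ENCODING set, the nine-byte name becomes reachable again.
print(marshal_filename("\u00c4rger.txt", {"G_FILENAME_ENCODING": "ISO-8859-1"}))
print(marshal_filename("\u00c4rger.txt", {}))
```

Note that this only helps a user who has actually set one of the two variables; with neither set, the sketch still produces UTF-8, which is the situation described above.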
