On Tue, Apr 28, 2009 at 2:47 PM, Greg Spencer <[email protected]> wrote:
> On Tue, Apr 28, 2009 at 2:41 PM, Amanda Walker <[email protected]> wrote:
>
>> On Tue, Apr 28, 2009 at 4:39 PM, Greg Spencer <[email protected]> wrote:
>> > 1) I'd like to add some explicit routines for converting to/from UTF8
>> > and UTF16. While it's nice (and important) that FilePath uses the
>> > platform's native string, we've found that many third party libraries
>> > have made other assumptions, where they always expect UTF8 (char) or
>> > UTF16 (wchar_t) paths regardless of platform, and converting a
>> > FilePath to and from those forms is a platform-dependent exercise
>> > which should be centralized into the class (i.e. adding "ToUTF8" and
>> > "ToWide" functions to the class, and explicit constructors that take
>> > each type).
>>
>> One thing many of us have found, across multiple projects, is that
>> wchar_t is fraught with complication as soon as more than one platform
>> is involved. "wchar_t == UTF16" is a Windowsism (gcc defaults to 4
>> bytes, for example, and L"mumble" gets stored in UCS-4, not UTF-16).
>> Chrome started with more or less what you are suggesting, and we moved
>> off of it after much pain.
>
> I understand those issues quite well (but I probably should call the
> conversion method ToUTF16, now that you mention it). And char* isn't
> necessarily UTF8 on all platforms either.
>
> OK, so what's the currently recommended path for converting to UTF16 or
> UTF8 from a FilePath?

That conversion is not defined. If you are on Linux, the contents of the
file path are just an array of bytes. It might be UTF-8, in which case you
can convert to UTF-16. However, it may also be some crazy encoding, or it
may not match any encoding at all. The OS does not require it to match an
encoding.

When we need to convert a FilePath to Unicode, we use the
SysWideToNativeMB and SysNativeMBToWide functions from base. These work by
inspecting what the system thinks the current multi-byte encoding is. On
Mac that is UTF-8.
On Linux, it depends on the value of $LANG. Each time we do such a
conversion, we are introducing a potential bug in the product (on Linux at
least), so we try hard to avoid these conversions.

-Darin

Chromium Developers mailing list: [email protected]
View archives, change email options, or unsubscribe:
http://groups.google.com/group/chromium-dev
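
To make the locale dependence concrete, here is a minimal sketch of a
conversion that decodes path bytes using whatever multi-byte encoding the
current C locale reports. NativeBytesToWide is a hypothetical name used
only for illustration; it is not the actual base function and is not
claimed to match how SysNativeMBToWide is implemented. It relies only on
the standard std::mbrtowc call.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <string>

// Hypothetical helper, for illustration only (not Chromium's API).
// Decodes `bytes` using the multi-byte encoding of the current C locale.
// Returns false if the bytes are not a valid sequence in that encoding,
// which is exactly the case a Linux path is allowed to be in.
bool NativeBytesToWide(const std::string& bytes, std::wstring* out) {
  assert(out != nullptr);
  out->clear();

  std::mbstate_t state = std::mbstate_t();  // initial conversion state
  const char* p = bytes.data();
  std::size_t remaining = bytes.size();

  while (remaining > 0) {
    wchar_t wc = L'\0';
    std::size_t n = std::mbrtowc(&wc, p, remaining, &state);
    if (n == static_cast<std::size_t>(-1) ||
        n == static_cast<std::size_t>(-2)) {
      return false;  // invalid (-1) or truncated (-2) sequence
    }
    if (n == 0) {
      n = 1;  // a decoded NUL character; assume it consumed one byte
    }
    out->push_back(wc);
    p += n;
    remaining -= n;
  }
  return true;
}
```

The result depends on process state rather than on the path itself: a
caller would normally run std::setlocale(LC_ALL, "") first so the
environment is picked up, and the same bytes can then decode differently,
or fail to decode, under a different locale. The wchar_t values produced
are also not UTF-16 on every platform, as noted earlier in the thread.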
