On Tue, Apr 28, 2009 at 2:47 PM, Greg Spencer <[email protected]> wrote:

> On Tue, Apr 28, 2009 at 2:41 PM, Amanda Walker <[email protected]> wrote:
>
>>
>> On Tue, Apr 28, 2009 at 4:39 PM, Greg Spencer <[email protected]> wrote:
>> > 1) I'd like to add some explicit routines for converting to/from UTF8 and
>> > UTF16.  While it's nice (and important) that FilePath uses the platform's
>> > native string, we've found that many third party libraries have made other
>> > assumptions, where they always expect UTF8 (char) or UTF16 (wchar_t) paths
>> > regardless of platform, and converting a FilePath to and from those forms is
>> > a platform-dependent exercise which should be centralized into the class
>> > (i.e. adding "ToUTF8" and "ToWide" functions to the class, and explicit
>> > constructors that take each type).
>>
>> One thing many of us have found, across multiple projects, is that
>> wchar_t is fraught with complication as soon as more than one platform
>> is involved. "wchar_t == UTF16" is a Windowsism (gcc defaults to 4
>> bytes, for example, and L"mumble" gets stored in UCS-4, not UTF-16).
>> Chrome started with more or less what you are suggesting, and we moved
>> off of it after much pain.
>
>
> I understand those issues quite well (but I probably should call the
> conversion method ToUTF16, now that you mention it).  And char* isn't
> necessarily UTF8 on all platforms either.
>
> OK, so what's the currently recommended path for converting to UTF16 or
> UTF8 from a FilePath?
>


That conversion is not defined.  If you are on Linux, the contents of a file
path are just an array of bytes.  They might be UTF-8, in which case you can
convert them to UTF-16.  However, they may also be in some crazy encoding, or
they may not match any encoding at all.  The OS does not require a path to
match an encoding.
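To make the "it might be UTF-8" case concrete, here is a minimal C++17 sketch.
PathBytesToUtf16 is a hypothetical helper, not a function in base: it treats a
Linux path as raw bytes, converts it to UTF-16 only if the bytes are
well-formed UTF-8, and otherwise reports failure instead of guessing.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical helper (not part of base): interprets raw POSIX path bytes as
// UTF-8 and re-encodes them as UTF-16. Returns std::nullopt when the bytes are
// not well-formed UTF-8, because the OS never promised that they would be.
std::optional<std::u16string> PathBytesToUtf16(const std::string& bytes) {
  std::u16string out;
  std::size_t i = 0;
  const std::size_t n = bytes.size();
  while (i < n) {
    const unsigned char lead = static_cast<unsigned char>(bytes[i]);
    std::uint32_t cp = 0;
    std::size_t len = 0;
    if (lead < 0x80) {
      cp = lead;
      len = 1;
    } else if (lead >= 0xC2 && lead <= 0xDF) {
      cp = lead & 0x1F;
      len = 2;
    } else if (lead >= 0xE0 && lead <= 0xEF) {
      cp = lead & 0x0F;
      len = 3;
    } else if (lead >= 0xF0 && lead <= 0xF4) {
      cp = lead & 0x07;
      len = 4;
    } else {
      return std::nullopt;  // 0x80-0xC1 and 0xF5-0xFF never start a sequence
    }
    if (len > n - i) return std::nullopt;  // truncated sequence
    for (std::size_t k = 1; k < len; ++k) {
      const unsigned char b = static_cast<unsigned char>(bytes[i + k]);
      if ((b & 0xC0) != 0x80) return std::nullopt;  // not a continuation byte
      cp = (cp << 6) | (b & 0x3F);
    }
    // Reject overlong forms, UTF-16 surrogate code points and values > U+10FFFF.
    if ((len == 3 && cp < 0x800) || (len == 4 && cp < 0x10000) ||
        (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF) {
      return std::nullopt;
    }
    if (cp < 0x10000) {
      out.push_back(static_cast<char16_t>(cp));
    } else {
      cp -= 0x10000;  // split supplementary characters into a surrogate pair
      out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
      out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
    }
    i += len;
  }
  return out;
}
```

Returning std::nullopt forces the caller to decide what to do with a path that
is not valid UTF-8 (for example, keep using the native bytes) rather than
silently producing a lossy string.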

When we need to convert a FilePath to Unicode, we use the SysWideToNativeMB
and SysNativeMBToWide functions from base.  These work by inspecting what
the system thinks the current multi-byte encoding is.  On Mac that is UTF-8.
On Linux, it depends on the locale settings (e.g. the value of $LANG).  Each
time we do such a conversion, we introduce a potential bug in the product (on
Linux at least), so we try hard to avoid such conversions.
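For comparison, a locale-dependent conversion in the spirit of
SysNativeMBToWide and SysWideToNativeMB can be sketched with the standard C
library's mbstowcs() and wcstombs().  This is an illustration under stated
assumptions, not Chromium's implementation: the names NativeMBToWide and
WideToNativeMB are hypothetical, and querying the length with a null
destination relies on POSIX behavior.

```cpp
#include <cstddef>
#include <cstdlib>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for SysNativeMBToWide (not Chromium's code).
// std::mbstowcs() decodes using the current LC_CTYPE locale, so the result
// for the same bytes depends on how the process locale was configured.
std::optional<std::wstring> NativeMBToWide(const std::string& native_mb) {
  // With a null destination, POSIX mbstowcs() returns the required length.
  const std::size_t len = std::mbstowcs(nullptr, native_mb.c_str(), 0);
  if (len == static_cast<std::size_t>(-1)) {
    return std::nullopt;  // bytes are invalid in the locale's encoding
  }
  std::vector<wchar_t> buf(len + 1);
  std::mbstowcs(buf.data(), native_mb.c_str(), buf.size());
  return std::wstring(buf.data(), len);
}

// Hypothetical stand-in for SysWideToNativeMB, with the same caveats.
std::optional<std::string> WideToNativeMB(const std::wstring& wide) {
  const std::size_t len = std::wcstombs(nullptr, wide.c_str(), 0);
  if (len == static_cast<std::size_t>(-1)) {
    return std::nullopt;  // some character cannot be encoded in this locale
  }
  std::vector<char> buf(len + 1);
  std::wcstombs(buf.data(), wide.c_str(), buf.size());
  return std::string(buf.data(), len);
}
```

A process starts in the "C" locale and only picks up $LANG (or LC_ALL /
LC_CTYPE) after calling std::setlocale(LC_ALL, "") from <clocale>, which is
why the result of such a conversion can differ from one user's environment
to another.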

-Darin

--~--~---------~--~----~------------~-------~--~----~
Chromium Developers mailing list: [email protected] 
View archives, change email options, or unsubscribe: 
    http://groups.google.com/group/chromium-dev
-~----------~----~----~----~------~----~------~--~---