(ping) So, I had another idea. How about a separate file path manipulation class that has a well-defined character encoding, so that we can do filename manipulations like those FilePath offers (and a few more)? It could convert from a FilePath if given an encoding, and convert back to a FilePath with the platform's default encoding (using LC_*/LANG on Linux, falling back to ASCII), or a given encoding. It could touch the filesystem so that it could know what encoding methods and manipulations were valid for the platform/drive combination.
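To make the Linux fallback concrete, here is a rough sketch in Python (chosen only for brevity; it is not Chromium code). The helper names `locale_encoding` and `decode_path` are hypothetical, and the sketch assumes that a locale name without an explicit codeset (such as "C" or "en_US") should be treated as ASCII, which is a simplification. It checks for pure ASCII first, then consults LC_ALL, LC_CTYPE and LANG in that order, and lets the decode raise where the proposal would crash.

```python
import codecs


def locale_encoding(env):
    """Guess the filename encoding from locale variables, else 'ascii'.

    `env` is any mapping of environment variables (os.environ works).
    Hypothetical helper, not part of any existing API.
    """
    # POSIX precedence: the first of these that is set and non-empty wins.
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var, "")
        if not value:
            continue
        # Locale names look like "en_US.UTF-8@euro"; keep only the codeset.
        _, dot, rest = value.partition(".")
        codeset = rest.partition("@")[0]
        if not dot or not codeset:
            return "ascii"  # assumption: no codeset given means ASCII
        try:
            return codecs.lookup(codeset).name
        except LookupError:
            return "ascii"  # codeset this runtime cannot convert from
    return "ascii"


def decode_path(raw, env):
    """Decode filename bytes to text; raises UnicodeDecodeError on bad input."""
    if all(byte < 128 for byte in raw):
        return raw.decode("ascii")  # pure ASCII: no locale lookup needed
    # Invalid sequences raise here, standing in for the proposed crash.
    return raw.decode(locale_encoding(env))
```

For example, `decode_path(raw_bytes, os.environ)` would decode a name read from disk under the current locale. Passing the environment in as a mapping keeps the lookup easy to test without changing process state.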
Since it seems like this is not really something that Chromium needs or wants right now (and it doesn't belong in base anyhow because of needing to touch the filesystem), I think I'll work on this for O3D, and later you can see if you want to use it for Chromium.

-Greg.

On Wed, Apr 29, 2009 at 3:58 PM, Greg Spencer <[email protected]> wrote:
> On Wed, Apr 29, 2009 at 12:22 PM, Mark Mentovai <[email protected]> wrote:
>
>> I understand your problem. You're saying "I have user-supplied data
>> that I want to build a filename from," and "I have this pathname that
>> I want to display back to the user." I agree that it would be good to
>> have a way to handle these cases in base. I don't know if FilePath
>> proper is the right place to do it. If we do it in FilePath, it still
>> won't really be right.
>
> OK, so it sounds like you're telling me not to use FilePath to represent
> file paths from a disk for my purposes because they can't ever be converted
> reliably to a particular encoding on Linux (which is a requirement for me,
> because of the third party libraries that require a particular encoding).
>
> That's fine, but what do I do instead? Roll my own FilePath clone that has
> some encoding assumptions? I can do that, but it has the same issues as the
> ones you're worried about with FilePath, so it seems better to solve the
> issue in one place rather than have two versions that are both insufficient.
> Man, it would be better if FilePath could reliably know its encoding! (I
> realize that Linux makes this impossible, it just seems like it would be
> better that way. :-)
>
> Since Linux is the only platform where the encoding is unclear, what if we
> did the best we could on Linux:
>
> When constructing a FilePath from a char* string on Linux:
> - Test the input string for values > 127 to determine if it's really just
>   ASCII (and if so, we're out of the woods).
> - Then check LANG, LC_CTYPE, LC_ALL (through appropriate Linux APIs) for an
>   encoding that we can support, and note the encoding for later if we are
>   requested to do a conversion.
> - If we run into an invalid sequence during a conversion, or an encoding we
>   can't convert from, then use a CHECK to crash.
>
> This should work on most filenames, in almost all situations -- I'll bet
> most filenames are ASCII, even on foreign systems, and the ones that aren't
> ASCII have set LANG to something in /etc/profile, so all filenames created
> by any app running on that machine should match that encoding.
>
> Where they don't do that correctly, they're already getting garbage (and
> should expect garbage) from any application they use, not just Chrome, since
> there is no way *any* app can decode a path with multiple encodings in it,
> or where the encoding is different than LANG (or LC_*) says it is.
>
> Chrome already crashes like this when it encounters situations where it's
> just impossible to know what's right, so it's consistent with Chrome's
> behavior in other areas.
>
>> it should be the caller's responsibility to only deal with user-created
>> names with this interface.
>
> What do you mean here? Isn't that the case now with FilePath? (It's the
> file_util routines that actually read the filesystem and make FilePaths out
> of them, after all). As for your suggestion to only deal with path
> components, how would you propose to parse user-supplied paths into one of
> these?
>
>> > 2) I'd like to make it possible to instantiate a POSIX FilePath object on
>> > Windows and a Windows FilePath on POSIX platforms. This is because some
>> > libraries (e.g. the zip library, or tar files), use POSIX semantics for
>> > their paths even on Windows (I haven't seen a use case for Windows paths on
>> > POSIX yet, actually). This would make it possible to use the nice API that
>> > FilePath has to manipulate paths appropriately for these other libraries.
>> > This could be easily accomplished by having POSIX and Windows versions of
>> > FilePath, and then typedef'ing FilePath differently on different platforms
>> > to one of these versions.
>>
>> Sounds pretty Pythonic.
>>
>> FilePath already sort of has some support for this - it does a bunch
>> of things based on feature macros, mostly so that as I was writing it,
>> I could test the Windows semantics without having to (shudder) resort
>> to running on Windows. These could probably be adapted to do what
>> you're asking.
>
> Cool.
>
>> > 3) It would be helpful to have real path normalization for each of the
>> > platforms (although I know what a testing nightmare that can be). I might
>> > try and tackle this if people think it would be beneficial.
>>
>> It's also a specification and implementation nightmare. Everyone has
>> a different idea of what "normalization" means. What's your idea?
>
> Yes, I know it's a nightmare all around, but I think it would be useful to
> have something that addresses this. My idea would be the same as Python's
> os.path.normpath, mainly because it's a well-tested, seasoned example with
> test cases. Windows also has a routine for this (PathCanonicalize) that
> could be used (but I know it doesn't work for UNC paths).
>
>> > 4) Make sure we handle case sensitivity vs case preservation correctly.
>> > It's unclear to me that FilePath does this correctly on the Mac -- Mac file
>> > names are case preserving, but case insensitive, Unix filenames are both
>> > (and Windows filenames are neither :-).
>>
>> Again with the normalization. What do you want this stuff for?
>> What's your idea of how this should work?
>
> Probably the same as os.path.normcase in Python.
> I want this stuff so that I can make sure that I can at least
> semi-reliably compare/manipulate FilePaths to do things like
> absolute->relative path conversion, or store FilePaths in a set or map and
> be sure I don't have multiple entries pointing to the same file. Without
> these kinds of operations, doing these things is pretty much impossible.
>
>> Remember: FilePath is specified to be light and to never touch the
>> disk. If you've got a disk-touching operation, it probably doesn't
>> belong in FilePath proper.
>
> I'm OK with that -- it makes sense to keep the file system ops and FilePath
> separate.
>
> -Greg.
>
> --~--~---------~--~----~------------~-------~--~----~
Chromium Developers mailing list: [email protected]
View archives, change email options, or unsubscribe:
    http://groups.google.com/group/chromium-dev
-~----------~----~----~----~------~----~------~--~---
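For reference, the Python behaviour mentioned in the thread (os.path.normpath and os.path.normcase) can be tried on any platform, because the standard library exposes both path flavors as the `posixpath` and `ntpath` modules. The sketch below is only an illustration of that behaviour; `path_key` is a hypothetical helper name, not something proposed in the thread. Both calls are purely lexical and never touch the disk, so symlinks and case-insensitive filesystems on POSIX-style paths (as on the Mac) are not accounted for.

```python
import ntpath
import posixpath


def path_key(path, flavor=posixpath):
    """Return a lexically normalized form of `path` for use as a set/map key.

    `flavor` is the path module to apply (posixpath or ntpath), so Windows
    rules can be used on POSIX hosts and vice versa.
    """
    # normpath collapses "." and ".." segments and duplicate separators;
    # normcase lowercases and converts "/" to "\" for ntpath, and leaves
    # the string unchanged for posixpath.
    return flavor.normcase(flavor.normpath(path))


# Two spellings of the same Windows path collapse to a single key.
windows_keys = {path_key(p, ntpath)
                for p in ["C:\\Dir\\File.txt", "c:/dir/./FILE.TXT"]}

# POSIX paths that differ only in case stay distinct.
posix_keys = {path_key(p) for p in ["Dir/File", "dir/file"]}
```

Because the normalization is lexical, collapsing ".." can change which file a path refers to when a symlink is involved, so keys built this way are only semi-reliable in the sense described in the thread.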
