On 2013-06-06 20:25:58 +0000, Walter Bright <[email protected]> said:
On 6/6/2013 1:02 PM, Michel Fortin wrote:
Have you never opened a local file in a Windows web browser and taken a look at
the URL? The drive letter is there.
file:///c:/path/to/the%20file.txt
The drive letter is simply the first part of the path on Windows.
I didn't know that, but that doesn't make it a canonical path. It just
combines the notion of url with a path.
It's not a canonical path, but it's a platform-neutral representation
of a path. You can perform the same operations with a URL (including
regular expressions) irrespective of the underlying OS.
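To illustrate, here is a minimal Python sketch that takes the example URL above apart with nothing but string handling, so it should behave the same on any OS. The helper name `url_path_parts` is made up for this example; `urlsplit` and `unquote` are from the standard `urllib.parse` module.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def url_path_parts(url):
    """Return the decoded path segments of a file: URL (hypothetical helper)."""
    # urlsplit only parses text; it never touches the filesystem.
    path = urlsplit(url).path
    # Decode each segment on its own so that an encoded slash (%2F)
    # inside a name cannot turn into a bogus separator.
    return [unquote(segment) for segment in path.split("/") if segment]


parts = url_path_parts("file:///c:/path/to/the%20file.txt")
print(parts)                        # ['c:', 'path', 'to', 'the file.txt']
print(posixpath.join(*parts[1:]))   # path/to/the file.txt
```

The drive letter simply shows up as the first segment, and nothing here says anything about case sensitivity or whether the path is canonical.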
I was replying initially to your claim that there was no portable way
to represent a path. I don't think the definition of a "portable path"
needs to include any notion of canonical, because not even non-portable
paths can be canonical these days.
Actually, it doesn't depend on Linux or Windows or OS X. It depends on the
filesystem used, be it FAT16, FAT32, NTFS, ext{1,2,3}, HFS+, Case-sensitive
HFS+, etc. If you assume a specific case sensitivity setting by looking at the
OS, that's a bug. You can mount NTFS and FAT on Linux or OS X, and Apple has
Case-sensitive HFS+ for OS X and it's the default on iOS. Then there's the whole
issue about which locale to use for Unicode case-insensitive comparisons. I'd
bet that different filesystems choose different approaches to this
tricky problem.
So there's no way to normalize for case-sensitivity just by looking at a path or
a URL, even if you know which OS you're on. If you want to know for sure
whether two paths are the same, or what is the normalized path, you need to ask
the filesystem at some point. Anything else is based on fragile assumptions.
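As a rough sketch of what "asking the filesystem" could look like, the Python snippet below compares two spellings of a path by letting the OS resolve both, using the standard `os.path.samefile`. The wrapper name `refers_to_same_file` and the file name `Readme.txt` are only illustrative, and the case-variant result is deliberately not predicted because it depends on the filesystem holding the temporary directory.

```python
import os
import tempfile


def refers_to_same_file(a, b):
    """Ask the filesystem whether two path spellings name the same file."""
    try:
        # samefile() stats both paths and compares device and inode,
        # so the filesystem applies its own case and normalization rules.
        return os.path.samefile(a, b)
    except OSError:
        # At least one spelling does not resolve to an existing file.
        return False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        original = os.path.join(d, "Readme.txt")
        open(original, "w").close()
        variant = os.path.join(d, "README.TXT")
        # True on a case-insensitive volume, False on a case-sensitive one.
        print(refers_to_same_file(original, variant))
```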
It may be a bug, and I personally try never to write path code that depends on
case sensitivity one way or the other, but I bet there's a *lot* of code out
there that makes those assumptions.
That's a good way to deal with paths (don't assume anything). And I'd
bet even case-sensitive filesystems differ in behaviour when presented
with different normalization of Unicode (using pre-combined characters
vs. combining ones).
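To make the pre-combined vs. combining distinction concrete, here is a small Python sketch using the standard `unicodedata` module. The file name is just an example; what a given filesystem does with the two spellings is not shown here and would have to be tested on that filesystem.

```python
import unicodedata

# The same visible name, spelled two ways.
precombined = "caf\u00e9.txt"    # U+00E9: e with acute as one code point
combining = "cafe\u0301.txt"     # plain e followed by U+0301 combining acute

print(precombined == combining)            # False
print(len(precombined), len(combining))    # 8 9

# After normalizing both to the same form (NFC here), they compare equal.
nfc_a = unicodedata.normalize("NFC", precombined)
nfc_b = unicodedata.normalize("NFC", combining)
print(nfc_a == nfc_b)                      # True
```

A filesystem that stores names as raw code units would treat these as two different files, while one that normalizes names would treat them as one, which is another reason not to decide path equality from the strings alone.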
--
Michel Fortin
[email protected]
http://michelf.ca/