Re: [whatwg] URL: file: URLs
On 10/31/12 7:38 AM, Benjamin Smedberg wrote: On 10/30/2012 7:41 PM, João Eiras wrote: I currently do not have Windows to test, but I think I recall IE (or Opera?) opening file://server/share if there was a network share at \\server\share. Firefox has considered and rejected that kind of proposal for security reasons. I can't find the bug right now, but I suspect that we would not implement that feature even if it were specced. On Windows, file:////server/path works in Firefox, though. After the initial two slashes are satisfied, we take //server/path as the path and hand it off to the OS, which happily treats that as a UNC path. -Dan Veditz
Re: [whatwg] URL: file: URLs
On Tue, Oct 30, 2012 at 10:46 PM, Simon Pieters sim...@opera.com wrote: My knee-jerk reaction is the same as Anne's; why not do this for all platforms? I have now made it so that for URLs whose scheme is file, [a-zA-Z] followed by either : or | as the first path segment becomes [a-zA-Z] followed by :. I also made it so that for URLs whose scheme is file, if the host is [a-zA-Z] followed by either : or |, the URL gets an empty host and that value becomes the first path segment instead (the previous rule then applies, converting | to :). http://url.spec.whatwg.org/ (see "file host state" and "relative path state", or search for "file" throughout) Cheers, -- http://annevankesteren.nl/
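A minimal Python sketch of the two rules described in the message above, operating on an already-split host and list of path segments. The function and constant names are hypothetical, and this only illustrates the rules as stated in the message; it is not the spec's actual state-machine algorithm.

```python
import re
from typing import List, Tuple

# A single ASCII letter followed by ':' or '|', e.g. "c:" or "C|".
DRIVE_LETTER = re.compile(r"[A-Za-z][:|]")


def apply_drive_letter_rules(host: str, segments: List[str]) -> Tuple[str, List[str]]:
    """Sketch of the two file: rules (hypothetical helper).

    host     -- the host as parsed, possibly ""
    segments -- the path segments, without the leading empty segment
    """
    segments = list(segments)
    # Rule 2: a drive letter in the host position becomes an empty host
    # plus a new first path segment.
    if DRIVE_LETTER.fullmatch(host):
        segments.insert(0, host)
        host = ""
    # Rule 1: a drive-letter first segment is normalized to letter + ':'.
    if segments and DRIVE_LETTER.fullmatch(segments[0]):
        segments[0] = segments[0][0] + ":"
    return host, segments
```

For example, `apply_drive_letter_rules("c|", ["foo"])` returns `("", ["c:", "foo"])`, which corresponds to file://c|/foo serializing as file:///c:/foo.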
Re: [whatwg] URL: file: URLs
João Eiras, 2012-10-31 01:41 (Europe/Helsinki): In both Firefox and Chrome, if you type file://aaa/some/path or file://localhost/some/path, the aaa and localhost parts are ignored and the rest of the path is interpreted as a local file path. In Opera, anything that is not localhost gives an error. How about the following: (1) file://c:/foo tries to connect to server c: and request shared entity foo. (2) file://foo/bar tries to connect to server foo and request shared entity bar. (3) file:///c:/foo tries to refer to localhost path /c:/foo, which in a Windows environment would be interpreted as the local C:\foo, while POSIX-compatible systems would try the literal /c:/foo (colon is a valid character in path names). (4) file://localhost/c:/foo is identical to (3) above. I understand that (1) would behave differently from some legacy user agents, but there really is no interoperability with such file URLs, so I guess that does not matter too much. Some legacy user agents also support URLs such as (5) file:///c|/foo, which is considered equal to (3). I have no idea why the pipe is considered a better character than the colon here. Some problematic URLs are still possible: (6) file:///foo/bar should refer to entity /foo/bar on POSIX-compatible systems, but I have no idea where it would map to with Windows-style drive letter naming at the start of the local path. -- Mikko
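A rough Python sketch of Mikko's proposed interpretation of cases (1) through (5). The function name, the shape of the returned tuples and the `windows` flag are assumptions made for illustration; case (6) on Windows is simply passed through unchanged, since the proposal leaves it open.

```python
import re
from typing import Tuple

# "/c:", "/c|", "/c:/..." or "/c|/..." at the start of a local path.
WINDOWS_DRIVE_PATH = re.compile(r"/([A-Za-z])[:|](/.*)?")


def interpret_proposal(url: str, windows: bool) -> Tuple[str, ...]:
    """Hypothetical interpretation of a file:// URL under Mikko's proposal."""
    if not url.startswith("file://"):
        raise ValueError("expected a file:// URL")
    rest = url[len("file://"):]
    authority, slash, path = rest.partition("/")
    path = slash + path
    if authority not in ("", "localhost"):
        # (1), (2): any other authority names a server; the path names the share.
        return ("remote", authority, path)
    match = WINDOWS_DRIVE_PATH.fullmatch(path)
    if windows and match:
        # (3), (4), (5): /c:/foo and /c|/foo both map to C:\foo.
        tail = match.group(2) or ""
        return ("local", match.group(1).upper() + ":" + tail.replace("/", "\\"))
    # POSIX systems, or case (6) on Windows: the path is kept as-is.
    return ("local", path)
```

Note that on a POSIX system `interpret_proposal("file:///c:/foo", windows=False)` keeps the literal path /c:/foo, as item (3) describes.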
Re: [whatwg] URL: file: URLs
On Wed, Oct 31, 2012 at 9:27 AM, Mikko Rantalainen mikko.rantalai...@peda.net wrote: João Eiras, 2012-10-31 01:41 (Europe/Helsinki): In both Firefox and Chrome, if you type file://aaa/some/path or file://localhost/some/path, the aaa and localhost parts are ignored and the rest of the path is interpreted as a local file path. In Opera, anything that is not localhost gives an error. How about the following: (1) file://c:/foo tries to connect to server c: and request shared entity foo. I think bz made it pretty clear we need to treat it as if you typed file:///c:/foo (at least on Windows; my preference is all platforms). I'm not sure what the rules are exactly, but I believe they are: a single ASCII letter followed by : or |. (2) file://foo/bar tries to connect to server foo and request shared entity bar. I think we should stick for now to how it should be parsed; interpretation is a different layer. So this would give host foo and path /bar. E.g. on a Mac it might well end up meaning that the host does not matter and localhost is always used. I think we should not let that affect parsing or serialization, however, because then we end up with platform-specific rules. Some problematic URLs are still possible: (6) file:///foo/bar should refer to entity /foo/bar on POSIX-compatible systems, but I have no idea where it would map to with Windows-style drive letter naming at the start of the local path. It would probably not map to anything, but that's fine. Again, we want to treat parsing of file: URLs as distinct from interpretation of file: URLs. -- http://annevankesteren.nl/
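Python's standard `urllib.parse.urlsplit` is a generic parser with no file:-specific rules, so it illustrates the "parse only, interpret later" split described above. It also shows why file://c:/foo needs a special case: without one, c: lands in the host position. The `host_and_path` wrapper is a hypothetical helper.

```python
from urllib.parse import urlsplit


def host_and_path(url: str):
    """Pure parsing, no interpretation: return (host, path) as urlsplit sees them."""
    parts = urlsplit(url)
    return parts.netloc, parts.path


for url in ("file://foo/bar", "file:///c:/foo", "file://c:/foo"):
    print(url, "->", host_and_path(url))
```

This prints `file://foo/bar -> ('foo', '/bar')`, `file:///c:/foo -> ('', '/c:/foo')` and `file://c:/foo -> ('c:', '/foo')`.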
Re: [whatwg] URL: file: URLs
On 10/31/12 4:27 AM, Mikko Rantalainen wrote: (1) file://c:/foo tries to connect to server c: and request shared entity foo. I don't think that's really acceptable, but see below. I understand that (1) would behave differently from some legacy user agents. Well, it would behave differently from Gecko, for sure. Have you tested any other UAs? As in, do you have any evidence that "some" is not "all" in that sentence? Also, how are you defining "legacy"? Is it the same as "existing"? ;) but there really is no interoperability with such file URLs. Is that a guess, or do you have data that you're forgetting to present? Serious question. Some legacy user agents also support URLs such as (5) file:///c|/foo, which is considered equal to (3). I have no idea why the pipe is considered a better character than the colon here. Such URIs were commonly used back when various things that processed URIs went into conniptions when they saw the ':' reserved character, iirc. In fact, at one point there were UAs that supported the version with '|' but not with ':'. And hence there was content that used that syntax. And again, what's needed here is data on which UAs do what, not generic statements about "some". -Boris
Re: [whatwg] URL: file: URLs
On 10/31/12 9:52 AM, Anne van Kesteren wrote: I think bz made it pretty clear we need to treat it as if you typed file:///c:/foo (at least on Windows; my preference is all platforms). I'm not sure what the rules are exactly, but I believe they are: a single ASCII letter followed by : or |. That's correct for Gecko. Specifically, what Gecko looks for is a URI that matches this regexp, effectively: ^file://[a-zA-Z][:|][/\\]? So file://z: and file://z|/ and file://z:\ would all be treated as having no authority, with the path starting with the z, in Gecko. Again, I would love info on other UAs. -Boris
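Boris's regexp can be tried directly in Python. The `split_authority` helper is hypothetical and only illustrates the resulting split between host and path; per his earlier list, Gecko applies this check on Windows and OS/2 only.

```python
import re
from typing import Tuple

# The regexp from the message above, verbatim.
GECKO_DRIVE_AUTHORITY = re.compile(r"^file://[a-zA-Z][:|][/\\]?")


def split_authority(url: str) -> Tuple[str, str]:
    """Return (host, path) for a URL starting with "file://" (sketch)."""
    rest = url[len("file://"):]
    if GECKO_DRIVE_AUTHORITY.match(url):
        # No authority: the path starts with the drive letter itself.
        return "", rest
    # Otherwise everything up to the next '/' is taken as the host.
    host, slash, path = rest.partition("/")
    return host, slash + path
```

As written, the trailing `[/\\]?` is optional, so the regexp also matches file://c:foo. Item 4 of Boris's earlier list says the drive letter must be followed by end of string, '/' or '\\', which would correspond to `^file://[a-zA-Z][:|](?:[/\\]|$)` instead.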
Re: [whatwg] URL: file: URLs
On Wed, 31 Oct 2012 15:38:36 +0100, Benjamin Smedberg benja...@smedbergs.us wrote: On 10/30/2012 7:41 PM, João Eiras wrote: I currently do not have Windows to test, but I think I recall IE (or Opera?) opening file://server/share if there was a network share at \\server\share. Firefox has considered and rejected that kind of proposal for security reasons. I can't find the bug right now, but I suspect that we would not implement that feature even if it were specced. Obviously, supporting such a thing would require implementing security checks between origins, and the origin generation algorithm would need to be updated to allow file://host, while currently it specifies that all file: URIs have a null origin. And FF is not the only program that can interpret file URIs. Any application should be able to open file URIs, just like they can open local file paths now, or http URIs.
Re: [whatwg] URL: file: URLs
On Wed, Oct 31, 2012 at 3:38 PM, Benjamin Smedberg benja...@smedbergs.us wrote: I currently do not have Windows to test, but I think I recall IE (or Opera?) opening file://server/share if there was a network share at \\server\share. Firefox has considered and rejected that kind of proposal for security reasons. I can't find the bug right now, but I suspect that we would not implement that feature even if it were specced. Just to be clear, the proposal does not require you to implement that feature (that would be processing of a parsed file: URL, which is undefined), but it does require you to preserve the host name (which you are free to ignore or treat as a network error). -- http://annevankesteren.nl/
Re: [whatwg] URL: file: URLs
On Mon, Oct 29, 2012 at 4:24 PM, Boris Zbarsky bzbar...@mit.edu wrote: On 10/29/12 10:53 AM, Anne van Kesteren wrote: But at that point in a URL you cannot have a path. A path starts with a slash after the host. The point is that on Windows, Gecko parses file://c:/something as file:///c:/something. As in, it's an exception to the general "if there are two slashes after the file: then the next thing is a host" rule. Thanks, I missed that. It seems however we could have that parsing rule for all platforms without issue, no? After all, file://c:/ does not parse currently on non-Windows platforms. I suppose. I would hate it, though, for new URL(...) to depend on the platform. I'm not sure there are great solutions here. :( Yeah, I'm willing to suck it up, but I would like to explore our options before we go that route. -- http://annevankesteren.nl/
Re: [whatwg] URL: file: URLs
On 10/30/12 12:25 PM, Anne van Kesteren wrote: Thanks, I missed that. It seems however we could have that parsing rule for all platforms without issue, no? Hmm. Possibly, yes. I'd love feedback from other UAs here! -Boris
Re: [whatwg] URL: file: URLs
On Tue, 30 Oct 2012 18:38:46 +0200, Boris Zbarsky bzbar...@mit.edu wrote: On 10/30/12 12:25 PM, Anne van Kesteren wrote: Thanks, I missed that. It seems however we could have that parsing rule for all platforms without issue, no? Hmm. Possibly, yes. I'd love feedback from other UAs here! My knee-jerk reaction is the same as Anne's; why not do this for all platforms? -- Simon Pieters Opera Software
Re: [whatwg] URL: file: URLs
On Tue, 30 Oct 2012 16:25:30 -, Anne van Kesteren ann...@annevk.nl wrote: On Mon, Oct 29, 2012 at 4:24 PM, Boris Zbarsky bzbar...@mit.edu wrote: On 10/29/12 10:53 AM, Anne van Kesteren wrote: But at that point in a URL you cannot have a path. A path starts with a slash after the host. The point is that on Windows, Gecko parses file://c:/something as file:///c:/something. As in, it's an exception to the general "if there are two slashes after the file: then the next thing is a host" rule. Thanks, I missed that. It seems however we could have that parsing rule for all platforms without issue, no? After all, file://c:/ does not parse currently on non-Windows platforms. I suppose. I would hate it, though, for new URL(...) to depend on the platform. I'm not sure there are great solutions here. :( Yeah, I'm willing to suck it up, but I would like to explore our options before we go that route. In both Firefox and Chrome, if you type file://aaa/some/path or file://localhost/some/path, the aaa and localhost parts are ignored and the rest of the path is interpreted as a local file path. In Opera, anything that is not localhost gives an error. I currently do not have Windows to test, but I think I recall IE (or Opera?) opening file://server/share if there was a network share at \\server\share. In a previous job I had, where the environment was a bit Windows-centric, there was a wiki with documentation that linked to files on network shares. I recall the URLs looked something like file:\\server\some\path in the HTML. IE opened the files (hence people continued to write them). The other browsers didn't. The point is that the file URI can and should have the authority part, or host, and that can be the local machine or a network share.
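To illustrate how links like file:\\server\some\path could keep their host, here is a sketch that assumes backslashes are turned into forward slashes before parsing. This resembles the Windows-only replacement in item 1 of Boris's list, but applying it to the whole string is an assumption, and the helper name is hypothetical.

```python
from urllib.parse import urlsplit


def backslash_file_url(url: str):
    """Hypothetical helper: treat every '\\' as '/' before parsing, then split."""
    parts = urlsplit(url.replace("\\", "/"))
    return parts.netloc, parts.path
```

With that assumption, file:\\server\some\path becomes file://server/some/path, so the host "server" is preserved and an implementation could then decide whether to open a network share or refuse.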
Re: [whatwg] URL: file: URLs
On Sun, Oct 28, 2012 at 6:51 PM, Boris Zbarsky bzbar...@mit.edu wrote: Same as the comment I quoted? Or same as something else? Same as you quoted. Well, the Gecko parser preserves the host at this stage, assuming the URI was correctly formatted with a host. Again: blah://foo/bar = blah://foo/bar The interesting things happen when you have 0, 1, or 3 slashes between ':' and foo. The handling of foo after this point is a separate issue. Those are handled the same as in Gecko (this also matches Safari, I think; Chrome strips extra leading slashes, e.g. if you have four, but I did not copy that). In Gecko, it's part of URL parsing. More precisely, it's part of the normalization performed as part of constructing a URL object from a string. Since this is also how we parse URLs, it's effectively all part of the package. But note that it would be a bit odd if file://c:/ claimed to have a host of c with a default port or some such... Maybe I should introduce a file host state that supports colons in the host name (or special-case the host state further, but the former seems cleaner). Most browsers seem to fail currently on input such as file://c:/, but this is on a Mac, so maybe that's the difference. I would prefer having the parsing be consistent, though. 7 and 8 are not, though at some point we'll need to define equality comparisons anyway. Yeah, I guess at some point someone would need to write a "processing file: URLs" specification (for post-parsing operations). On the other hand, it's not entirely clear to me that it needs to be interoperable. -- http://annevankesteren.nl/
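The 0, 1, 2 and 3 slash cases from the Gecko IDL comment quoted in this thread can be written as a small normalization function. This is a sketch: the name is hypothetical, and because the comment says nothing about four or more slashes, such input is returned unchanged here rather than guessed at.

```python
def normalize_no_authority(url: str) -> str:
    """Sketch of the slash normalization for "no authority" URLs."""
    scheme, colon, rest = url.partition(":")
    if not colon:
        raise ValueError("no scheme")
    body = rest.lstrip("/")
    slashes = len(rest) - len(body)
    if slashes == 2:
        # blah://foo/bar keeps foo in the host position.
        return scheme + "://" + body
    if slashes in (0, 1, 3):
        # Every other listed case ends up with an empty authority.
        return scheme + ":///" + body
    # Four or more slashes are not covered by the comment; leave as-is.
    return url
```

Only the two-slash input keeps foo as a host; blah:foo/bar, blah:/foo/bar and blah:///foo/bar all normalize to blah:///foo/bar.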
Re: [whatwg] URL: file: URLs
On 10/29/12 5:00 AM, Anne van Kesteren wrote: But note that it would be a bit odd if file://c:/ claimed to have a host of c with a default port or some such... Maybe I should introduce a file host state that supports colons in the host name (or special-case the host state further, but the former seems cleaner). I don't think that's particularly desirable. The c: is totally part of the path; treating it otherwise would just be confusing, IMO. Most browsers seem to fail currently on input such as file://c:/, but this is on a Mac. Yes, doing that on a Mac would just be wrong. I would prefer having the parsing be consistent, though. You mean across Windows and non-Windows? I'm not sure that's viable. -Boris
Re: [whatwg] URL: file: URLs
On Mon, Oct 29, 2012 at 3:13 PM, Boris Zbarsky bzbar...@mit.edu wrote: On 10/29/12 5:00 AM, Anne van Kesteren wrote: Maybe I should introduce a file host state that supports colons in the host name (or special-case the host state further, but the former seems cleaner). I don't think that's particularly desirable. The c: is totally part of the path; treating it otherwise would just be confusing, IMO. But at that point in a URL you cannot have a path. A path starts with a slash after the host. Especially if you want file://test/ to parse with test being the host. Most browsers seem to fail currently on input such as file://c:/, but this is on a Mac. Yes, doing that on a Mac would just be wrong. I suppose. I would hate it, though, for new URL(...) to depend on the platform. -- http://annevankesteren.nl/
Re: [whatwg] URL: file: URLs
On 10/29/12 10:53 AM, Anne van Kesteren wrote: But at that point in a URL you cannot have a path. A path starts with a slash after the host. The point is that on Windows, Gecko parses file://c:/something as file:///c:/something. As in, it's an exception to the general "if there are two slashes after the file: then the next thing is a host" rule. I suppose. I would hate it, though, for new URL(...) to depend on the platform. I'm not sure there are great solutions here. :( -Boris
Re: [whatwg] URL: file: URLs
On 10/27/12 3:35 PM, Anne van Kesteren wrote: This is covered as we do this for all URLs currently with a relative scheme (http/ws/...). I know you indicated this as potentially problematic. Let's have that fight separately. ;) 2) file:// URIs are parsed as a no authority URL in Gecko. Quoting the IDL comment: ... The parser in the specification should handle these in the same way. Same as the comment I quoted? Or same as something else? I have not introduced a no authority concept, however. The parser in the specification also preserves the host, as other user agents seem to preserve it. Well, the Gecko parser preserves the host at this stage, assuming the URI was correctly formatted with a host. Again: blah://foo/bar = blah://foo/bar The interesting things happen when you have 0, 1, or 3 slashes between ':' and foo. The handling of foo after this point is a separate issue. 4) For no authority URLs, including file://, on Windows and OS/2 only, if what looks like the authority section looks like a drive letter, it's treated as part of the path. For example, file://c:/ is treated as the filename c:\. Looks like a drive letter is defined as an ASCII letter (any case), followed by a ':' or '|' and then followed by end of string or '/' or '\\'. I'm not sure why this is checking for '\\' again, honestly. ;) Is this part of URL parsing or part of doing something with the resulting URL? In Gecko, it's part of URL parsing. More precisely, it's part of the normalization performed as part of constructing a URL object from a string. Since this is also how we parse URLs, it's effectively all part of the package. But note that it would be a bit odd if file://c:/ claimed to have a host of c with a default port or some such... 5) When parsing a no authority URL (including file://), and when item 4 above does not apply, it looks like Gecko skips everything after file:// up until the next '/', '?', or '#' char before parsing path stuff. So the host is dropped? In Gecko, I believe so, yes.
I'm not saying this is desirable, just that it's what Gecko does. 6) On Windows and OS/2, when dynamically parsing a path for a no authority URL (not sure whether this is actually web-exposed, fwiw...) Gecko will do something involving looking for a path that's only an ASCII letter followed by ':' or '|' followed by end of string. ... 7) When doing URI equality comparisons ... 8) When actually resolving a file:// URL ... These points do not seem to be about parsing, correct? Well, point 6 is about parsing, sort of. 7 and 8 are not, though at some point we'll need to define equality comparisons anyway. -Boris
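Item 5 (Gecko dropping everything between file:// and the next '/', '?' or '#') as a Python sketch. The function name is hypothetical, and it only models the host-dropping step, not the drive-letter exception from item 4.

```python
def drop_authority(url: str) -> str:
    """Return what remains of a file:// URL after Gecko-style host dropping (sketch)."""
    rest = url[len("file://"):]
    # Position of the first '/', '?' or '#', if any of them occurs.
    positions = [i for i in (rest.find(c) for c in "/?#") if i != -1]
    return rest[min(positions, default=len(rest)):]
```

This is consistent with the observation elsewhere in the thread that Firefox ignores the aaa in file://aaa/some/path: `drop_authority("file://aaa/some/path")` returns "/some/path".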
[whatwg] URL: file: URLs
On Mon, Sep 24, 2012 at 4:06 PM, Boris Zbarsky bzbar...@mit.edu wrote: Hmm. So here goes at least a partial list: 1) On Windows and OS/2, Gecko replaces '\\' with '/' in file:// URI strings before doing anything else with the string when parsing a new URL. That includes relative URI strings being resolved against a file:// base. This is covered as we do this for all URLs currently with a relative scheme (http/ws/...). I know you indicated this as potentially problematic, but note that a) \ as a raw code point is invalid in a URL, b) because of a) you can represent it as %5C, and c) other user agents have hit issues with not supporting \ and / outside of file: URLs. 2) file:// URIs are parsed as a no authority URL in Gecko. Quoting the IDL comment:
  35 /**
  36  * blah:foo/bar    = blah:///foo/bar
  37  * blah:/foo/bar   = blah:///foo/bar
  38  * blah://foo/bar  = blah://foo/bar
  39  * blah:///foo/bar = blah:///foo/bar
  40  */
where the thing on the left is the input string and the thing on the right is the normalized form that the parser produces from it. Note that this is different from how HTTP URIs are parsed, for all except the item on line number 38 there. The parser in the specification should handle these in the same way. I have not introduced a no authority concept, however. The parser in the specification also preserves the host, as other user agents seem to preserve it. 4) For no authority URLs, including file://, on Windows and OS/2 only, if what looks like the authority section looks like a drive letter, it's treated as part of the path. For example, file://c:/ is treated as the filename c:\. Looks like a drive letter is defined as an ASCII letter (any case), followed by a ':' or '|' and then followed by end of string or '/' or '\\'. I'm not sure why this is checking for '\\' again, honestly. ;) Is this part of URL parsing or part of doing something with the resulting URL?
(I do not plan on defining the latter because there's no observable difference from the web and it's platform-dependent.) 5) When parsing a no authority URL (including file://), and when item 4 above does not apply, it looks like Gecko skips everything after file:// up until the next '/', '?', or '#' char before parsing path stuff. So the host is dropped? This is not what other user agents do, and http://www.cs.tut.fi/~jkorpela/fileurl.html suggests it might be useful in some cases. I don't know anything about file: URLs, however, so I don't know whether that is still true. 6) On Windows and OS/2, when dynamically parsing a path for a no authority URL (not sure whether this is actually web-exposed, fwiw...) Gecko will do something involving looking for a path that's only an ASCII letter followed by ':' or '|' followed by end of string. I'm not quite sure what that part is about... It might have to do with the fact that URI objects in Gecko can have concepts of directory, filename, extension or something like that. 7) When doing URI equality comparisons, if two file:// URIs only differ in their directory/filename/extension (so the actual file path), then an equality comparison is done on the underlying file path objects. What this means depends on the OS. On Unix this is just a straight-up byte-by-byte comparison of file paths. I think OS X now follows the Unix code path, as do most other supported platforms. But note that the file path in this case is normalized in various ways. Specifically: trailing '/' characters are stripped, and some sort of normalization of HFS paths (possibly with a volume name) to POSIX paths is done on OS X. One result of the latter is that file:///Users%2fbzbarsky ends up seeing my home directory, which is ... slightly surprising. On Unix, the path bytes are treated as UTF-8 if they're valid UTF-8, else as whatever the current locale charset is, I think. Oh, and there is some sort of escaping going on for directory names, filenames, and extensions.
Not sure what that's about, if anything. The URI-escaping code is black magic, but I'm happy to run some black-box tests on it if someone wants to provide test strings. The things that don't go through the Unix code for this stuff are Windows and OS/2. I'm not going to dig through the OS/2 stuff, but on Windows, if the filename contains a nonempty directory name and the second char is '|', that's converted to a ':'. Again, there is escaping for directory names, filenames, and extensions. Again, things that look like UTF-8 are treated as such and other stuff uses the current codepage. After all that, the actual equality comparison is done via _wcsicmp on the return value of GetShortPathNameW. So whatever that combination considers equal is equal. 8) When actually resolving a file:// URL, the underlying file path object as described above is used to get the data. Plus there's a bit of weirdness about symlinks, I think... Mostly affects what's shown in the url
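A sketch of only the Unix branch of item 7, assuming both paths are already-unescaped strings: strip trailing '/' and compare the resulting bytes. The UTF-8-versus-locale decoding, the OS X HFS normalization and the Windows _wcsicmp/GetShortPathNameW branch are deliberately not modelled, and the function name is hypothetical.

```python
def unix_paths_equal(a: str, b: str) -> bool:
    """Compare two file paths roughly as described for Gecko's Unix code path (sketch)."""

    def normalize(path: str) -> bytes:
        # Strip trailing '/', but keep the root directory itself.
        trimmed = path.rstrip("/") or "/"
        return trimmed.encode("utf-8")

    # A straight byte-by-byte comparison of the normalized paths.
    return normalize(a) == normalize(b)
```

Because the comparison is byte-for-byte, /home/bz/ and /home/bz compare equal while /home/bz and /home/BZ do not, unlike the case-insensitive Windows comparison described above.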