Re: [whatwg] URL: file: URLs

2012-11-13 Thread Dan Veditz

On 10/31/12 7:38 AM, Benjamin Smedberg wrote:

On 10/30/2012 7:41 PM, João Eiras wrote:



I currently do not have Windows to test but I think I recall IE (or
Opera?) opening file://server/share if there was a network share at
\\server\share

Firefox has considered and rejected that kind of proposal for security
reasons. I can't find the bug right now, but I suspect that we would not
implement that feature even if it were specced.


On Windows Firefox file:server/path works, though. After the initial 
two slashes are satisfied we take //server/path as the path and hand 
it off to the OS which happily treats that as a UNC path.


-Dan Veditz



Re: [whatwg] URL: file: URLs

2012-11-05 Thread Anne van Kesteren
On Tue, Oct 30, 2012 at 10:46 PM, Simon Pieters sim...@opera.com wrote:
 My knee-jerk reaction is the same as Anne's; why not do this for all
 platforms?

I have now made it so that for URLs whose scheme is "file", [a-zA-Z]
followed by either ":" or "|" as the first path segment becomes [a-zA-Z]
followed by ":". I also made it so that for URLs whose scheme is "file",
[a-zA-Z] followed by either ":" or "|" as host gets an empty host and is
used as the first path segment instead (applying the rule above, so "|"
ends up converted to ":").

http://url.spec.whatwg.org/ (see "file host state" and "relative path
state", or search for "file" throughout)
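
The two rules described in this message can be sketched roughly as follows.
This is only an illustration, not the specification's algorithm: the function
name and the (host, segments) representation are hypothetical, and the letter
class is read as "any ASCII letter".

```python
import re

# Hypothetical helper; names are illustrative and not taken from the spec.
# Matches a single ASCII letter followed by ':' or '|'.
_DRIVE = re.compile(r"[A-Za-z][:|]")


def normalize_file_drive(host, segments):
    """Return (host, segments) with the drive-letter rules applied."""
    segments = list(segments)
    # Rule 2: a drive letter in host position becomes the first path
    # segment, and the host becomes empty.
    if _DRIVE.fullmatch(host):
        segments.insert(0, host)
        host = ""
    # Rule 1: a drive letter as first path segment always ends in ':'.
    if segments and _DRIVE.fullmatch(segments[0]):
        segments[0] = segments[0][0] + ":"
    return host, segments
```

Under these assumptions a host of "c|" with path segment "foo" comes out as
an empty host with segments "c:" and "foo", while a host such as "server" is
left alone.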

Cheers,


-- 
http://annevankesteren.nl/


Re: [whatwg] URL: file: URLs

2012-10-31 Thread Mikko Rantalainen
João Eiras, 2012-10-31 01:41 (Europe/Helsinki):
 In both Firefox and Chrome if you type file://aaa/some/path, or
 file://localhost/some/path, the aaa and localhost parts are ignored, and
 the rest of the path is interpreted as a local file path. In Opera,
 anything that is not localhost gives an error.

How about the following:

(1) file://c:/foo tries to connect to server c: and request shared
entity foo.

(2) file://foo/bar tries to connect to server foo and request shared
entity bar

(3) file:///c:/foo tries to refer to localhost path /c:/foo, which in
a Windows environment would be interpreted as the local C:\foo, while
POSIX-compatible systems would try the literal /c:/foo (a colon is a
valid character in a path name).

(4) file://localhost/c:/foo is identical to (3) above.

I understand that (1) would behave differently from some legacy user
agents, but there really is no interoperability with such file URLs, so I
guess that does not matter too much.

Some legacy user agents also support URLs such as

(5) file:///c|/foo which is considered equal to (3). I have no idea why
the pipe is considered better character than colon here.

Some problematic URLs are still possible:

(6) file:///foo/bar should refer to entity /foo/bar in
POSIX-compatible systems but I have no idea where it would map to with
windows-style drive letter naming at the start of the local path.

-- 
Mikko



Re: [whatwg] URL: file: URLs

2012-10-31 Thread Anne van Kesteren
On Wed, Oct 31, 2012 at 9:27 AM, Mikko Rantalainen
mikko.rantalai...@peda.net wrote:
 João Eiras, 2012-10-31 01:41 (Europe/Helsinki):
 In both Firefox and Chrome if you type file://aaa/some/path, or
 file://localhost/some/path, the aaa and localhost parts are ignored, and
 the rest of the path is interpreted as a local file path. In Opera,
 anything that is not localhost gives an error.

 How about following:

 (1) file://c:/foo tries to connect to server c: and request shared
 entity foo.

I think bz made it pretty clear we need to treat that as if you had typed
file:///c:/foo (at least on Windows; my preference is all platforms).
I am not sure what the rules are exactly, but I believe they apply
if you have a single ASCII letter followed by ":" or "|".


 (2) file://foo/bar tries to connect to server foo and request shared
 entity bar

I think we should stick for now to how it should be parsed.
Interpretation is a different layer. So this would give host foo and
path /bar. E.g. on Mac it might well end up meaning that host does
not matter and localhost is always used. I think we should not let
that affect parsing or serialization however, because then we end up
with platform-specific rules.
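
As a rough illustration of keeping parsing and interpretation in separate
layers, the sketch below uses Python's urllib.parse, which is a generic
splitter and not the parser being specified in this thread. The function
names and the "ignore the host" policy are hypothetical.

```python
from urllib.parse import urlsplit


def parse_file_url(url):
    """Parsing layer: return (host, path) without interpreting either."""
    parts = urlsplit(url)
    return parts.netloc, parts.path


def interpret_ignoring_host(url):
    """Hypothetical interpretation layer for a platform where the host
    does not matter: the parsed host is kept by the parser, then ignored."""
    _host, path = parse_file_url(url)
    return path
```

With this split, file://foo/bar parses to host "foo" and path "/bar" on every
platform, and only the second function would differ between platforms.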


 Some problematic URLs are still possible:

 (6) file:///foo/bar should refer to entity /foo/bar in
 POSIX-compatible systems but I have no idea where it would map to with
 windows-style drive letter naming at the start of the local path.

It would probably not map to anything, but that's fine. Again, we want
to treat parsing of file: URLs as distinct from interpretation of file:
URLs.


-- 
http://annevankesteren.nl/


Re: [whatwg] URL: file: URLs

2012-10-31 Thread Boris Zbarsky

On 10/31/12 4:27 AM, Mikko Rantalainen wrote:

(1) file://c:/foo tries to connect to server c: and request shared
entity foo.


I don't think that's really acceptable, but see below.


I understand that (1) would behave different from some legacy user
agents


Well, it would behave differently from Gecko, for sure.  Have you tested
any other UAs?  As in, do you have any evidence that "some" is not "all"
in that sentence?  Also, how are you defining "legacy"?  Is it the same
as "existing"?  ;)



but there really is not interoperability with such file URLs


Is that a guess, or do you have data that you're forgetting to present?
Serious question.



Some legacy user agents also support URLs such as

(5) file:///c|/foo which is considered equal to (3). I have no idea why
the pipe is considered better character than colon here.


Such URIs were used commonly back when various things that processed 
URIs went into conniptions when they saw the ':' reserved character, iirc.


In fact, at one point there were UAs that supported the version with '|' 
but not with ':'.  And hence there was content that used that syntax.


And again, what's needed here is data on which UAs do what, not generic
statements about "some".


-Boris


Re: [whatwg] URL: file: URLs

2012-10-31 Thread Boris Zbarsky

On 10/31/12 9:52 AM, Anne van Kesteren wrote:

I think bz made it pretty clear we need to treat as if you typed
file:///c:/foo (at least on Windows, my preference is all
platforms). Not sure what the rules are exactly, but I believe they
are if you have a single ASCII letter followed by : or |.


That's correct for Gecko.  Specifically, what Gecko looks for is a URI 
that matches this regexp, effectively:


  ^file://[a-zA-Z][:|][/\\]?

So file://z: and file://z|/ and file://z:\ would all be treated as 
having no authority and the path starting with the z in Gecko.
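
A small sketch of that check, for illustration only. It follows the stricter
wording used elsewhere in this thread (the drive letter must be followed by
end of string, '/' or '\'), which is an assumption about what "effectively"
covers; the function name is hypothetical and this is not Gecko's code.

```python
import re

# Assumption: after the drive letter and ':' or '|' there must be either
# a '/' or '\' or the end of the string.
_DRIVE_AUTHORITY = re.compile(r"file://[a-zA-Z][:|](?:[/\\]|$)")


def gecko_like_rewrite(url):
    """Hypothetical helper: move a drive-letter 'authority' into the path."""
    if _DRIVE_AUTHORITY.match(url):
        # Adding a third slash leaves the authority empty, so the drive
        # letter becomes the start of the path.
        return "file:///" + url[len("file://"):]
    return url
```

Inputs such as file://z:, file://z|/ and file://z:\ gain a third slash here,
while file://server/share is returned unchanged.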


Again, I would love info on other UAs.

-Boris


Re: [whatwg] URL: file: URLs

2012-10-31 Thread João Eiras
On Wed, 31 Oct 2012 15:38:36 +0100, Benjamin Smedberg  
benja...@smedbergs.us wrote:



On 10/30/2012 7:41 PM, João Eiras wrote:



I currently do not have Windows to test but I think I recall IE (or  
Opera?) opening file://server/share if there was a network share at  
\\server\share
Firefox has considered and rejected that kind of proposal for security  
reasons. I can't find the bug right now, but I suspect that we would not  
implement that feature even if it were specced.




Obviously, supporting such a thing would require implementing security checks
between origins and updating the origin generation algorithm to allow
file://host, while currently it specifies that all file: uris have a null
origin.


And FF is not the only program that can interpret file uris. Any
application should be able to open file uris, just like they can open
local file paths now, or http uris.


Re: [whatwg] URL: file: URLs

2012-10-31 Thread Anne van Kesteren
On Wed, Oct 31, 2012 at 3:38 PM, Benjamin Smedberg
benja...@smedbergs.us wrote:
 I currently do not have Windows to test but I think I recall IE (or
 Opera?) opening file://server/share if there was a network share at
 \\server\share

 Firefox has considered and rejected that kind of proposal for security
 reasons. I can't find the bug right now, but I suspect that we would not
 implement that feature even if it were specced.

Just to be clear, the proposal does not require you to implement that
feature (that would be processing of a parsed file: URL, which is
undefined), but it does require you to preserve the host name (which
you are free to ignore or treat as a network error).


-- 
http://annevankesteren.nl/


Re: [whatwg] URL: file: URLs

2012-10-30 Thread Anne van Kesteren
On Mon, Oct 29, 2012 at 4:24 PM, Boris Zbarsky bzbar...@mit.edu wrote:
 On 10/29/12 10:53 AM, Anne van Kesteren wrote:
 But at that point in a URL you cannot have a path. A path starts with
 a slash after the host.

 The point is that on Windows, Gecko parses file://c:/something as
 file:///c:/something

 As in, it's an exception to the general "if there are two slashes after the
 file: then the next thing is a host" rule.

Thanks, I missed that. It seems however we could have that parsing
rule for all platforms without issue, no? After all, file://c:/ does
not parse currently on non-Windows platforms.


 I suppose, I would hate it though for new URL(...) to depend on the
 platform.

 I'm not sure there are great solutions here.  :(

Yeah, I'm willing to suck it up, but I would like to explore our
options before we go that route.


-- 
http://annevankesteren.nl/


Re: [whatwg] URL: file: URLs

2012-10-30 Thread Boris Zbarsky

On 10/30/12 12:25 PM, Anne van Kesteren wrote:

Thanks, I missed that. It seems however we could have that parsing
rule for all platforms without issue, no?


Hmm.  Possibly, yes.  I'd love feedback from other UAs here!

-Boris


Re: [whatwg] URL: file: URLs

2012-10-30 Thread Simon Pieters

On Tue, 30 Oct 2012 18:38:46 +0200, Boris Zbarsky bzbar...@mit.edu wrote:


On 10/30/12 12:25 PM, Anne van Kesteren wrote:

Thanks, I missed that. It seems however we could have that parsing
rule for all platforms without issue, no?


Hmm.  Possibly, yes.  I'd love feedback from other UAs here!


My knee-jerk reaction is the same as Anne's; why not do this for all  
platforms?


--
Simon Pieters
Opera Software


Re: [whatwg] URL: file: URLs

2012-10-30 Thread João Eiras
On Tue, 30 Oct 2012 16:25:30 -, Anne van Kesteren ann...@annevk.nl  
wrote:



On Mon, Oct 29, 2012 at 4:24 PM, Boris Zbarsky bzbar...@mit.edu wrote:

On 10/29/12 10:53 AM, Anne van Kesteren wrote:

But at that point in a URL you cannot have a path. A path starts with
a slash after the host.


The point is that on Windows, Gecko parses file://c:/something as
file:///c:/something

As in, it's an exception to the general "if there are two slashes after
the file: then the next thing is a host" rule.


Thanks, I missed that. It seems however we could have that parsing
rule for all platforms without issue, no? After all, file://c:/ does
not parse currently on non-Windows platforms.



I suppose, I would hate it though for new URL(...) to depend on the
platform.


I'm not sure there are great solutions here.  :(


Yeah, I'm willing to suck it up, but I would like to explore our
options before we go that route.



In both Firefox and Chrome if you type file://aaa/some/path, or  
file://localhost/some/path, the aaa and localhost parts are ignored, and  
the rest of the path is interpreted as a local file path. In Opera,  
anything that is not localhost gives an error.


I currently do not have Windows to test but I think I recall IE (or  
Opera?) opening file://server/share if there was a network share at  
\\server\share


In a previous job I had, where the environment was a bit Windows-centric,
there was a wiki with documentation that linked to files on network shares.
I recall the urls looked something like file:\\server\some\path in the
HTML. IE opened the files (hence people continued to write them). The
other browsers didn't.


The point is that the file uri can and should have the authority part, or  
host, and that can be the local machine, or a network share.


Re: [whatwg] URL: file: URLs

2012-10-29 Thread Anne van Kesteren
On Sun, Oct 28, 2012 at 6:51 PM, Boris Zbarsky bzbar...@mit.edu wrote:
 Same as the comment I quoted?  Or same as something else?

Same as you quoted.


 Well, the Gecko parser preserves the host at this stage assuming the URI was
 correctly formatted with a host.  Again:

   blah://foo/bar = blah://foo/bar

 The interesting things happen when you have 0, 1, or 3 slashes between ':'
 and foo.  The handling of foo after this point is a separate issue.

Those are handled the same as in Gecko (this also matches Safari, I think;
Chrome strips all starting slashes, e.g. if you have four, but I did
not copy that).


 In Gecko, it's part of URL parsing.  More precisely, it's part of the
 normalization performed as part of constructing a URL object from a
 string.  Since this is also how we parse URLs, it's effectively all part of
 the package.

 But note that it would be a bit odd if file://c:/ claimed to have a host of
 "c" with a default port or some such...

Maybe I should introduce a "file host state" that supports colons in
the host name (or special-case the "host state" further, but the
former seems cleaner). Most browsers seem to fail currently on input
such as file://c:/ but this is on a Mac so maybe that's the
difference. I would prefer having the parsing be consistent though.


 7 and 8 are not, though at some point we'll need to define equality
 comparisons anyway.

Yeah, I guess at some point someone would need to write a "processing
file: URLs" specification (for post-parsing operations). On the other
hand, it's not entirely clear to me that this needs to be interoperable.


-- 
http://annevankesteren.nl/


Re: [whatwg] URL: file: URLs

2012-10-29 Thread Boris Zbarsky

On 10/29/12 5:00 AM, Anne van Kesteren wrote:

But note that it would be a bit odd if file://c:/ claimed to have a host of
"c" with a default port or some such...


Maybe I should introduce a file host state that supports colons in
the host name (or special case the host state further, but the
former seems cleaner).


I don't think that's particularly desirable.  The c: is totally part 
of the path; treating it otherwise would just be confusing.  Imo.



Most browsers seem to fail currently on input
such as file://c:/ but this is on a Mac


Yes, doing that on a Mac would just be wrong


I would prefer having the parsing be consistent though.


You mean across Windows and non-Windows?  I'm not sure that's viable.

-Boris



Re: [whatwg] URL: file: URLs

2012-10-29 Thread Anne van Kesteren
On Mon, Oct 29, 2012 at 3:13 PM, Boris Zbarsky bzbar...@mit.edu wrote:
 On 10/29/12 5:00 AM, Anne van Kesteren wrote:
 Maybe I should introduce a file host state that supports colons in
 the host name (or special case the host state further, but the
 former seems cleaner).

 I don't think that's particularly desirable.  The c: is totally part of
 the path; treating it otherwise would just be confusing.  Imo.

But at that point in a URL you cannot have a path. A path starts with
a slash after the host. Especially if you want file://test/ to parse
with test being the host.


 Most browsers seem to fail currently on input
 such as file://c:/ but this is on a Mac

 Yes, doing that on a Mac would just be wrong

I suppose, I would hate it though for new URL(...) to depend on the platform.


-- 
http://annevankesteren.nl/


Re: [whatwg] URL: file: URLs

2012-10-29 Thread Boris Zbarsky

On 10/29/12 10:53 AM, Anne van Kesteren wrote:

But at that point in a URL you cannot have a path. A path starts with
a slash after the host.


The point is that on Windows, Gecko parses file://c:/something as 
file:///c:/something


As in, it's an exception to the general "if there are two slashes after
the file: then the next thing is a host" rule.



I suppose, I would hate it though for new URL(...) to depend on the platform.


I'm not sure there are great solutions here.  :(

-Boris


Re: [whatwg] URL: file: URLs

2012-10-28 Thread Boris Zbarsky

On 10/27/12 3:35 PM, Anne van Kesteren wrote:

This is covered as we do this for all URLs currently with a relative
scheme (http/ws/...). I know you indicated this as potentially
problematic


Let's have that fight separately.  ;)


2)  file:// URIs are parsed as a no authority URL in Gecko.  Quoting the
IDL comment:

...

The parser in the specification should handle these in the same way.


Same as the comment I quoted?  Or same as something else?


I have not introduced a no authority concept however. The parser in
the specification also preserves the host as other user agents seem to
preserve it.


Well, the Gecko parser preserves the host at this stage assuming the URI 
was correctly formatted with a host.  Again:


  blah://foo/bar = blah://foo/bar

The interesting things happen when you have 0, 1, or 3 slashes between 
':' and foo.  The handling of foo after this point is a separate issue.



4)  For no authority URLs, including file://, on Windows and OS/2 only, if
what looks like authority section looks like a drive letter, it's treated as
part of the path.  For example, file://c:/ is treated as the filename
c:\.  Looks like a drive letter is defined as ASCII letter (any case),
followed by a ':' or '|' and then followed by end of string or '/' or '\\'.
I'm not sure why this is checking for '\\' again, honestly.  ;)


Is this part of URL parsing or part of doing something with the
resulting URL?


In Gecko, it's part of URL parsing.  More precisely, it's part of the 
normalization performed as part of constructing a URL object from a 
string.  Since this is also how we parse URLs, it's effectively all part 
of the package.


But note that it would be a bit odd if file://c:/ claimed to have a host
of "c" with a default port or some such...



5)  When parsing a no authority URL (including file://), and when item 4
above does not apply, it looks like Gecko skips everything after file://
up until the next '/', '?', or '#' char before parsing path stuff.


So the host is dropped?


In Gecko, I believe so, yes.  I'm not saying this is desirable; just 
what Gecko does.



6)  On Windows and OS/2, when dynamically parsing a path for a no
authority URL (not sure whether this is actually web-exposed, fwiw...)
Gecko will do something involving looking for a path that's only an ASCII
letter followed by ':' or '|' followed by end of string.

...

7)  When doing URI equality comparisons

...

8)  When actually resolving a file:// URL

These points do not seem to be about parsing, correct?


Well, point 6 is about parsing, sort of.

7 and 8 are not, though at some point we'll need to define equality 
comparisons anyway.


-Boris




[whatwg] URL: file: URLs

2012-10-27 Thread Anne van Kesteren
On Mon, Sep 24, 2012 at 4:06 PM, Boris Zbarsky bzbar...@mit.edu wrote:
 Hmm.  So here goes at least a partial list:

 1)  On Windows and OS/2, Gecko replaces '\\' with '/' in file:// URI strings
 before doing anything else with the string when parsing a new URL.  That
 includes relative URI strings being resolved against a file:// base.

This is covered as we do this for all URLs currently with a relative
scheme (http/ws/...). I know you indicated this as potentially
problematic, but note that a) \ as a raw code point is invalid in a
URL and b) because of a) you can represent it as %5C, and c) other
user agents have hit issues with not supporting \ and / outside of
file: URLs.
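
A minimal sketch of this point, assuming the replacement happens on the raw
string before any other parsing. Both function names are hypothetical; the
second one shows how a backslash that must survive as data can be written as
%5C instead.

```python
from urllib.parse import quote


def backslashes_to_slashes(url_string):
    """Hypothetical pre-parse step: treat every raw backslash as '/'."""
    return url_string.replace("\\", "/")


def escape_literal_backslash(segment):
    """Percent-encode backslashes that should stay part of the data."""
    return segment.replace("\\", quote("\\", safe=""))
```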


 2)  file:// URIs are parsed as a no authority URL in Gecko.  Quoting the
 IDL comment:

 35 /**
 36  * blah:foo/bar    = blah:///foo/bar
 37  * blah:/foo/bar   = blah:///foo/bar
 38  * blah://foo/bar  = blah://foo/bar
 39  * blah:///foo/bar = blah:///foo/bar
 40  */

 where the thing on the left is the input string and the thing on the right
 is the normalized form that the parser produces from it.  Note that this is
 different from how HTTP URIs are parsed, for all except the item on line
 number 38 there.

The parser in the specification should handle these in the same way. I
have not introduced a "no authority" concept, however. The parser in
the specification also preserves the host, as other user agents seem to
preserve it.
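
The four mappings in the quoted IDL comment can be sketched as below. This
covers only those four cases; the function name is hypothetical, and what
happens with four or more slashes is deliberately left as "unchanged", which
is an assumption rather than anything stated in the comment.

```python
def normalize_no_authority(url):
    """Hypothetical sketch of the left-to-right mappings in the comment."""
    scheme, sep, rest = url.partition(":")
    if not sep:
        return url  # no scheme separator, nothing to normalize
    # Count the slashes directly after the colon.
    slashes = len(rest) - len(rest.lstrip("/"))
    if slashes == 0:
        # blah:foo/bar becomes blah:///foo/bar
        return scheme + ":///" + rest
    if slashes == 1:
        # blah:/foo/bar becomes blah:///foo/bar
        return scheme + "://" + rest
    # Two slashes (a host follows) or three slashes (empty host) stay as-is.
    return url
```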


 4)  For "no authority" URLs, including file://, on Windows and OS/2 only, if
 what looks like the authority section looks like a drive letter, it's treated
 as part of the path.  For example, file://c:/ is treated as the filename
 c:\.  "Looks like a drive letter" is defined as an ASCII letter (any case),
 followed by a ':' or '|' and then followed by end of string or '/' or '\\'.
 I'm not sure why this is checking for '\\' again, honestly.  ;)

Is this part of URL parsing or part of doing something with the
resulting URL? (I do not plan on defining the latter because there's
no observable difference from the web and it's platform-dependent.)


 5)  When parsing a no authority URL (including file://), and when item 4
 above does not apply, it looks like Gecko skips everything after file://
 up until the next '/', '?', or '#' char before parsing path stuff.

So the host is dropped? This is not what other user agents do, and
http://www.cs.tut.fi/~jkorpela/fileurl.html suggests the host might be
useful in some cases. I don't know much about file: URLs, however, so
I don't know whether that is still true.
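
For illustration, the behaviour described in point 5 might look like the
sketch below: everything between file:// and the next '/', '?' or '#' is
discarded. The function name is hypothetical and this is not Gecko's code; it
also ignores the drive-letter exception from point 4.

```python
import re


def drop_file_authority(url):
    """Hypothetical sketch: skip the authority of a file:// URL."""
    prefix = "file://"
    if not url.startswith(prefix):
        return url
    rest = url[len(prefix):]
    # Find the first character that ends the skipped section.
    match = re.search(r"[/?#]", rest)
    tail = rest[match.start():] if match else ""
    return prefix + tail
```

With this sketch both file://aaa/some/path and file://localhost/some/path
come out as file:///some/path.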


 6)  On Windows and OS/2, when dynamically parsing a path for a no
 authority URL (not sure whether this is actually web-exposed, fwiw...)
 Gecko will do something involving looking for a path that's only an ASCII
 letter followed by ':' or '|' followed by end of string.  I'm not quite sure
 what that part is about...  It might have to do with the fact that URI
 objects in Gecko can have concepts of directory, filename, extension
 or something like that.

 7)  When doing URI equality comparisons, if two file:// URIs only differ in
 their directory/filename/extension (so the actual file path), then an
 equality comparison is done on the underlying file path objects.  What this
 means depends on the OS.  On Unix this is just a straight-up byte by byte
 compare of file paths.  I think OS X now follows the Unix code path as do
 most other supported platforms.  But note that file path in this case is
 normalized in various ways.  Specifically: trailing '/' are stripped and
 some sort of normalization of HFS paths (possibly with a volume name) to
 POSIX paths is done on OSX.  One result of the latter is that
 file:///Users%2fbzbarsky ends up seeing my home directory, which is ...
 slightly surprising.  On Unix, the path bytes are treated as UTF-8 if
 they're valid UTF-8, else treated as whatever the current locale charset is,
 I think.  Oh, and there is some sort of escaping going on for directory
 names, filenames, extensions.  Not sure what that's about, if anything.  The
 URI-escaping code is black magic, but I'm happy to run some black-box tests
 on it if someone wants to provide test strings.

 The things that don't go through the Unix code for this stuff are Windows
 and OS/2.  I'm not going to dig through the OS/2 stuff, but on Windows if
 the filename contains a nonempty directory name and the second char is '|'
 that's converted to a ':'.  Again, escaping for directory names and file
 names and extensions.  Again, things that look like UTF-8 are treated thus
 and other stuff uses the current codepage. After all that, the actual
 equality comparison is done via _wcsicmp on the return value of
 GetShortPathNameW.  So whatever things that combination considers equal are
 equal.

 8)  When actually resolving a file:// URL, the underlying file path object
 as described above is used to get the data.  Plus there's a bit of weirdness
 about symlinks, I think...  Mostly affects what's shown in the url