[whatwg] Relative URL plan

2015-06-16 Thread Anne van Kesteren
I've been trying to figure out a better data model for URLs so we can
handle relative URLs for any scheme. The motivation for supporting
relative URLs for any scheme can be found here:

  https://www.w3.org/Bugs/Public/show_bug.cgi?id=27233

Per my testing the URL parser would still need special handling for
what the URL Standard currently calls relative schemes. I will rename
those special schemes since any URL would now in principle be able to
handle relative URLs.

Testing Chrome and Safari shows that even data and javascript URLs
get no special handling here. For example:

  input     | base URL     | output
  test      | javascript:/ | javascript:/test
  data:/../ | (none)       | data:/

If, for the first example, you then set host to test:81, you get
javascript://test:81/test in Safari. (Though if you set host to
/test:81 you get javascript:///test:81/test, which seems bad. I
will treat that as a bug.)
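A rough Python sketch of the resolution behaviour in the table above: a path-relative reference is merged with the base path and dot segments are collapsed, with no scheme-specific handling. The helper names (`resolve`, `remove_dot_segments`) are hypothetical, and the sketch only covers bases of the form scheme:/path (a missing host); query, fragment and authority handling are left out.

```python
import re
from typing import Optional

# Simplified scheme matcher: a letter, then letters, digits, "+", "-" or ".".
SCHEME_RE = re.compile(r"^([A-Za-z][A-Za-z0-9+.\-]*):(.*)$", re.DOTALL)


def remove_dot_segments(path: str) -> str:
    """Collapse "." and ".." segments in a path that starts with "/"."""
    segments = path.split("/")[1:]
    out = []
    for i, seg in enumerate(segments):
        last = i == len(segments) - 1
        if seg == "..":
            if out:
                out.pop()
            if last:
                out.append("")  # keep the trailing slash, e.g. "/../" -> "/"
        elif seg == ".":
            if last:
                out.append("")
        else:
            out.append(seg)
    return "/" + "/".join(out)


def resolve(ref: str, base: Optional[str] = None) -> str:
    """Resolve ref, optionally against a base of the form scheme:/path."""
    m = SCHEME_RE.match(ref)
    if m:
        scheme, rest = m.groups()
        if rest.startswith("/") and not rest.startswith("//"):
            return f"{scheme}:{remove_dot_segments(rest)}"
        return ref  # other absolute forms are outside this sketch
    if base is None:
        raise ValueError("relative reference without a base URL")
    bm = SCHEME_RE.match(base)
    if bm is None or not bm.group(2).startswith("/") or bm.group(2).startswith("//"):
        raise ValueError("this sketch only handles bases like scheme:/path")
    scheme, base_path = bm.groups()
    if ref.startswith("/"):
        path = ref
    else:
        # Drop the last segment of the base path, then append the reference.
        path = base_path[: base_path.rfind("/") + 1] + ref
    return f"{scheme}:{remove_dot_segments(path)}"


print(resolve("test", "javascript:/"))  # javascript:/test
print(resolve("data:/../"))             # data:/
```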

This gives us three types of URLs:

* Special URLs. file/http/https/etc. These need to handle three
slashes after the scheme as two, can treat lack of slashes as
relative, etc.
* Relative URLs. All non-special URLs where the scheme is followed by a slash.
* Non-relative URLs. All other non-special URLs, i.e. those where the
scheme is not followed by a slash.
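A minimal sketch of the three-way split, assuming the special schemes are ftp, file, http, https, ws and wss (the message itself only says file/http/https/etc.). `classify` is a hypothetical helper; it does no scheme validation beyond finding the first ":".

```python
# Assumed special-scheme list; the thread only says "file/http/https/etc."
SPECIAL_SCHEMES = {"ftp", "file", "http", "https", "ws", "wss"}


def classify(url: str) -> str:
    """Return "special", "relative" or "non-relative" for an absolute URL."""
    scheme, sep, rest = url.partition(":")
    if not sep or not scheme:
        raise ValueError("not an absolute URL: " + url)
    if scheme.lower() in SPECIAL_SCHEMES:
        return "special"
    # For every other scheme, only the first code point after ":" matters.
    if rest.startswith("/"):
        return "relative"
    return "non-relative"
```

Note that javascript: and data: land in either bucket depending on what follows the colon, which is the point of the split.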

Non-relative URLs consist of {scheme, scheme data, query, fragment}.
Relative URLs and special URLs consist of {scheme, username, password,
host, port, path, query, fragment}. Special URLs cannot have an empty
host (as that would lead to reparsing issues), relative URLs can.
Relative URLs can also have a missing host (see javascript:/ above).
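One way to model the two record shapes above, as a hedged Python sketch (class and function names are hypothetical). It highlights the three host states for relative URLs: `None` for a missing host, `""` for an empty host, and a non-empty string. The serializer emits no "//" for a missing host, so javascript:/test and javascript:///test stay distinct, and a special URL rejects an empty host.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class NonRelativeURL:
    scheme: str
    scheme_data: str
    query: Optional[str] = None
    fragment: Optional[str] = None


@dataclass
class HierarchicalURL:
    """Shape shared by relative and special URLs (hypothetical name)."""
    scheme: str
    username: str = ""
    password: str = ""
    host: Optional[str] = None  # None = missing host, "" = empty host
    port: Optional[int] = None
    path: str = ""
    query: Optional[str] = None
    fragment: Optional[str] = None
    special: bool = False

    def __post_init__(self) -> None:
        # Special URLs may not have an empty host (reparsing issues).
        if self.special and self.host == "":
            raise ValueError("special URLs cannot have an empty host")


def serialize(url: Union[NonRelativeURL, HierarchicalURL]) -> str:
    out = url.scheme + ":"
    if isinstance(url, NonRelativeURL):
        out += url.scheme_data
    else:
        if url.host is not None:  # a missing host emits no "//" at all
            out += "//"
            if url.username or url.password:
                out += url.username
                if url.password:
                    out += ":" + url.password
                out += "@"
            out += url.host
            if url.port is not None:
                out += ":" + str(url.port)
        out += url.path
    if url.query is not None:
        out += "?" + url.query
    if url.fragment is not None:
        out += "#" + url.fragment
    return out
```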

I think we should try to restrict backslash replacement and the
encoding override to special URLs.
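A sketch of what restricting these two behaviours to special URLs could look like. The special-scheme list is an assumption, the helper names are hypothetical, and `urllib.parse.quote` with `safe="=&"` only approximates the URL Standard's query percent-encode set.

```python
from urllib.parse import quote

# Assumed special-scheme list; the thread only names file/http/https.
SPECIAL_SCHEMES = {"ftp", "file", "http", "https", "ws", "wss"}


def normalize_separators(scheme: str, path: str) -> str:
    """Only special URLs treat "\\" as a path separator."""
    if scheme.lower() in SPECIAL_SCHEMES:
        return path.replace("\\", "/")
    return path


def encode_query(scheme: str, query: str, document_encoding: str = "utf-8") -> str:
    """Only special URLs honour the encoding override; others use UTF-8."""
    encoding = document_encoding if scheme.lower() in SPECIAL_SCHEMES else "utf-8"
    # Keep "=" and "&" so key=value pairs stay readable.
    return quote(query, safe="=&", encoding=encoding)
```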

I also think we should change the API such that you cannot change
anything for non-relative URLs (setters are no-ops, which is already
largely the case), and such that you cannot change the scheme of a
special URL to make it a relative URL, since the different handling of
hosts might lead to security issues.
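A sketch of the proposed setter rules on a hypothetical record type (not a real API): every setter is a no-op on a non-relative URL, and the scheme setter refuses to turn a special URL into a non-special one. The special-scheme list is assumed. This follows the proposal as first stated; the later reply in this thread agreeing that fragments should stay settable is not modelled.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed special-scheme list; the thread only says "file/http/https/etc."
SPECIAL_SCHEMES = {"ftp", "file", "http", "https", "ws", "wss"}
SETTABLE = {"scheme", "host", "path", "query", "fragment"}


@dataclass
class URLRecord:  # hypothetical record, not a real API
    scheme: str
    non_relative: bool
    host: Optional[str] = None
    path: str = ""
    query: Optional[str] = None
    fragment: Optional[str] = None


def set_component(url: URLRecord, name: str, value: str) -> None:
    if name not in SETTABLE:
        raise AttributeError(name)
    if url.non_relative:
        return  # proposed: every setter is a no-op on non-relative URLs
    if name == "scheme":
        value = value.lower()
        if url.scheme in SPECIAL_SCHEMES and value not in SPECIAL_SCHEMES:
            return  # proposed: a special URL cannot become non-special
    setattr(url, name, value)
```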

I will start rolling this out and write a bunch of tests. Even though
both Chrome and Safari implement this, they have differences and some
logical inconsistencies that I think would be great to remove. I hope
to get feedback on which parts of the above we cannot do. Hopefully
Firefox not implementing any of this provides some wiggle room.


-- 
https://annevankesteren.nl/


Re: [whatwg] Relative URL plan

2015-06-16 Thread Anne van Kesteren
On Tue, Jun 16, 2015 at 7:51 PM, Boris Zbarsky bzbar...@mit.edu wrote:
> about: is not standardized enough across UAs to really reason about.

about and mailto are the reasons query is split out. Not so much that
you can set it (you can't), but so that you can reason about it
independently. And since non-Gecko browsers don't have a per-scheme
parser, and that didn't really seem like a viable path forward
anyway, data follows that logic.
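To illustrate keeping the query separate from the scheme data for about: and mailto:, here is a hypothetical splitter for non-relative URLs. The fragment is cut at the first "#" before the query is cut at the first "?", so a "?" inside the fragment stays there.

```python
from typing import Optional, Tuple


def split_non_relative(url: str) -> Tuple[str, str, Optional[str], Optional[str]]:
    """Split scheme:data?query#fragment into its four components."""
    scheme, _, rest = url.partition(":")
    # The fragment starts at the first "#"; a "?" after it belongs to it.
    rest, hash_sep, fragment = rest.partition("#")
    scheme_data, query_sep, query = rest.partition("?")
    return (
        scheme,
        scheme_data,
        query if query_sep else None,
        fragment if hash_sep else None,
    )
```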




Re: [whatwg] Relative URL plan

2015-06-16 Thread Boris Zbarsky

On 6/16/15 1:57 PM, Anne van Kesteren wrote:

> On Tue, Jun 16, 2015 at 7:51 PM, Boris Zbarsky bzbar...@mit.edu wrote:
>
>> about: is not standardized enough across UAs to really reason about.
>
> about and mailto are the reasons query is split out. Not so much that
> you can set it (you can't)


Why can't you?  If it's something you want to reason about as a separate 
entity, why doesn't it make sense to set it as a separate entity?


-Boris


Re: [whatwg] Relative URL plan

2015-06-16 Thread Anne van Kesteren
On Tue, Jun 16, 2015 at 6:52 PM, Boris Zbarsky bzbar...@mit.edu wrote:
> On 6/16/15 8:06 AM, Anne van Kesteren wrote:
>
>> I also think we should change the API such that you cannot change
>> anything for non-relative URLs
>
> Why would you disallow setting fragment for a non-relative URL?

You're right, fragments make sense. Changing path (scheme data in the
specification) or query would be painful however.




Re: [whatwg] Relative URL plan

2015-06-16 Thread Boris Zbarsky

On 6/16/15 1:18 PM, Anne van Kesteren wrote:

> On Tue, Jun 16, 2015 at 7:01 PM, Boris Zbarsky bzbar...@mit.edu wrote:
>
> about, data, etc.


data: doesn't use query in practice.  As in, any '?' that happens to be 
in there is totally accidental.


about: is not standardized enough across UAs to really reason about.

Hence my question: which schemes are not special, widespread, and 
actually use '?' to mean something?  Specific use cases for this would 
make it clearer whether mutating the query is a meaningful operation.


-Boris


Re: [whatwg] Relative URL plan

2015-06-16 Thread Anne van Kesteren
On Tue, Jun 16, 2015 at 7:01 PM, Boris Zbarsky bzbar...@mit.edu wrote:
> What are examples of non-relative URIs that use query?  mailto:, I guess?

about, data, etc. Though note that there's no such thing as a
non-relative scheme: whether a URL is non-relative depends entirely on
the first code point after the scheme and its ":" in this brave new world.

As far as I can tell apart from special casing a couple of schemes
(now named special schemes in the URL Standard), everything else can
be completely generic at the parser level. Of course there's also a
level on top, e.g. for data URLs we'd look at
scheme data + (query ? "?" + query : "").
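The data URL rule above, scheme data + (query ? "?" + query : ""), sketched in Python with a hypothetical helper name. Because that expression treats an empty query as false, a trailing "?" with nothing after it is dropped here too; the fragment is excluded from the result.

```python
from typing import Optional


def data_url_payload(url: str) -> str:
    """Rejoin scheme data and query for a data: URL (fragment excluded)."""
    scheme, sep, rest = url.partition(":")
    if not sep or scheme.lower() != "data":
        raise ValueError("not a data: URL")
    rest = rest.partition("#")[0]
    scheme_data, query_sep, query = rest.partition("?")
    q: Optional[str] = query if query_sep else None
    # scheme data + (query ? "?" + query : ""); an empty query is dropped,
    # mirroring the falsy-string behaviour of that expression.
    return scheme_data + ("?" + q if q else "")
```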

The non-special URLs have a couple of forms:

  non-special:non-relative-path
  non-special:/null-host-and-relative-path
  non-special://host/and-relative-path
  non-special:///empty-host-and-relative-path
    (supporting this for special URLs is impossible due to reparsing issues)

All of these except non-relative-path can be manipulated quite easily.
Non-special URLs also don't have their host names IDNA-parsed.
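A sketch of telling the four non-special forms apart, using a hypothetical helper that ignores userinfo, port, query and fragment for brevity. The host is returned verbatim, reflecting that non-special hosts are not IDNA-processed.

```python
from typing import Optional, Tuple


def split_non_special(url: str) -> Tuple[str, Optional[str], str]:
    """Return (form, host, path-or-scheme-data) for a non-special URL."""
    _scheme, _, rest = url.partition(":")
    if not rest.startswith("/"):
        return ("non-relative", None, rest)
    if not rest.startswith("//"):
        return ("null-host", None, rest)
    after = rest[2:]
    slash = after.find("/")
    host, path = (after, "") if slash == -1 else (after[:slash], after[slash:])
    # The host is kept verbatim: non-special hosts get no IDNA processing.
    return ("empty-host" if host == "" else "host", host, path)
```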

I'm actually pretty happy this seems within reach as it makes URLs
much more extensible. I suppose we might still sometimes wish to make
tweaks to the parser (as we did for e.g. blob URLs), but overall this
should be much more compatible with the IETF POV.




Re: [whatwg] Relative URL plan

2015-06-16 Thread Anne van Kesteren
On Tue, Jun 16, 2015 at 8:18 PM, Boris Zbarsky bzbar...@mit.edu wrote:
> Why can't you?  If it's something you want to reason about as a separate
> entity, why doesn't it make sense to set it as a separate entity?

Actually, it seems like you can, though that would equally affect data
URLs, but maybe that's not too bad. I guess for the API we could
special-case a couple of schemes to not support reading/writing as
desired for optimizations.




Re: [whatwg] Relative URL plan

2015-06-16 Thread Boris Zbarsky

On 6/16/15 8:06 AM, Anne van Kesteren wrote:

> I also think we should change the API such that you cannot change
> anything for non-relative URLs


Why would you disallow setting fragment for a non-relative URL?

-Boris



Re: [whatwg] Relative URL plan

2015-06-16 Thread Boris Zbarsky

On 6/16/15 12:55 PM, Anne van Kesteren wrote:

> You're right, fragments make sense. Changing path (scheme data in the
> specification) or query would be painful however.


I see no particular need to support changing path.

Not sure about query; Gecko doesn't support query to start with on what 
you describe as non-relative URIs (as in, it's considered part of 
scheme data for our purposes and we don't parse that at all) so I'm 
not sure what use cases there might be for it.


What are examples of non-relative URIs that use query?  mailto:, I guess?

-Boris



Re: [whatwg] Relative URL plan

2015-06-16 Thread Boris Zbarsky

On 6/16/15 2:23 PM, Anne van Kesteren wrote:

> Actually, it seems like you can, though that would equally affect data
> URLs, but maybe that's not too bad. I guess for the API we could
> special case a couple of schemes to not support reading/writing as
> desired for optimizations.


What optimizations are we talking about here, specifically?

Note that my general view for how URL objects should work internally in 
Gecko is that we should have an immutable backing store and mutators 
that clone-with-modifications (basically copy on write).  Of course in 
terms of the web-exposed behavior we'd just have the web-exposed URL 
change which internal object it points to on mutation, so we can expose 
whatever mutators we want.
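A minimal Python sketch of the copy-on-write arrangement described above, with hypothetical names: the backing record is a frozen dataclass, and the exposed object's mutator builds a modified copy with `dataclasses.replace` and repoints to it, so other holders of the old record are unaffected.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class URLData:
    """Immutable backing store (hypothetical)."""
    scheme: str
    host: Optional[str]
    path: str
    query: Optional[str] = None
    fragment: Optional[str] = None


class ExposedURL:
    """Web-facing wrapper: mutation swaps which immutable record it uses."""

    def __init__(self, data: URLData) -> None:
        self._data = data

    @property
    def data(self) -> URLData:
        return self._data

    def update(self, **changes: Optional[str]) -> None:
        # dataclasses.replace builds a modified copy; the old record is untouched.
        self._data = replace(self._data, **changes)
```

Because the records never change, two wrappers can share one record until either is mutated.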


-Boris



Re: [whatwg] Relative URL plan

2015-06-16 Thread Anne van Kesteren
On Tue, Jun 16, 2015 at 8:29 PM, Boris Zbarsky bzbar...@mit.edu wrote:
> What optimizations are we talking about here, specifically?

Not sure. Was just indicating that we have that option if it would be
particularly painful/pointless/footgun. I haven't exactly thought it
through and there's not much feedback beyond http/https use cases.


> Note that my general view for how URL objects should work internally in
> Gecko is that we should have an immutable backing store and mutators that
> clone-with-modifications (basically copy on write).  Of course in terms of
> the web-exposed behavior we'd just have the web-exposed URL change which
> internal object it points to on mutation, so we can expose whatever mutators
> we want.

Makes sense. In retrospect I kind of wish new URL() had at least started
out immutable, so it could become a native value in JavaScript some
day, but it's too late now.

