[whatwg] Relative URL plan
I've been trying to figure out a better data model for URLs so we can handle relative URLs for any scheme. The motivation for supporting relative URLs for any scheme can be found here:

  https://www.w3.org/Bugs/Public/show_bug.cgi?id=27233

Per my testing the URL parser would still need special handling for what the URL Standard currently calls relative schemes. I will rename those to special schemes since any URL would now in principle be able to handle relative URLs.

Testing of Chrome and Safari shows that even data and javascript URLs have no special handling here. E.g.

  input     | base URL     | output
  test      | javascript:/ | javascript:/test
  data:/../ | (none)       | data:/

If for the first example you then set host to test:81 you get javascript://test:81/test in Safari. (Though if you set host to /test:81 you get javascript:///test:81/test which seems bad. I will treat that as a bug.)

This gives us three types of URLs:

* Special URLs. file/http/https/etc. These need to handle three slashes after the scheme as two, can treat lack of slashes as relative, etc.
* Relative URLs. All non-special URLs where the scheme is followed by a slash.
* Non-relative URLs. All non-special-non-relative URLs.

Non-relative URLs consist of {scheme, scheme data, query, fragment}. Relative URLs and special URLs consist of {scheme, username, password, host, port, path, query, fragment}.

Special URLs cannot have an empty host (as that would lead to reparsing issues), relative URLs can. Relative URLs can also have a missing host (see javascript:/ above).

I think we should try to restrict backslash replacement and the encoding override to special URLs.

I also think we should change the API such that you cannot change anything for non-relative URLs (setters are no-ops, already largely the case). And that you cannot change the scheme from a special URL to a relative URL. Given the different handling of hosts that might lead to security issues.

I will start rolling this out and write a bunch of tests. Even though both Chrome and Safari implement this they have differences and some logical inconsistencies that I think would be great to remove. I hope to get feedback on what we cannot do from the above. Hopefully Firefox not implementing any of this provides some wiggle room.

-- https://annevankesteren.nl/
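
The three-way split proposed in this message can be sketched roughly as follows. This is only an illustration, not the URL Standard's parser: the function name `classify`, the `URLKind` enum, and the exact contents of `SPECIAL_SCHEMES` are assumptions (the message itself only says "file/http/https/etc.").

```python
from enum import Enum

# Assumption: an illustrative set; the message only lists "file/http/https/etc."
SPECIAL_SCHEMES = {"file", "http", "https"}


class URLKind(Enum):
    SPECIAL = "special"
    RELATIVE = "relative"
    NON_RELATIVE = "non-relative"


def classify(url: str) -> URLKind:
    """Classify an absolute URL string by scheme and the first code point after ':'."""
    scheme, colon, rest = url.partition(":")
    if not colon:
        raise ValueError("input has no scheme")
    if scheme.lower() in SPECIAL_SCHEMES:
        return URLKind.SPECIAL
    # Non-special URL: relative if the scheme is followed by a slash.
    if rest.startswith("/"):
        return URLKind.RELATIVE
    return URLKind.NON_RELATIVE
```

With this sketch, `javascript:/test` and `data:/` count as relative URLs, while a hypothetical `mailto:user@example.org` is non-relative.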
Re: [whatwg] Relative URL plan
On Tue, Jun 16, 2015 at 7:51 PM, Boris Zbarsky bzbar...@mit.edu wrote:
> about: is not standardized enough across UAs to really reason about.

about and mailto are the reasons query is split out. Not so much that you can set it (you can't), but so that you can reason about it independently. And since non-Gecko-browsers don't have a parser per-scheme and that didn't really seem like a viable path forward anyway, data follows that logic.

-- https://annevankesteren.nl/
Re: [whatwg] Relative URL plan
On 6/16/15 1:57 PM, Anne van Kesteren wrote:
> On Tue, Jun 16, 2015 at 7:51 PM, Boris Zbarsky bzbar...@mit.edu wrote:
>> about: is not standardized enough across UAs to really reason about.
>
> about and mailto are the reasons query is split out. Not so much that you can set it (you can't)

Why can't you? If it's something you want to reason about as a separate entity, why doesn't it make sense to set it as a separate entity?

-Boris
Re: [whatwg] Relative URL plan
On Tue, Jun 16, 2015 at 6:52 PM, Boris Zbarsky bzbar...@mit.edu wrote:
> On 6/16/15 8:06 AM, Anne van Kesteren wrote:
>> I also think we should change the API such that you cannot change anything for non-relative URLs
>
> Why would you disallow setting fragment for a non-relative URL?

You're right, fragments make sense. Changing path (scheme data in the specification) or query would be painful however.

-- https://annevankesteren.nl/
Re: [whatwg] Relative URL plan
On 6/16/15 1:18 PM, Anne van Kesteren wrote:
> On Tue, Jun 16, 2015 at 7:01 PM, Boris Zbarsky bzbar...@mit.edu wrote:
> about, data, etc.

data: doesn't use query in practice. As in, any '?' that happens to be in there is totally accidental.

about: is not standardized enough across UAs to really reason about.

Hence my question: which schemes are not special, widespread, and actually use '?' to mean something? Specific use cases for this would make it clearer whether mutating the query is a meaningful operation.

-Boris
Re: [whatwg] Relative URL plan
On Tue, Jun 16, 2015 at 7:01 PM, Boris Zbarsky bzbar...@mit.edu wrote:
> What are examples of non-relative URIs that use query? mailto:, I guess?

about, data, etc. Though note that there's no such thing as a non-relative URL scheme. Whether a URL is relative depends entirely on the first code point after the scheme and ":" in this brave new world.

As far as I can tell apart from special casing a couple of schemes (now named special schemes in the URL Standard), everything else can be completely generic at the parser level. Of course there's also a level on top, e.g. for data URLs we'd look at scheme data + (query ? "?" + query : "").

The non-special URLs have a couple of forms:

  non-special:non-relative-path
  non-special:/null-host-and-relative-path
  non-special://host/and-relative-path
  non-special:///empty-host-and-relative-path (supporting this for special URLs is impossible due to reparsing issues)

and apart from non-relative-path can be manipulated quite easily. Non-special URLs also don't have their host names IDNA-parsed.

I'm actually pretty happy this seems within reach as it makes URLs much more extensible. I suppose we might still sometimes wish to make tweaks to the parser (as we did for e.g. blob URLs), but overall this should be much more compatible with the IETF POV.

-- https://annevankesteren.nl/
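
The four non-special forms listed in this message, and the "level on top" for data URLs, can be sketched as below. This is a rough illustration under stated assumptions, not the parser from the URL Standard: the names `NonSpecialURL`, `split_non_special`, and `data_payload` are hypothetical, and the sketch ignores username, password, port, query, and fragment splitting entirely.

```python
from typing import NamedTuple, Optional


class NonSpecialURL(NamedTuple):
    scheme: str
    host: Optional[str]  # None = null (missing) host, "" = empty host
    path: str            # path, or scheme data for the non-relative form
    relative: bool       # False only for the non-relative-path form


def split_non_special(url: str) -> NonSpecialURL:
    """Split a non-special URL by what follows the scheme and ':'."""
    scheme, colon, rest = url.partition(":")
    if not colon:
        raise ValueError("input has no scheme")
    if rest.startswith("//"):
        # "//host/path" or "///path" (empty host).
        host, slash, tail = rest[2:].partition("/")
        return NonSpecialURL(scheme, host, slash + tail, True)
    if rest.startswith("/"):
        # Single slash: null host, relative path.
        return NonSpecialURL(scheme, None, rest, True)
    # No leading slash: non-relative path (scheme data).
    return NonSpecialURL(scheme, None, rest, False)


def data_payload(scheme_data: str, query: Optional[str]) -> str:
    """Layer on top for data URLs: rejoin scheme data and query."""
    return scheme_data + ("?" + query if query else "")
```

Representing a null host as `None` and an empty host as `""` is a design choice of this sketch, made so that `non-special:/path` and `non-special:///path` stay distinguishable after parsing.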
Re: [whatwg] Relative URL plan
On Tue, Jun 16, 2015 at 8:18 PM, Boris Zbarsky bzbar...@mit.edu wrote:
> Why can't you? If it's something you want to reason about as a separate entity, why doesn't it make sense to set it as a separate entity?

Actually, it seems like you can, though that would equally affect data URLs, but maybe that's not too bad. I guess for the API we could special case a couple of schemes to not support reading/writing as desired for optimizations.

-- https://annevankesteren.nl/
Re: [whatwg] Relative URL plan
On 6/16/15 8:06 AM, Anne van Kesteren wrote:
> I also think we should change the API such that you cannot change anything for non-relative URLs

Why would you disallow setting fragment for a non-relative URL?

-Boris
Re: [whatwg] Relative URL plan
On 6/16/15 12:55 PM, Anne van Kesteren wrote:
> You're right, fragments make sense. Changing path (scheme data in the specification) or query would be painful however.

I see no particular need to support changing path. Not sure about query; Gecko doesn't support query to start with on what you describe as non-relative URIs (as in, it's considered part of scheme data for our purposes and we don't parse that at all) so I'm not sure what use cases there might be for it.

What are examples of non-relative URIs that use query? mailto:, I guess?

-Boris
Re: [whatwg] Relative URL plan
On 6/16/15 2:23 PM, Anne van Kesteren wrote:
> Actually, it seems like you can, though that would equally affect data URLs, but maybe that's not too bad. I guess for the API we could special case a couple of schemes to not support reading/writing as desired for optimizations.

What optimizations are we talking about here, specifically?

Note that my general view for how URL objects should work internally in Gecko is that we should have an immutable backing store and mutators that clone-with-modifications (basically copy on write). Of course in terms of the web-exposed behavior we'd just have the web-exposed URL change which internal object it points to on mutation, so we can expose whatever mutators we want.

-Boris
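
The copy-on-write design described in this message can be sketched in a few lines. This is an illustration only and not Gecko's implementation: `URLRecord` and `URLObject` are hypothetical names, the record carries only a handful of fields, and the address in the usage note is made up.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class URLRecord:
    """Immutable backing store (hypothetical, reduced field set)."""
    scheme: str
    path: str
    query: Optional[str] = None
    fragment: Optional[str] = None


class URLObject:
    """Mutable, web-exposed wrapper that repoints to a new record on mutation."""

    def __init__(self, record: URLRecord) -> None:
        self._record = record

    @property
    def fragment(self) -> Optional[str]:
        return self._record.fragment

    @fragment.setter
    def fragment(self, value: Optional[str]) -> None:
        # Clone-with-modifications: the old record is left untouched.
        self._record = replace(self._record, fragment=value)

    def snapshot(self) -> URLRecord:
        """Return the current immutable record; safe to share."""
        return self._record
```

For example, after `u = URLObject(URLRecord("mailto", "user@example.org"))` and `u.fragment = "x"`, any record handed out earlier via `snapshot()` still has no fragment, while `u.fragment` reads back `"x"`.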
Re: [whatwg] Relative URL plan
On Tue, Jun 16, 2015 at 8:29 PM, Boris Zbarsky bzbar...@mit.edu wrote:
> What optimizations are we talking about here, specifically?

Not sure. Was just indicating that we have that option if it would be particularly painful/pointless/footgun. I haven't exactly thought it through and there's not much feedback beyond http/https use cases.

> Note that my general view for how URL objects should work internally in Gecko is that we should have an immutable backing store and mutators that clone-with-modifications (basically copy on write). Of course in terms of the web-exposed behavior we'd just have the web-exposed URL change which internal object it points to on mutation, so we can expose whatever mutators we want.

Makes sense. In retrospect I kind of wished new URL() at least started out immutable, so it could become a native value in JavaScript some day, but too late now.

-- https://annevankesteren.nl/