Re: [whatwg] URL decomposition on HTMLAnchorElement interface
On Thu, 26 Mar 2009, Kartikaya Gupta wrote:
> It seems that major browsers all support URL decomposition on HTMLAnchorElement, but this doesn't seem to be stated anywhere in the HTML5 spec. The jQuery/tabs library seems to depend on this (specifically, on the hash property) being available. Could the HTMLAnchorElement interface be updated to reflect this?

Done.

On Thu, 26 Mar 2009, João Eiras wrote:
> Browsers also support partially setting each of the url fields separately, although error handling between all of them is very inconsistent. Note: if you specify this behavior, then you need to specify what happens for http:, https:, data:, mailto: and unknown:

Done. Browsers differ in how this is handled; the spec doesn't quite match any of them (it takes the most sane aspects of each browser I tested).

On Thu, 26 Mar 2009, Boris Zbarsky wrote:
> If you specify the setters then you also need to specify how this affects the value of the href attribute in the DOM. For example, in Gecko if you have an <a href="foo#bar"> which has base URI "http://example.com/" and you set anchor.hash on that anchor to "baz", then the attribute value is changed to "http://example.com/foo#baz". I can't speak to what happens in other browsers.

Done.

On Thu, 26 Mar 2009, Kartikaya Gupta wrote:
> var a = document.createElement('a');
> a.setAttribute('href', 'http://example.org:123/foo?bar#baz');
> a.hostname = null;
> alert(a.hostname); // displays "foo"
> alert(a.href); // displays "http://foo/?bar#baz"

The spec says "null" and "http://null:123/foo?bar#baz". If WebIDL changes to say that 'null' becomes "", then the spec says "example.org" and "http://example.org:123/foo?bar#baz" (setting 'host' or 'hostname' to the empty string is ignored).

> a.setAttribute('href', 'scheme://host/path');
> a.host = null;
> alert(a.host); // displays ""
> alert(a.pathname); // displays ""
> alert(a.href); // displays "scheme:host/path"

If 'null' becomes "null": "null", "/path", and "scheme://null/path".
If 'null' becomes "": "host", "/path", and "scheme://host/path" (setting 'host' or 'hostname' to the empty string is ignored).

On Thu, 26 Mar 2009, Biju wrote:
> var a = document.createElement('a');

Assuming a base URL of "http://example.com/path/":

> a.setAttribute('href', 'http:/Example.org:123/foo?bar#baz'); //Case 1
> alert(a.href);

Per spec: "http://example.com/Example.org:123/foo?bar#baz"

> a.setAttribute('href', 'http:example.org:123/foo?bar#baz');//Case 2
> alert(a.href);

Per spec: "http://example.com/path/Example.org:123/foo?bar#baz"

> a.setAttribute('href', 'http:///example.org:123/foo?bar#baz');//Case 3
> alert(a.href);

Per spec: "" (the URL can't be parsed).

> a.setAttribute('href', 'http:/example.org:123/foo?bar#baz');//Case 4
> alert(a.href);

Per spec: "" (the URL can't be parsed).

-- 
Ian Hickson
http://ln.hixie.ch/
Things that are impossible just take longer.
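The two outcomes described above for `a.hostname = null` can be tried outside a browser with the standalone `URL` class. This is only a sketch: it assumes a runtime that exposes `URL` as a global (current browsers, Node.js) and follows today's URL Standard, which is not the 2009 draft discussed in this thread, and it does not involve an `<a>` element at all. The helper name `hrefAfterHostname` is made up for the example.

```javascript
// Sketch only: standalone URL class, not HTMLAnchorElement.
// hrefAfterHostname is a hypothetical helper for this example.
function hrefAfterHostname(href, value) {
  const u = new URL(href);
  u.hostname = value; // the value is converted to a string before the setter runs
  return u.href;
}

const start = 'http://example.org:123/foo?bar#baz';

// null is stringified to "null", which is then used as the host name.
const withNull = hrefAfterHostname(start, null);

// The empty string is not a usable host for http:, so the assignment is ignored.
const withEmpty = hrefAfterHostname(start, '');

console.log(withNull);
console.log(withEmpty);
```

Under those assumptions the first line printed should have "null" as its host and the second should be the unchanged starting URL, which lines up with the "null" stringification and the "empty string is ignored" rule described in the message above.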
Re: [whatwg] URL decomposition on HTMLAnchorElement interface
On Fri, 27 Mar 2009 21:53:48 -0400, Boris Zbarsky bzbar...@mit.edu wrote:
> Kartikaya Gupta wrote:
>> The empty string falls under the "anything else" case in my suggestion above and would work as you expect.
>
> Null and empty string should, imo, have the same behavior here. It doesn't make sense to treat them differently to me.

Do you agree that null and empty string should behave differently for search and hash? To me, it doesn't make sense to treat null and the empty string differently for some components but not others.

>>> There are big scary comments in the Gecko code for these setters saying that they must never ever throw. I suspect that making them throw would be a serious web compat issue.
>>
>> Is this Gecko-internal code you're referring to? Or the setters exposed to web content via HTMLAnchorElement?
>
> The latter. The Gecko-internal URI code does in fact throw on a lot of these setters, and the HTMLAnchorElement methods catch and eat these exceptions, very much on purpose.

Ok. I'll assume there is valid reasoning behind that. Replace all the "throw"s with "be silently ignored" in my proposal.

>>> Changing from an authority to a non-authority URI or the other way around doesn't seem desirable to me (and would only work for unknown schemes anyway, presumably, at best; it's better if it just never works).
>>
>> Does it matter? Since it's an unknown scheme, it's basically opaque data. You can't dereference it and fetch the resource it points to
>
> No, but you can pass it off to a helper application. In any case, my comment above was more concerned with your proposal that one should be able to create a non-authority http: URI than about unknown schemes.

I don't think my proposal allowed creation of a non-authority http: URI. I said that 'Attempts to set host to null for a scheme known to require an authority should throw.' Since http is a scheme known to require an authority, you wouldn't be able to null out the authority.

The one loophole I missed would be to create a non-http non-authority URI and then change the scheme to http. That can be fixed by amending the first sentence of my proposal to the following:

- Attempts to set protocol to null, the empty string, or anything containing invalid characters (i.e. not in the scheme production of RFC3986) should throw. Setting it to a scheme known to require an authority when the authority component is null should also throw. Setting it to a scheme known to require no authority when the authority component is non-null should also throw. Setting it to anything else should be allowed and should update the scheme component of the underlying URI.

(With appropriate adjustments of s/throw/be silently ignored/)

Cheers,
kats
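The amended protocol rule, in its "silently ignored" form, could be sketched roughly as follows. Everything here is illustrative: the `uri` record with `scheme` and `authority` fields, the function name `setProtocol`, and the two scheme lists are assumptions made for the example, not anything a spec or browser defines. Only the scheme grammar comes from RFC 3986.

```javascript
// Hypothetical scheme lists, chosen only to make the sketch runnable.
const REQUIRES_AUTHORITY = new Set(['http', 'https', 'ftp']);
const FORBIDS_AUTHORITY = new Set(['mailto', 'data']);

// RFC 3986: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
const SCHEME_RE = /^[A-Za-z][A-Za-z0-9+.-]*$/;

// uri is a hypothetical record; authority === null means "no authority".
// Returns true if the scheme was updated, false if the attempt was ignored.
function setProtocol(uri, value) {
  if (value === null || value === undefined) return false;
  // Assumption: accept an optional trailing ":" as the protocol property carries one.
  const scheme = String(value).replace(/:$/, '').toLowerCase();
  if (!SCHEME_RE.test(scheme)) return false; // empty or invalid characters
  if (REQUIRES_AUTHORITY.has(scheme) && uri.authority === null) return false;
  if (FORBIDS_AUTHORITY.has(scheme) && uri.authority !== null) return false;
  uri.scheme = scheme;
  return true;
}

// The loophole from the message: a non-authority URI cannot be switched to http.
const noAuthority = { scheme: 'foo', authority: null, path: 'bar' };
console.log(setProtocol(noAuthority, 'http'), noAuthority.scheme);
```

With these assumptions the last line reports that the attempt was ignored and the scheme is still "foo".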
Re: [whatwg] URL decomposition on HTMLAnchorElement interface
Kartikaya Gupta wrote:
> Do you agree that null and empty string should behave differently for search and hash?

No, not really. Are they treated differently in current UAs (past null being treated as "null", possibly)?

> To me, it doesn't make sense to treat null and the empty string differently for some components but not others.

Agreed that it would be confusing from a web developer point of view. Of course from a URI point of view some URI components can be empty but present or can be not present at all, as you point out above.

>> The latter. The Gecko-internal URI code does in fact throw on a lot of these setters, and the HTMLAnchorElement methods catch and eat these exceptions, very much on purpose.
>
> Ok. I'll assume there is valid reasoning behind that. Replace all the "throw"s with "be silently ignored" in my proposal.

For what it's worth, I suspect that the silent fail is somewhat interoperably implemented already.

>> No, but you can pass it off to a helper application. In any case, my comment above was more concerned with your proposal that one should be able to create a non-authority http: URI than about unknown schemes.
>
> I don't think my proposal allowed creation of a non-authority http: URI. I said that 'Attempts to set host to null for a scheme known to require an authority should throw.' Since http is a scheme known to require an authority, you wouldn't be able to null out the authority.

Or set it to the empty string, which has the same effect. Your proposal treats those differently.

> - Attempts to set protocol to null, the empty string, or anything containing invalid characters (i.e. not in the scheme production of RFC3986) should throw. Setting it to a scheme known to require an authority when the authority component is null should also throw. Setting it to a scheme known to require no authority when the authority component is non-null should also throw. Setting it to anything else should be allowed and should update the scheme component of the underlying URI.

Honestly, I can't think of a sane way to define a protocol setter that changes from one URI type to another (type being "has authority", "doesn't have authority", "unknown"). Actually, as far as Gecko is concerned there are 5 different types; see the three constants defined at http://mxr.mozilla.org/mozilla-central/source/netwerk/base/public/nsIStandardURL.idl#56 for three, plus "not a hierarchical URI" (which never does the fixup from scheme:foo to scheme://foo) and "unknown" (treated like "not a hierarchical URI").

-Boris
Re: [whatwg] URL decomposition on HTMLAnchorElement interface
> Interestingly, it looks like Opera doesn't support the hostname setter at all. Safari ignores the call in this case. I don't have IE to test offhand.

True. Opera currently does not support setting these values separately.
Re: [whatwg] URL decomposition on HTMLAnchorElement interface
On Thu, 26 Mar 2009 23:01:34 -0400, Biju bijumaill...@gmail.com wrote:
> On Thu, Mar 26, 2009 at 5:26 PM, Kartikaya Gupta wrote:
>> This behavior seems rather inconsistent and possibly buggy.
>
> At first glance I also thought it was inconsistent, but later I found that Firefox is very consistent. I think the reason it happens like that is that Firefox cleans up the URL by removing extra slashes before the host name, adding a slash after the host name, and converting the host name to lowercase.

Well, yes, I'm sure there is a simple set of rules that will explain this behavior from the implementation point of view. However, it is inconsistent for the average user. The behavior where nulling out the hostname causes the first path component to become the hostname is particularly odd, IMO. I can't think of any use of this behavior that would be considered anything other than an ugly hack.

Cheers,
kats
Re: [whatwg] URL decomposition on HTMLAnchorElement interface
On Fri, Mar 27, 2009 at 4:55 AM, Kartikaya Gupta lists.wha...@stakface.com wrote:
> On Thu, 26 Mar 2009 23:01:34 -0400, Biju bijumaill...@gmail.com wrote:
>> On Thu, Mar 26, 2009 at 5:26 PM, Kartikaya Gupta wrote:
>>> This behavior seems rather inconsistent and possibly buggy.
>>
>> At first glance I also thought it was inconsistent, but later I found that Firefox is very consistent. I think the reason it happens like that is that Firefox cleans up the URL by removing extra slashes before the host name, adding a slash after the host name, and converting the host name to lowercase.
>
> Well, yes, I'm sure there is a simple set of rules that will explain this behavior from the implementation point of view. However, it is inconsistent for the average user. The behavior where nulling out the hostname causes the first path component to become the hostname is particularly odd, IMO. I can't think of any use of this behavior that would be considered anything other than an ugly hack.

What would you suggest should happen instead? I don't see a reason why we wouldn't be ok with changing how Firefox behaves here, but discussions about better ways of doing it are a lot more productive than discussions about how bad the current behavior is.

/ Jonas
Re: [whatwg] URL decomposition on HTMLAnchorElement interface
On Fri, Mar 27, 2009 at 11:02 AM, Kristof Zelechovski giecr...@stegny.2a.pl wrote:
> Instead of setting the host name of a hyperreference to null, use the host name (of the base) of the current document instead.

That seems pretty arbitrary. How about throwing or setting the whole href to null instead?

/ Jonas
Re: [whatwg] URL decomposition on HTMLAnchorElement interface
Kartikaya Gupta wrote:
> I was trying different things to see what happens and came across some particularly weird behavior in Gecko/2009021910 Firefox/3.0.7:
>
> var a = document.createElement('a');
> a.setAttribute('href', 'http://example.org:123/foo?bar#baz');
> a.hostname = null;
> alert(a.hostname); // displays "foo"
> alert(a.href); // displays "http://foo/?bar#baz"

Indeed. The behavior you're seeing is due to setting the hostname to the empty string, basically... That said, this code should probably bail out when that happens instead of pressing on. I've filed https://bugzilla.mozilla.org/show_bug.cgi?id=485562 on this.

Interestingly, it looks like Opera doesn't support the hostname setter at all. Safari ignores the call in this case. I don't have IE to test offhand.

> a.setAttribute('href', 'scheme://host/path');
> a.host = null;
> alert(a.host); // displays ""
> alert(a.pathname); // displays ""
> alert(a.href); // displays "scheme:host/path"

This case is more fun. It's an unknown scheme, so it's assumed to be a no-authority non-hierarchical scheme and the URI is parsed that way. This does cause issues, since RFC 3986 says that if there is no authority then the path cannot begin with two slashes (so if "scheme" is a non-authority protocol then the URI is invalid, in fact). But deciding whether this is an invalid URI or not involves knowing something about the "scheme" protocol, which is rather hard in this case, since you just made it up. ;)

In general, parsing a URI for a scheme you know nothing about is a huge pain, especially if your URL parser is expected to do fixup on invalid URIs (which the parser for the href attribute of <a> is certainly expected to do).

-Boris
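For reference, the scheme-agnostic split that RFC 3986 gives in its Appendix B can be run directly on the two strings from this example. The regular expression below is the one printed in the RFC; it does no validation and no fixup, so it says nothing about what Gecko or any other browser does. The wrapper `splitGeneric` is a name made up for this sketch.

```javascript
// Regular expression from RFC 3986, Appendix B (non-validating).
const RFC3986_SPLIT = /^(([^:\/?#]+):)?(\/\/([^\/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?/;

// splitGeneric is a hypothetical helper; absent components come back as undefined.
function splitGeneric(uri) {
  const m = RFC3986_SPLIT.exec(uri);
  return {
    scheme: m[2],
    authority: m[4],
    path: m[5],
    query: m[7],
    fragment: m[9],
  };
}

// With "//" after the scheme, the generic syntax sees an authority.
console.log(splitGeneric('scheme://host/path'));

// Without it, everything after the colon is path.
console.log(splitGeneric('scheme:host/path'));
```

The first call yields authority "host" and path "/path"; the second yields no authority and path "host/path", which is the shape the `href` took after `a.host = null` in the example above.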
Re: [whatwg] URL decomposition on HTMLAnchorElement interface
On Fri, 27 Mar 2009 14:14:35 -0400, Boris Zbarsky bzbar...@mit.edu wrote:
> This case is more fun. It's an unknown scheme, so it's assumed to be a no-authority non-hierarchical scheme and the URI is parsed that way. This does cause issues, since RFC 3986 says that if there is no authority then the path cannot begin with two slashes (so if "scheme" is a non-authority protocol then the URI is invalid, in fact). But deciding whether this is an invalid URI or not involves knowing something about the "scheme" protocol, which is rather hard in this case, since you just made it up. ;)

For unknown schemes, if the authority starts with //, doesn't it make sense to assume that the scheme allows an authority? I would assume that for an unknown scheme, the generic URI syntax in RFC3986 should be followed, which would interpret the stuff between // and the following / as the authority.

On Fri, 27 Mar 2009 10:49:41 -0700, Jonas Sicking jo...@sicking.cc wrote:
> What would you suggest should happen instead? I don't see a reason why we wouldn't be ok with changing how firefox behaves here, but discussions about better ways of doing it are a lot more productive than discussions about how bad the current behavior is.

Agreed. How about the following:

- Attempts to set protocol to null, the empty string, or anything containing invalid characters (i.e. not in the scheme production of RFC3986) should throw. Setting it to anything else should be allowed and should update the scheme component of the underlying URI.
- Attempts to set host to null for a scheme known to require an authority should throw. For all other schemes (i.e. ones that do not require an authority, or unknown schemes) setting host to null should remove the authority component of the underlying URI. For all schemes, setting the host to anything else should be allowed (invalid characters are escaped) and should update the authority component of the underlying URI.
- Attempts to set hostname should behave the same as setting host, except that in cases where the authority is updated with a new value (this excludes the case where the authority is being removed), the old port (if any) should be preserved.
- Any attempt to set port when the host is null (i.e. there is no authority component in the underlying URI) should throw. If there is a non-null host, then: (1) setting port to null should remove the port subcomponent from the underlying URI if there is one, (2) setting port to the empty string or invalid characters should throw, and (3) setting port to a valid port string should update the port subcomponent of the underlying URI.
- Attempts to set pathname to null should throw, since the path is a required component of a URI. Setting pathname to anything else should be allowed and should update the path component of the underlying URI (invalid characters are escaped).
- Attempts to set search to null should remove the query component from the underlying URI; setting it to anything else is allowed and should update the query component of the underlying URI (invalid characters are escaped).
- Attempts to set hash to null should remove the fragment component from the underlying URI; setting it to anything else is allowed and should update the fragment component of the underlying URI (invalid characters are escaped).
- In all cases, undefined should be treated as null. (i.e. [Undefined=Null, Null=Null] in WebIDL)

Notes:

- In general I made every invalid action throw rather than ignoring the attempt because I personally don't like it when things fail silently.
- I think that null should not be stringified to "null" because for some of the components setting to null makes sense, and I prefer that all components are consistent with respect to stringification.
- In cases where the scheme is unknown I think the behavior should be such that it follows the generic URI syntax in RFC3986 as much as possible. Specifically, if it doesn't recognize the scheme, it shouldn't arbitrarily disallow behavior like removing or adding an authority component.

Thoughts/comments?

Cheers,
kats
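A few of the rules in this proposal (hostname, port, search) can be sketched over a decomposed record to make the null versus empty-string distinction concrete. This is a rough model under stated assumptions, not an implementation of any browser or spec: the `uri` record shape, the function names, and the `REQUIRES_AUTHORITY` list are all made up for illustration, and the "invalid characters are escaped" step is left out entirely.

```javascript
// Hypothetical list of schemes that require an authority.
const REQUIRES_AUTHORITY = new Set(['http', 'https', 'ftp']);

// In the hypothetical uri record, null means "component absent".
function setHostname(uri, value) {
  if (value === null || value === undefined) {
    if (REQUIRES_AUTHORITY.has(uri.scheme)) throw new TypeError('scheme requires an authority');
    uri.host = null; // removing the authority also drops the port
    uri.port = null;
    return;
  }
  uri.host = String(value); // the old port, if any, is preserved
}

function setPort(uri, value) {
  if (uri.host === null) throw new TypeError('no authority to attach a port to');
  if (value === null || value === undefined) { uri.port = null; return; }
  if (!/^[0-9]+$/.test(String(value))) throw new TypeError('invalid port');
  uri.port = String(value);
}

function setSearch(uri, value) {
  // null removes the query; the empty string leaves an empty but present query.
  uri.query = (value === null || value === undefined) ? null : String(value);
}

function serialize(uri) {
  let s = uri.scheme + ':';
  if (uri.host !== null) {
    s += '//' + uri.host;
    if (uri.port !== null) s += ':' + uri.port;
  }
  s += uri.path;
  if (uri.query !== null) s += '?' + uri.query;
  if (uri.fragment !== null) s += '#' + uri.fragment;
  return s;
}

const uri = { scheme: 'http', host: 'example.org', port: '123', path: '/foo', query: 'bar', fragment: 'baz' };
setHostname(uri, 'example.com');
console.log(serialize(uri)); // http://example.com:123/foo?bar#baz
setSearch(uri, '');
console.log(serialize(uri)); // http://example.com:123/foo?#baz
setSearch(uri, null);
console.log(serialize(uri)); // http://example.com:123/foo#baz
```

The last two lines show why the proposal treats null and the empty string differently for search: one keeps the "?" delimiter and the other removes the component.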
Re: [whatwg] URL decomposition on HTMLAnchorElement interface
Kartikaya Gupta wrote:
> For unknown schemes, if the authority starts with //, doesn't it make sense to assume that the scheme allows an authority? I would assume that for an unknown scheme, the generic URI syntax in RFC3986 should be followed, which would interpret the stuff between // and the following / as the authority.

This is an option, but it's not obviously correct, just as it's not obviously correct (and in fact would break pages) to parse "http:foo.com/" without an authority. I'm reluctant to change any behavior here unless there's a spec, along with some data indicating the reasons for that spec and its impact on website compat.

> - Attempts to set protocol to null, the empty string, or anything containing invalid characters (i.e. not in the scheme production of RFC3986) should throw. Setting it to anything else should be allowed and should update the scheme component of the underlying URI.
> - Attempts to set host to null for a scheme known to require an authority should throw. For all other schemes (i.e. ones that do not require an authority, or unknown schemes) setting host to null should remove the authority component of the underlying URI. For all schemes, setting the host to anything else should be allowed (invalid characters are escaped) and should update the authority component of the underlying URI.
> - Attempts to set hostname should behave the same as setting host, except that in cases where the authority is updated with a new value (this excludes the case where the authority is being removed), the old port (if any) should be preserved.
> - Any attempt to set port when the host is null (i.e. there is no authority component in the underlying URI) should throw. If there is a non-null host, then: (1) setting port to null should remove the port subcomponent from the underlying URI if there is one, (2) setting port to the empty string or invalid characters should throw, and (3) setting port to a valid port string should update the port subcomponent of the underlying URI.
> - Attempts to set pathname to null should throw, since the path is a required component of a URI. Setting pathname to anything else should be allowed and should update the path component of the underlying URI (invalid characters are escaped).
> - Attempts to set search to null should remove the query component from the underlying URI, setting it to anything else is allowed and should update the query component of the underlying URI (invalid characters are escaped).
> - Attempts to set hash to null should remove the fragment component from the underlying URI, setting it to anything else is allowed and should update the fragment component of the underlying URI (invalid characters are escaped).
> - In all cases, undefined should be treated as null. (i.e. [Undefined=Null, Null=Null] in WebIDL)

These are all more or less unacceptable. For example, setting pathname to empty string should work just fine, imo; setting that on "http://foo.com/bar/" should result in "http://foo.com/". There are big scary comments in the Gecko code for these setters saying that they must never ever throw. I suspect that making them throw would be a serious web compat issue.

Changing from an authority to a non-authority URI or the other way around doesn't seem desirable to me (and would only work for unknown schemes anyway, presumably, at best; it's better if it just never works).

> - In general I made every invalid action throw rather than ignoring the attempt because I personally don't like it when things fail silently.

That's nice, but I suspect web sites rely on the silent fail behavior here.

> - In cases where the scheme is unknown I think the behavior should be such that it follows the generic URI syntax in RFC3986 as much as possible. Specifically, if it doesn't recognize the scheme, it shouldn't arbitrarily disallow behavior like removing or adding an authority component.

Since for any given scheme the component is either allowed or not, it doesn't make sense to do that, to me...

-Boris
Re: [whatwg] URL decomposition on HTMLAnchorElement interface
On Fri, 27 Mar 2009 22:40:08 +0100, Boris Zbarsky bzbar...@mit.edu wrote:
> This is an option, but it's not obviously correct, just as it's not obviously correct (and in fact would break pages) to parse "http:foo.com/" without an authority.

Which pages would break? That URL does not work in Opera.

-- 
Anne van Kesteren
http://annevankesteren.nl/
Re: [whatwg] URL decomposition on HTMLAnchorElement interface
Anne van Kesteren wrote:
> On Fri, 27 Mar 2009 22:40:08 +0100, Boris Zbarsky bzbar...@mit.edu wrote:
>> This is an option, but it's not obviously correct, just as it's not obviously correct (and in fact would break pages) to parse "http:foo.com/" without an authority.
>
> Which pages would break? That URL does not work in Opera.

Hmm. Interesting. I seemed to recall a number of bugs on this issue, and it looks like we did use to have them. The issue was that sites actually expected "http:/foo" and "http:foo" to be treated as _relative_ URIs equivalent to "/foo" and "foo", because that's what some of the early URI RFCs defined them to be. Relevant bug comments are https://bugzilla.mozilla.org/show_bug.cgi?id=196088#c11 and https://bugzilla.mozilla.org/show_bug.cgi?id=142280#c1

I just tested, and the above two URIs are handled like "http://foo" in Webkit and Gecko, and as relative URIs equivalent to replacing the filename by "http:/foo" and "http:foo" in Opera. I can't tell what IE is doing, since it loads absolutely nothing when such a URI is clicked over here (the document the link is in is at a file:// URI).

Fun times. ;)

-Boris
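Both readings of "http:foo" described above (a relative reference, or a sloppy way of writing "http://foo") can be observed with the standalone `URL` class, depending on the base. This is a sketch under assumptions: it relies on a runtime that exposes `URL` as a global and follows the current URL Standard, which postdates this thread, and the base URLs are made up for the example.

```javascript
// Sketch only: current URL Standard behaviour, not any 2009 browser.
const httpBase = 'http://example.com/dir/page.html'; // hypothetical base

// Base has the same scheme: the strings are resolved as relative references.
const relNoSlash = new URL('http:foo', httpBase).href;
const relOneSlash = new URL('http:/foo', httpBase).href;

// Base has a different scheme (as in the file:// test document mentioned above),
// or there is no base at all: the missing slashes are fixed up and "foo" is the host.
const fromFileBase = new URL('http:foo', 'file:///tmp/test.html').href;
const noBase = new URL('http:/foo').href;

console.log(relNoSlash);   // http://example.com/dir/foo
console.log(relOneSlash);  // http://example.com/foo
console.log(fromFileBase); // http://foo/
console.log(noBase);       // http://foo/
```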
Re: [whatwg] URL decomposition on HTMLAnchorElement interface
On Fri, 27 Mar 2009 17:40:08 -0400, Boris Zbarsky bzbar...@mit.edu wrote:
> Kartikaya Gupta wrote:
>> - Attempts to set pathname to null should throw, since the path is a required component of a URI. Setting pathname to anything else should be allowed and should update the path component of the underlying URI (invalid characters are escaped).
>
> These are all more or less unacceptable. For example, setting pathname to empty string should work just fine, imo; setting that on "http://foo.com/bar/" should result in "http://foo.com/".

The empty string falls under the "anything else" case in my suggestion above and would work as you expect.

> There are big scary comments in the Gecko code for these setters saying that they must never ever throw. I suspect that making them throw would be a serious web compat issue.

Is this Gecko-internal code you're referring to? Or the setters exposed to web content via HTMLAnchorElement? And do you have any examples of websites that would break if they threw?

> Changing from an authority to a non-authority URI or the other way around doesn't seem desirable to me (and would only work for unknown schemes anyway, presumably, at best; it's better if it just never works).

Does it matter? Since it's an unknown scheme, it's basically opaque data. You can't dereference it and fetch the resource it points to, so is there an actual benefit from restricting the behavior?

>> - In general I made every invalid action throw rather than ignoring the attempt because I personally don't like it when things fail silently.
>
> That's nice, but I suspect web sites rely on the silent fail behavior here.

Examples? That being said, I'd be fine with changing them all to do the silent ignore thing instead of throwing if it turns out that throwing would break a lot of stuff.

Cheers,
kats
[whatwg] URL decomposition on HTMLAnchorElement interface
It seems that major browsers all support URL decomposition on HTMLAnchorElement, but this doesn't seem to be stated anywhere in the HTML5 spec. The jQuery/tabs library seems to depend on this (specifically, on the hash property) being available. Could the HTMLAnchorElement interface be updated to reflect this? Cheers, kats
Re: [whatwg] URL decomposition on HTMLAnchorElement interface
Browsers also support partially setting each of the url fields separately, although error handling between all of them is very inconsistent.

Note: if you specify this behavior, then you need to specify what happens for http:, https:, data:, mailto: and unknown:

On Thu, 26 Mar 2009 19:32:46 +0100, Kartikaya Gupta lists.wha...@stakface.com wrote:
> It seems that major browsers all support URL decomposition on HTMLAnchorElement, but this doesn't seem to be stated anywhere in the HTML5 spec. The jQuery/tabs library seems to depend on this (specifically, on the hash property) being available. Could the HTMLAnchorElement interface be updated to reflect this?
>
> Cheers,
> kats

-- 
João Eiras
Core Developer, Opera Software ASA, http://www.opera.com/
Re: [whatwg] URL decomposition on HTMLAnchorElement interface
João Eiras wrote:
> Browsers also support partially setting each of the url fields separately, although error handling between all of them is very inconsistent. Note: if you specify this behavior, then you need to specify what happens for http:, https:, data:, mailto: and unknown:

If you specify the setters then you also need to specify how this affects the value of the href attribute in the DOM. For example, in Gecko if you have an <a href="foo#bar"> which has base URI "http://example.com/" and you set anchor.hash on that anchor to "baz", then the attribute value is changed to "http://example.com/foo#baz". I can't speak to what happens in other browsers.

-Boris
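The effect of a component setter on the full URL can be illustrated without a DOM by using the standalone `URL` class, assuming a runtime that provides it as a global. This is only an analogy for the Gecko behaviour described above: a `URL` object has no separate `href` content attribute, so it cannot show whether the attribute itself keeps its relative form or is rewritten to the absolute one.

```javascript
// Sketch only: URL object as a stand-in for an anchor whose href is "foo#bar"
// resolved against the base "http://example.com/".
const anchorLike = new URL('foo#bar', 'http://example.com/');

const before = anchorLike.href; // resolved, absolute form
anchorLike.hash = 'baz';        // replace only the fragment
const after = anchorLike.href;

console.log(before); // http://example.com/foo#bar
console.log(after);  // http://example.com/foo#baz
```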
Re: [whatwg] URL decomposition on HTMLAnchorElement interface
Boris wrote:
> If you specify the setters then you also need to specify how this affects the value of the href attribute in the DOM. For example, in Gecko if you have an <a href="foo#bar"> which has base URI "http://example.com/" and you set anchor.hash on that anchor to "baz", then the attribute value is changed to "http://example.com/foo#baz".

I was trying different things to see what happens and came across some particularly weird behavior in Gecko/2009021910 Firefox/3.0.7:

    var a = document.createElement('a');
    a.setAttribute('href', 'http://example.org:123/foo?bar#baz');
    a.hostname = null;
    alert(a.hostname); // displays "foo"
    alert(a.href); // displays "http://foo/?bar#baz"

    a.setAttribute('href', 'scheme://host/path');
    a.host = null;
    alert(a.host); // displays ""
    alert(a.pathname); // displays ""
    alert(a.href); // displays "scheme:host/path"

This behavior seems rather inconsistent and possibly buggy. I tried looking in Bugzilla to see if anything turned up but my search keywords just kept hitting a lot of unrelated stuff so I didn't try too hard.

Cheers,
kats
Re: [whatwg] URL decomposition on HTMLAnchorElement interface
On Thu, Mar 26, 2009 at 5:26 PM, Kartikaya Gupta wrote:
> This behavior seems rather inconsistent and possibly buggy.

At first glance I also thought it was inconsistent, but later I found that Firefox is very consistent. I think the reason it happens like that is that Firefox cleans up the URL by removing extra slashes before the host name, adding a slash after the host name, and converting the host name to lowercase.

Try this:

    var a = document.createElement('a');
    a.setAttribute('href', 'http:/Example.org:123/foo?bar#baz'); //Case 1
    alert(a.href);
    a.setAttribute('href', 'http:example.org:123/foo?bar#baz');//Case 2
    alert(a.href);
    a.setAttribute('href', 'http:///example.org:123/foo?bar#baz');//Case 3
    alert(a.href);
    a.setAttribute('href', 'http:/example.org:123/foo?bar#baz');//Case 4
    alert(a.href);

Firefox cleans up the URL and all four show "http://example.org:123/foo?bar#baz".

So now when you set host to null, I ASSUME the following is happening:

    http://example.org:123/foo?bar#baz
    ===> http://blank/foo?bar#baz
    ===> http:///foo?bar#baz
    ===> http://foo/?bar#baz

Firefox does the same for the protocols http, https and ftp; for others it won't allow a hostname change.

Setting a.hash = null; and a.search = null; is allowed for http, https, ftp, file and jar (maybe for data: also, I have not tested it).

You can use a null string instead of null.

I know the host name cannot be set to a space or a string containing a space. But it allows invalid characters like !$%^*( etc. It gets confused when it finds @#? in the hostname.

Now the question is: do we need to allow setting host to null or ""?

PS: Jar protocol example: jar:http://example.org:123/foo!/?bar#baz
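The clean-up described in this message (collapsing or adding slashes before the host and lowercasing the host name) can be reproduced for the four cases with the standalone `URL` class when no base URL is supplied. This is a sketch assuming a runtime with a global `URL` that follows the current URL Standard; it says nothing certain about Firefox 3.0.7, and with an http: base the first, second and fourth strings would instead be resolved as relative references.

```javascript
// Sketch only: the four href values from the message, parsed without a base.
const cases = [
  'http:/Example.org:123/foo?bar#baz',   // Case 1
  'http:example.org:123/foo?bar#baz',    // Case 2
  'http:///example.org:123/foo?bar#baz', // Case 3
  'http:/example.org:123/foo?bar#baz',   // Case 4
];

const cleaned = cases.map((input) => new URL(input).href);

for (const href of cleaned) {
  console.log(href);
}
```

Under those assumptions all four lines should read http://example.org:123/foo?bar#baz, matching what the message reports.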