Re: [whatwg] URL decomposition on HTMLAnchorElement interface

2009-04-28 Thread Ian Hickson
On Thu, 26 Mar 2009, Kartikaya Gupta wrote:

 It seems that major browsers all support URL decomposition on 
 HTMLAnchorElement, but this doesn't seem to be stated anywhere in the 
 HTML5 spec. The jQuery/tabs library seems to depend on this 
 (specifically, on the hash property) being available. Could the 
 HTMLAnchorElement interface be updated to reflect this?

Done.


On Thu, 26 Mar 2009, João Eiras wrote:

 Browsers also support partially setting each of the url fields 
 separately, although error handling between all of them is very 
 inconsistent. Note: if you specify this behavior, then you need to 
 specify what happens for http:, https:, data:, mailto: and unknown:

Done.

Browsers differ in how this is handled; the spec doesn't quite match any 
of them (it takes the most sane aspects of each browser I tested).


On Thu, 26 Mar 2009, Boris Zbarsky wrote:
 
 If you specify the setters then you also need to specify how this 
 affects the value of the href attribute in the DOM.  For example, in 
 Gecko if you have an <a href="foo#bar"> which has base URI 
 "http://example.com/" and you set anchor.hash on that anchor to "baz", 
 then the attribute value is changed to "http://example.com/foo#baz".  I 
 can't speak to what happens in other browsers.

Done.
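
Boris's example can be approximated outside a browser. The sketch below is a stand-in only: it assumes the standalone `URL` class (which postdates this thread) in place of the anchor's decomposition attributes, and the variable name is illustrative. The relative reference is resolved against the base first, so writing `hash` produces an absolute URL, much like the Gecko behaviour described above.

```javascript
// Sketch: a URL object stands in for <a href="foo#bar"> whose base URI
// is http://example.com/. This is not the HTMLAnchorElement API itself.
const resolvedAnchor = new URL('foo#bar', 'http://example.com/');

// Replace only the fragment; the rest of the resolved URL is kept.
resolvedAnchor.hash = 'baz';

console.log(resolvedAnchor.href);
```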


On Thu, 26 Mar 2009, Kartikaya Gupta wrote:
 
 var a = document.createElement('a');
 a.setAttribute('href', 'http://example.org:123/foo?bar#baz');
 a.hostname = null;
 alert(a.hostname);   // displays "foo"
 alert(a.href);   // displays "http://foo/?bar#baz"

The spec says "null" and "http://null:123/foo?bar#baz".

If WebIDL changes to say that 'null' becomes "", then the spec says 
"example.org" and "http://example.org:123/foo?bar#baz" (setting 'host' or 
'hostname' to the empty string is ignored).
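
The two readings can be tried side by side. This is a hedged sketch that assumes the standalone `URL` class (newer than this thread) as a stand-in for the anchor element: there, null is stringified to "null", and an empty hostname on an http: URL is ignored. Variable names are illustrative.

```javascript
// Reading 1: null is converted to the string "null" before being applied.
const nullHost = new URL('http://example.org:123/foo?bar#baz');
nullHost.hostname = null;
console.log(nullHost.hostname, nullHost.href);

// Reading 2: the empty string is rejected for http:, so nothing changes.
const emptyHost = new URL('http://example.org:123/foo?bar#baz');
emptyHost.hostname = '';
console.log(emptyHost.hostname, emptyHost.href);
```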


 a.setAttribute('href', 'scheme://host/path');
 a.host = null;
 alert(a.host);   // displays ""
 alert(a.pathname);   // displays ""
 alert(a.href);   // displays "scheme:host/path"

If 'null' becomes "null": "null", "/path", and "scheme://null/path".

If 'null' becomes "": "host", "/path", and "scheme://host/path" (setting 
'host' or 'hostname' to the empty string is ignored).


On Thu, 26 Mar 2009, Biju wrote:
 
 var a = document.createElement('a');

Assuming a base URL of http://example.com/path/:

 a.setAttribute('href', 'http:/Example.org:123/foo?bar#baz');   //Case 1
 alert(a.href);

Per spec: http://example.com/Example.org:123/foo?bar#baz


 a.setAttribute('href', 'http:example.org:123/foo?bar#baz');//Case 2
 alert(a.href);

Per spec: http://example.com/path/example.org:123/foo?bar#baz


 a.setAttribute('href', 'http:///example.org:123/foo?bar#baz');//Case 3
 alert(a.href);

Per spec:  (the URL can't be parsed).


 a.setAttribute('href', 'http:/example.org:123/foo?bar#baz');//Case 4
 alert(a.href);

Per spec:  (the URL can't be parsed).
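
Cases 1 and 2 can be reproduced with a short sketch. It assumes the standalone `URL` class, which postdates this thread and may not agree with the 2009 draft on the other cases, so only the first two are shown. The base URL is the one assumed above; variable names are illustrative.

```javascript
// Base URL assumed in the reply above.
const bijuBase = 'http://example.com/path/';

// Case 1: a single slash after "http:" is resolved as a path-absolute
// relative reference against the base.
const case1Href = new URL('http:/Example.org:123/foo?bar#baz', bijuBase).href;

// Case 2: no slash after "http:" is resolved relative to the base path.
const case2Href = new URL('http:example.org:123/foo?bar#baz', bijuBase).href;

console.log(case1Href);
console.log(case2Href);
```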

-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Re: [whatwg] URL decomposition on HTMLAnchorElement interface

2009-03-28 Thread Kartikaya Gupta
On Fri, 27 Mar 2009 21:53:48 -0400, Boris Zbarsky bzbar...@mit.edu wrote:
 Kartikaya Gupta wrote:
  The empty string falls under the "anything else" case in my suggestion 
  above and would work as you expect.
 
 Null and empty string should, imo, have the same behavior here.  It 
 doesn't make sense to treat them differently to me.

Do you agree that null and empty string should behave differently for search 
and hash? To me, it doesn't make sense to treat null and the empty string 
differently for some components but not others.

 
   There are big scary comments in the Gecko code for these setters saying 
   that they must never ever throw.  I suspect that making them throw would 
   be a serious web compat issue.
  
  Is this Gecko-internal code you're referring to? Or the setters exposed to 
  web content via HTMLAnchorElement?
 
 The latter.  The Gecko-internal URI code does in fact throw on a lot of 
 these setters, and the HTMLAnchorElement methods catch and eat these 
 exceptions, very much on purpose.

Ok. I'll assume there is valid reasoning behind that. Replace all the "throw"s 
with "be silently ignored" in my proposal.

   Changing from an authority to a non-authority URI or the other way 
   around doesn't seem desirable to me (and would only work for unknown 
   schemes anyway, presumably, at best; it's better if it just never works).
  
  Does it matter? Since it's an unknown scheme, it's basically opaque data. 
  You can't dereference it and fetch the resource it points to
 
 No, but you can pass it off to a helper application.  In any case, my 
 comment above was more concerned with your proposal that one should be 
 able to create a non-authority http: URI than about unknown schemes.

I don't think my proposal allowed creation of a non-authority http: URI. I said 
that 'Attempts to set host to null for a scheme known to require an authority 
should throw.' Since http is a scheme known to require an authority, you 
wouldn't be able to null out the authority. The one loophole I missed would be 
to create a non-http non-authority URI and then change the scheme to http. That 
can be fixed by amending the first sentence of my proposal to the following:

- Attempts to set protocol to null, the empty string, or anything containing 
invalid characters (i.e. not in the scheme production of RFC3986) should 
throw. Setting it to a scheme known to require an authority when the authority 
component is null should also throw. Setting it to a scheme known to require no 
authority when the authority component is non-null should also throw. Setting 
it to anything else should be allowed and should update the scheme component of 
the underlying URI.

(With appropriate adjustments of s/throw/be silently ignored/)
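
A minimal sketch of the amended protocol rule in its "silently ignored" form. Everything here is an assumption made for illustration: the URI is modelled as a plain record, the two scheme tables are hypothetical (the thread does not say which schemes belong where), and accepting a trailing colon is a guess about how the attribute value would be written.

```javascript
// Hypothetical scheme tables, for illustration only.
const REQUIRES_AUTHORITY = new Set(['http', 'https', 'ftp']);
const FORBIDS_AUTHORITY = new Set(['mailto', 'data']);

// scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )   (RFC 3986, section 3.1)
const SCHEME_RE = /^[A-Za-z][A-Za-z0-9+.-]*$/;

// uri is a plain record: { scheme: string, authority: string | null }.
// Returns true if the scheme was updated, false if the attempt was ignored.
function setProtocolSketch(uri, value) {
  if (value === null || value === undefined) return false;
  // Assumption: tolerate one trailing ":" as in "https:".
  const scheme = String(value).replace(/:$/, '').toLowerCase();
  if (!SCHEME_RE.test(scheme)) return false;   // empty or invalid characters
  if (REQUIRES_AUTHORITY.has(scheme) && uri.authority === null) return false;
  if (FORBIDS_AUTHORITY.has(scheme) && uri.authority !== null) return false;
  uri.scheme = scheme;
  return true;
}
```

With this shape, the loophole described above is closed: a record with `authority: null` cannot be switched to http.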

Cheers,
kats


Re: [whatwg] URL decomposition on HTMLAnchorElement interface

2009-03-28 Thread Boris Zbarsky

Kartikaya Gupta wrote:

Do you agree that null and empty string should behave differently for search and 
hash?


No, not really.  Are they treated differently in current UAs (past null 
being treated as "null", possibly)?



To me, it doesn't make sense to treat null and the empty string differently for 
some components but not others.


Agreed that it would be confusing from a web developer point of view. 
Of course from a URI point of view some URI components can be empty but 
present or can be not present at all, as you point out above.


The latter.  The Gecko-internal URI code does in fact throw on a lot of 
these setters, and the HTMLAnchorElement methods catch and eat these 
exceptions, very much on purpose.


Ok. I'll assume there is valid reasoning behind that. Replace all the "throw"s with 
"be silently ignored" in my proposal.


For what it's worth, I suspect that the silent fail is somewhat 
interoperably implemented already.
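
The layering described above (a strict internal URI object whose setters throw, wrapped by content-facing setters that swallow the failure) can be sketched as follows. Both classes are hypothetical stand-ins written for this illustration; they are not Gecko code, and the validity check is deliberately simplistic.

```javascript
// Hypothetical strict layer: rejects bad input by throwing.
class StrictUriSketch {
  constructor(host) {
    this.host = host;
  }
  setHost(value) {
    if (value === '' || /\s/.test(value)) throw new Error('invalid host');
    this.host = value;
  }
}

// Hypothetical content-facing layer: never throws to the caller.
class AnchorLikeSketch {
  constructor(host) {
    this.uri = new StrictUriSketch(host);
  }
  get hostname() {
    return this.uri.host;
  }
  set hostname(value) {
    try {
      this.uri.setHost(String(value));
    } catch (e) {
      // Swallowed on purpose: the assignment is silently ignored.
    }
  }
}

const anchorLike = new AnchorLikeSketch('example.org');
anchorLike.hostname = '';          // rejected internally, ignored here
console.log(anchorLike.hostname);
```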


No, but you can pass it off to a helper application.  In any case, my 
comment above was more concerned with your proposal that one should be 
able to create a non-authority http: URI than about unknown schemes.


I don't think my proposal allowed creation of a non-authority http: URI. I said that 
'Attempts to set host to null for a scheme known to require an authority 
should throw.' Since http is a scheme known to require an authority, you wouldn't be able 
to null out the authority.


Or set it to the empty string, which has the same effect.  Your proposal 
treats those differently.



- Attempts to set protocol to null, the empty string, or anything containing invalid 
characters (i.e. not in the scheme production of RFC3986) should throw. Setting it to a 
scheme known to require an authority when the authority component is null should also throw. 
Setting it to a scheme known to require no authority when the authority component is non-null 
should also throw. Setting it to anything else should be allowed and should update the scheme 
component of the underlying URI.


Honestly, I can't think of a sane way to define a protocol setter that 
changes from one URI type to another (type being "has authority", 
"doesn't have authority", "unknown").  Actually, as far as Gecko is 
concerned there are 5 different types; see the three constants defined 
at 
http://mxr.mozilla.org/mozilla-central/source/netwerk/base/public/nsIStandardURL.idl#56 
for three, plus "not a hierarchical URI" (which never does the fixup 
from scheme:foo to scheme://foo) and "unknown" (treated like "not a 
hierarchical URI").


-Boris


Re: [whatwg] URL decomposition on HTMLAnchorElement interface

2009-03-28 Thread João Eiras
 Interestingly, it looks like Opera doesn't support the hostname setter
 at all.  Safari ignores the call in this case.  I don't have IE to test
 offhand.


True. Opera currently does not support setting these values separately.






Re: [whatwg] URL decomposition on HTMLAnchorElement interface

2009-03-27 Thread Kartikaya Gupta
On Thu, 26 Mar 2009 23:01:34 -0400, Biju bijumaill...@gmail.com wrote:
 On Thu, Mar 26, 2009 at 5:26 PM, Kartikaya Gupta
  This behavior seems rather inconsistent and possibly buggy.
 
 At first look I also thought it was inconsistent,
 but later I found Firefox is very consistent.
 I think the reason it happens like that is that Firefox cleans up the
 URL by removing extra slashes before the host name,
 adding a slash after the host name, and converting the host name to lowercase.
 

Well, yes, I'm sure there is a simple set of rules that will explain this 
behavior from the implementation point of view. However, it is inconsistent for 
the average user.

The behavior where nulling out the hostname causes the first path component to 
become the hostname is particularly odd, IMO. I can't think of any use of this 
behavior that would be considered anything other than an ugly hack.

Cheers,
kats


Re: [whatwg] URL decomposition on HTMLAnchorElement interface

2009-03-27 Thread Jonas Sicking
On Fri, Mar 27, 2009 at 4:55 AM, Kartikaya Gupta
lists.wha...@stakface.com wrote:
 On Thu, 26 Mar 2009 23:01:34 -0400, Biju bijumaill...@gmail.com wrote:
 On Thu, Mar 26, 2009 at 5:26 PM, Kartikaya Gupta
  This behavior seems rather inconsistent and possibly buggy.

 At first look I also thought it was inconsistent,
 but later I found Firefox is very consistent.
 I think the reason it happens like that is that Firefox cleans up the
 URL by removing extra slashes before the host name,
 adding a slash after the host name, and converting the host name to lowercase.


 Well, yes, I'm sure there is a simple set of rules that will explain this 
 behavior from the implementation point of view. However, it is inconsistent 
 for the average user.

 The behavior where nulling out the hostname causes the first path component 
 to become the hostname is particularly odd, IMO. I can't think of any use of 
 this behavior that would be considered anything other than an ugly hack.

What would you suggest should happen instead?

I don't see a reason why we wouldn't be ok with changing how firefox
behaves here, but discussions about better ways of doing it are a lot
more productive than discussions about how bad the current behavior
is.

/ Jonas


Re: [whatwg] URL decomposition on HTMLAnchorElement interface

2009-03-27 Thread Jonas Sicking
On Fri, Mar 27, 2009 at 11:02 AM, Kristof Zelechovski
giecr...@stegny.2a.pl wrote:
 Instead of setting the host name of a hyperreference to null, use the host
 name (of the base) of the current document instead.

That seems pretty arbitrary. How about throwing or setting the whole
href to null instead.

/ Jonas


Re: [whatwg] URL decomposition on HTMLAnchorElement interface

2009-03-27 Thread Boris Zbarsky

Kartikaya Gupta wrote:

I was trying different things to see what happens and came across some 
particularly weird behavior in Gecko/2009021910 Firefox/3.0.7:


 var a = document.createElement('a');
 a.setAttribute('href', 'http://example.org:123/foo?bar#baz');
 a.hostname = null;
 alert(a.hostname);   // displays "foo"
 alert(a.href);   // displays "http://foo/?bar#baz"

Indeed.  The behavior you're seeing is due to setting the hostname to the 
empty string, basically...  That said, this code should probably bail 
out when that happens instead of pressing on.  I've filed 
https://bugzilla.mozilla.org/show_bug.cgi?id=485562 on this.


Interestingly, it looks like Opera doesn't support the hostname setter 
at all.  Safari ignores the call in this case.  I don't have IE to test 
offhand.




a.setAttribute('href', 'scheme://host/path');
a.host = null;
alert(a.host);   // displays ""
alert(a.pathname);   // displays ""
alert(a.href);   // displays "scheme:host/path"


This case is more fun.  It's an unknown scheme, so it's assumed to be a 
no-authority non-hierarchical scheme and the URI is parsed that way. 
This does cause issues, since RFC 3986 says that if there is no authority 
then the path cannot begin with two slashes (so if scheme is a 
non-authority protocol then the URI is invalid, in fact).  But deciding 
whether this is an invalid URI or not involves knowing something about 
the scheme protocol, which is rather hard in this case, since you just 
made it up.  ;)


In general, parsing a URI for a scheme you know nothing about is a huge 
pain, especially if your URL parser is expected to do fixup on invalid 
URIs (which the parser for the href attribute of a is certainly 
expected to do).


-Boris


Re: [whatwg] URL decomposition on HTMLAnchorElement interface

2009-03-27 Thread Kartikaya Gupta
On Fri, 27 Mar 2009 14:14:35 -0400, Boris Zbarsky bzbar...@mit.edu wrote:
 
 This case is more fun.  It's an unknown scheme, so it's assumed to be a 
 no-authority non-hierarchical scheme and the URI is parsed that way. 
 This does cause issues, since RFC 3986 says that if there is no authority 
 then the path cannot begin with two slashes (so if scheme is a 
 non-authority protocol then the URI is invalid, in fact).  But deciding 
 whether this is an invalid URI or not involves knowing something about 
 the scheme protocol, which is rather hard in this case, since you just 
 made it up.  ;)

For unknown schemes, if the authority starts with //, doesn't it make sense 
to assume that the scheme allows an authority? I would assume that for an 
unknown scheme, the generic URI syntax in RFC3986 should be followed, which 
would interpret the stuff between // and the following / as the authority.
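
The generic split referred to here can be sketched with the regular expression given in RFC 3986, Appendix B. The function and variable names are illustrative. A component that is absent comes back as `undefined`, which is distinct from a component that is present but empty.

```javascript
// Regular expression from RFC 3986, Appendix B.
const RFC3986_SPLIT = /^(([^:\/?#]+):)?(\/\/([^\/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?/;

// Split a URI reference into its five generic components.
function splitGeneric(uri) {
  const m = RFC3986_SPLIT.exec(uri);
  return {
    scheme: m[2],
    authority: m[4],   // undefined when there is no "//" part
    path: m[5],
    query: m[7],
    fragment: m[9],
  };
}

console.log(splitGeneric('scheme://host/path'));
console.log(splitGeneric('scheme:host/path'));
```

Under this reading, `scheme://host/path` has authority `host` and path `/path`, while `scheme:host/path` has no authority and path `host/path`.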

On Fri, 27 Mar 2009 10:49:41 -0700, Jonas Sicking jo...@sicking.cc wrote:
 What would you suggest should happen instead?
 
 I don't see a reason why we wouldn't be ok with changing how firefox
 behaves here, but discussions about better ways of doing it are a lot
 more productive than discussions about how bad the current behavior
 is.
 

Agreed. How about the following:

- Attempts to set protocol to null, the empty string, or anything containing 
invalid characters (i.e. not in the scheme production of RFC3986) should 
throw. Setting it to anything else should be allowed and should update the 
scheme component of the underlying URI.
- Attempts to set host to null for a scheme known to require an authority 
should throw. For all other schemes (i.e. ones that do not require an 
authority, or unknown schemes) setting host to null should remove the 
authority component of the underlying URI. For all schemes, setting the host to 
anything else should be allowed (invalid characters are escaped) and should 
update the authority component of the underlying URI.
- Attempts to set hostname should behave the same as setting host, except 
that in cases where the authority is updated with a new value (this excludes 
the case where the authority is being removed), the old port (if any) should be 
preserved.
- Any attempt to set port when the host is null (i.e. there is no authority 
component in the underlying URI) should throw. If there is a non-null host, 
then: (1) setting port to null should remove the port subcomponent from the 
underlying URI if there is one, (2) setting port to the empty string or 
invalid characters should throw, and (3) setting port to a valid port string 
should update the port subcomponent of the underlying URI.
- Attempts to set pathname to null should throw, since the path is a required 
component of a URI. Setting pathname to anything else should be allowed and 
should update the path component of the underlying URI (invalid characters are 
escaped).
- Attempts to set search to null should remove the query component from the 
underlying URI, setting it to anything else is allowed and should update the 
query component of the underlying URI (invalid characters are escaped).
- Attempts to set hash to null should remove the fragment component from the 
underlying URI, setting it to anything else is allowed and should update the 
fragment component of the underlying URI (invalid characters are escaped).
- In all cases, undefined should be treated as null. (i.e. [Undefined=Null, 
Null=Null] in WebIDL)
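
To make the rules above concrete, here is a rough sketch of the port and search items only, in their throwing form. The URI is modelled as a plain record, the function names are hypothetical, "valid port string" is taken to mean one or more digits, and percent-escaping of invalid characters is left out.

```javascript
// uri is a plain record: { host: string | null, port: string | null,
// query: string | null }. null means "component not present".

function setPortSketch(uri, value) {
  if (uri.host === null) {
    throw new Error('cannot set port: no authority component');
  }
  if (value === null || value === undefined) {
    uri.port = null;               // (1) remove the port subcomponent
    return;
  }
  const text = String(value);
  if (!/^[0-9]+$/.test(text)) {
    throw new Error('invalid port');   // (2) empty string or invalid characters
  }
  uri.port = text;                 // (3) valid port string
}

function setSearchSketch(uri, value) {
  // undefined is treated as null; null removes the query component.
  // Escaping of invalid characters is omitted in this sketch.
  uri.query = (value === null || value === undefined) ? null : String(value);
}
```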

Notes:
- In general I made every invalid action throw rather than ignoring the attempt 
because I personally don't like it when things fail silently.
- I think that null should not be stringified to null because for some of the 
components setting to null makes sense, and I prefer it all components are 
consistent with respect to stringification.
- In cases where the scheme is unknown I think the behavior should be such that 
it follows the generic URI syntax in RFC3986 as much as possible. Specifically, 
if it doesn't recognize the scheme, it shouldn't arbitrarily disallow behavior 
like removing or adding an authority component.

Thoughts/comments?

Cheers,
kats


Re: [whatwg] URL decomposition on HTMLAnchorElement interface

2009-03-27 Thread Boris Zbarsky

Kartikaya Gupta wrote:

For unknown schemes, if the authority starts with //, doesn't it make sense to assume that the 
scheme allows an authority? I would assume that for an unknown scheme, the generic URI syntax in RFC3986 
should be followed, which would interpret the stuff between // and the following / as 
the authority.



This is an option, but it's not obviously correct, just as it's not 
obviously correct (and in fact would break pages) to parse 
http:foo.com/ without an authority.


I'm reluctant to change any behavior here unless there's a spec, along 
with some data indicating the reasons for that spec and its impact on 
website compat.



- Attempts to set protocol to null, the empty string, or anything containing invalid 
characters (i.e. not in the scheme production of RFC3986) should throw. Setting it to 
anything else should be allowed and should update the scheme component of the underlying URI.
- Attempts to set host to null for a scheme known to require an authority should throw. 
For all other schemes (i.e. ones that do not require an authority, or unknown schemes) setting 
host to null should remove the authority component of the underlying URI. For all 
schemes, setting the host to anything else should be allowed (invalid characters are escaped) and 
should update the authority component of the underlying URI.
- Attempts to set hostname should behave the same as setting host, except 
that in cases where the authority is updated with a new value (this excludes the case where the 
authority is being removed), the old port (if any) should be preserved.
- Any attempt to set port when the host is null (i.e. there is no authority component in the underlying URI) should 
throw. If there is a non-null host, then: (1) setting port to null should remove the port subcomponent from the 
underlying URI if there is one, (2) setting port to the empty string or invalid characters should throw, and (3) setting 
port to a valid port string should update the port subcomponent of the underlying URI.
- Attempts to set pathname to null should throw, since the path is a required component 
of a URI. Setting pathname to anything else should be allowed and should update the 
path component of the underlying URI (invalid characters are escaped).
- Attempts to set search to null should remove the query component from the 
underlying URI, setting it to anything else is allowed and should update the query 
component of the underlying URI (invalid characters are escaped).
- Attempts to set hash to null should remove the fragment component from the 
underlying URI, setting it to anything else is allowed and should update the fragment 
component of the underlying URI (invalid characters are escaped).
- In all cases, undefined should be treated as null. (i.e. [Undefined=Null, 
Null=Null] in WebIDL)


These are all more or less unacceptable.  For example, setting 
pathname to empty string should work just fine, imo; setting that on 
"http://foo.com/bar/" should result in "http://foo.com/".
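
As a hedged illustration of the expectation stated here, the standalone `URL` class (which postdates this thread and is used only as a stand-in for the anchor element) behaves this way: assigning the empty string to `pathname` on an http: URL leaves the root path.

```javascript
// Sketch: empty pathname on an http: URL collapses to the root path.
const emptiedPath = new URL('http://foo.com/bar/');
emptiedPath.pathname = '';
console.log(emptiedPath.href);
```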


There are big scary comments in the Gecko code for these setters saying 
that they must never ever throw.  I suspect that making them throw would 
be a serious web compat issue.


Changing from an authority to a non-authority URI or the other way 
around doesn't seem desirable to me (and would only work for unknown 
schemes anyway, presumably, at best; it's better if it just never works).



- In general I made every invalid action throw rather than ignoring the attempt 
because I personally don't like it when things fail silently.


That's nice, but I suspect web sites rely on the silent fail behavior here.


- In cases where the scheme is unknown I think the behavior should be such that 
it follows the generic URI syntax in RFC3986 as much as possible. Specifically, 
if it doesn't recognize the scheme, it shouldn't arbitrarily disallow behavior 
like removing or adding an authority component.


Since for any given scheme the component is either allowed or not, it 
doesn't make sense to do that, to me...


-Boris


Re: [whatwg] URL decomposition on HTMLAnchorElement interface

2009-03-27 Thread Anne van Kesteren

On Fri, 27 Mar 2009 22:40:08 +0100, Boris Zbarsky bzbar...@mit.edu wrote:
This is an option, but it's not obviously correct, just as it's not  
obviously correct (and in fact would break pages) to parse  
http:foo.com/ without an authority.


Which pages would break? That URL does not work in Opera.


--
Anne van Kesteren
http://annevankesteren.nl/


Re: [whatwg] URL decomposition on HTMLAnchorElement interface

2009-03-27 Thread Boris Zbarsky

Anne van Kesteren wrote:

On Fri, 27 Mar 2009 22:40:08 +0100, Boris Zbarsky bzbar...@mit.edu wrote:
This is an option, but it's not obviously correct, just as it's not 
obviously correct (and in fact would break pages) to parse 
http:foo.com/ without an authority.


Which pages would break? That URL does not work in Opera.


Hmm.  Interesting.  I seemed to recall a number of bugs on this issue, 
and it looks like we did use to have them.  The issue was that sites 
actually expected http:/foo and http:foo to be treated as _relative_ 
URIs equivalent to /foo and foo, because that's what some of the 
early URI RFCs defined them to be.


Relevant bug comments are 
https://bugzilla.mozilla.org/show_bug.cgi?id=196088#c11 and 
https://bugzilla.mozilla.org/show_bug.cgi?id=142280#c1


I just tested, and the above two URIs are handled like http://foo in 
Webkit and Gecko, as relative URIs equivalent to replacing the filename 
by http:/foo and http:foo in Opera. I can't tell what IE is doing, 
since it loads absolutely nothing when such a URI is clicked over here 
(the document the link is in is at a file:// URI).


Fun times.  ;)

-Boris


Re: [whatwg] URL decomposition on HTMLAnchorElement interface

2009-03-27 Thread Kartikaya Gupta
On Fri, 27 Mar 2009 17:40:08 -0400, Boris Zbarsky bzbar...@mit.edu wrote:
 Kartikaya Gupta wrote:
  - Attempts to set pathname to null should throw, since the path is a 
  required component of a URI. Setting pathname to anything else should be 
  allowed and should update the path component of the underlying URI 
  (invalid characters are escaped).
 
 These are all more or less unacceptable.  For example, setting 
 pathname to empty string should work just fine, imo; setting that on 
 "http://foo.com/bar/" should result in "http://foo.com/".
 

The empty string falls under the "anything else" case in my suggestion above 
and would work as you expect.

 There are big scary comments in the Gecko code for these setters saying 
 that they must never ever throw.  I suspect that making them throw would 
 be a serious web compat issue.

Is this Gecko-internal code you're referring to? Or the setters exposed to web 
content via HTMLAnchorElement? And do you have any examples of websites that 
would break if they threw?

 Changing from an authority to a non-authority URI or the other way 
 around doesn't seem desirable to me (and would only work for unknown 
 schemes anyway, presumably, at best; it's better if it just never works).

Does it matter? Since it's an unknown scheme, it's basically opaque data. You 
can't dereference it and fetch the resource it points to, so is there an actual 
benefit from restricting the behavior?

  - In general I made every invalid action throw rather than ignoring the 
  attempt because I personally don't like it when things fail silently.
 
 That's nice, but I suspect web sites rely on the silent fail behavior here.

Examples? That being said, I'd be fine with changing them all to do the silent 
ignore thing instead of throwing if it turns out that throwing would break a 
lot of stuff.

Cheers,
kats


[whatwg] URL decomposition on HTMLAnchorElement interface

2009-03-26 Thread Kartikaya Gupta
It seems that major browsers all support URL decomposition on 
HTMLAnchorElement, but this doesn't seem to be stated anywhere in the HTML5 
spec. The jQuery/tabs library seems to depend on this (specifically, on the 
hash property) being available. Could the HTMLAnchorElement interface be 
updated to reflect this?

Cheers,
kats


Re: [whatwg] URL decomposition on HTMLAnchorElement interface

2009-03-26 Thread João Eiras
Browsers also support partially setting each of the url fields separately, 
although error handling between all of them is very inconsistent.
Note: if you specify this behavior, then you need to specify what happens for 
http:, https:, data:, mailto: and unknown:


On Thu, 26 Mar 2009 19:32:46 +0100, Kartikaya Gupta lists.wha...@stakface.com 
wrote:

 It seems that major browsers all support URL decomposition on 
 HTMLAnchorElement, but this doesn't seem to be stated anywhere in the HTML5 
 spec. The jQuery/tabs library seems to depend on this (specifically, on the 
 hash property) being available. Could the HTMLAnchorElement interface be 
 updated to reflect this?

 Cheers,
 kats
 


-- 

João Eiras
Core Developer, Opera Software ASA, http://www.opera.com/


Re: [whatwg] URL decomposition on HTMLAnchorElement interface

2009-03-26 Thread Boris Zbarsky

João Eiras wrote:

Browsers also support partially setting each of the url fields separately, 
although error handling between all of them is very inconsistent.
Note: if you specify this behavior, then you need to specify what happens for 
http:, https:, data:, mailto: and unknown:


If you specify the setters then you also need to specify how this 
affects the value of the href attribute in the DOM.  For example, in 
Gecko if you have an <a href="foo#bar"> which has base URI 
"http://example.com/" and you set anchor.hash on that anchor to "baz", 
then the attribute value is changed to "http://example.com/foo#baz".  I 
can't speak to what happens in other browsers.


-Boris


Re: [whatwg] URL decomposition on HTMLAnchorElement interface

2009-03-26 Thread Kartikaya Gupta
Boris wrote:
 If you specify the setters then you also need to specify how this 
 affects the value of the href attribute in the DOM.  For example, in 
 Gecko if you have an <a href="foo#bar"> which has base URI 
 "http://example.com/" and you set anchor.hash on that anchor to "baz", 
 then the attribute value is changed to "http://example.com/foo#baz".

I was trying different things to see what happens and came across some 
particularly weird behavior in Gecko/2009021910 Firefox/3.0.7:

var a = document.createElement('a');
a.setAttribute('href', 'http://example.org:123/foo?bar#baz');
a.hostname = null;
alert(a.hostname);   // displays "foo"
alert(a.href);   // displays "http://foo/?bar#baz"

a.setAttribute('href', 'scheme://host/path');
a.host = null;
alert(a.host);   // displays ""
alert(a.pathname);   // displays ""
alert(a.href);   // displays "scheme:host/path"

This behavior seems rather inconsistent and possibly buggy. I tried looking in 
Bugzilla to see if anything turned up but my search keywords just kept hitting 
a lot of unrelated stuff so I didn't try too hard.

Cheers,
kats


Re: [whatwg] URL decomposition on HTMLAnchorElement interface

2009-03-26 Thread Biju
On Thu, Mar 26, 2009 at 5:26 PM, Kartikaya Gupta
 This behavior seems rather inconsistent and possibly buggy.

At first look I also thought it was inconsistent,
but later I found Firefox is very consistent.
I think the reason it happens like that is that Firefox cleans up the
URL by removing extra slashes before the host name,
adding a slash after the host name, and converting the host name to lowercase.

Try this

var a = document.createElement('a');
a.setAttribute('href', 'http:/Example.org:123/foo?bar#baz');   //Case 1
alert(a.href);
a.setAttribute('href', 'http:example.org:123/foo?bar#baz');//Case 2
alert(a.href);
a.setAttribute('href', 'http:///example.org:123/foo?bar#baz');//Case 3
alert(a.href);
a.setAttribute('href', 'http:/example.org:123/foo?bar#baz');//Case 4
alert(a.href);

Firefox cleans up the URL,
and all four show http://example.org:123/foo?bar#baz

So now when you set host to null, I ASSUME the following is happening:

http://example.org:123/foo?bar#baz
===>
http://<blank>/foo?bar#baz
===>
http:///foo?bar#baz
===>
http://foo/?bar#baz
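
The last step of that assumed chain, and the lowercasing mentioned earlier, can be observed with a small sketch. It uses the standalone `URL` class as a stand-in (an assumption: that class postdates this thread and is not Firefox's 2009 parser), and the variable names are illustrative.

```javascript
// Extra slashes before the host are skipped, so "foo" ends up as the host
// and the path becomes "/".
const collapsedUrl = new URL('http:///foo?bar#baz');
console.log(collapsedUrl.hostname, collapsedUrl.href);

// The host name is converted to lowercase during parsing.
const loweredHost = new URL('http://Example.org:123/foo?bar#baz').hostname;
console.log(loweredHost);
```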


Firefox does the same for the protocols http, https and ftp; for others it won't
allow a hostname change.

Setting
a.hash = null;
a.search = null;
are allowed for http, https, ftp, file and jar
(may be for data: also, I have not tested it)

You can use an empty string ("") instead of null.
And I know the host name cannot be set to a space or a string containing a space.
But it allows invalid characters like !$%^*( etc.
It gets confused when it finds @#? as hostname.

Now the question is: do we need to allow setting host to null or ""?

PS: Jar protocol example
jar:http://example.org:123/foo!/?bar#baz