Re: [whatwg] Use cases for Node.getElementById (was: Re: Early feedback on header association algorithm)

2008-12-07 Thread João Eiras


IMO, anyone suggesting a Node.getElementById clearly does not know very
well how getElementById is supposed to work.
There are currently several ways to traverse a DOM tree: DOM properties
and methods, XPath, the Selectors API and so on.
Since ids are required to be unique within a single document,
implementations can, and do, implement id lookup using optimized data
structures such as a hash table, which performs much better than a
traversal.
So if there is a special node in a document, adding an id to it and
getting a reference to it by that id will be fast (ideally O(1)).
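
To make the comparison concrete, here is a minimal sketch in Python. It is a toy model, not how any browser implements id lookup; the `Node` and `Document` classes and their method names are hypothetical, and ids are assumed to be unique.

```python
from typing import Dict, List, Optional


class Node:
    """Toy element: a tag name, an optional id and a list of children."""

    def __init__(self, tag: str, id: Optional[str] = None) -> None:
        self.tag = tag
        self.id = id
        self.children: List["Node"] = []


class Document:
    """Toy document that keeps an id -> node dictionary next to the tree."""

    def __init__(self) -> None:
        self.root = Node("html")
        self._by_id: Dict[str, Node] = {}

    def append(self, parent: Node, child: Node) -> Node:
        parent.children.append(child)
        # Only the appended node itself is indexed; that is enough here
        # because this sketch appends childless nodes one at a time.
        if child.id is not None:
            self._by_id[child.id] = child
        return child

    def get_element_by_id(self, id: str) -> Optional[Node]:
        # Average-case O(1): a single dictionary lookup.
        return self._by_id.get(id)

    def find_by_traversal(self, id: str) -> Optional[Node]:
        # O(n): depth-first walk over the whole tree, in document order.
        stack = [self.root]
        while stack:
            node = stack.pop()
            if node.id == id:
                return node
            stack.extend(reversed(node.children))
        return None


doc = Document()
body = doc.append(doc.root, Node("body"))
special = doc.append(body, Node("div", id="special"))
```

Both lookups return the same node; the difference is only in how much of the tree has to be visited to find it.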


If the uniqueness requirement is removed, then getElementById loses its
whole meaning and should actually be removed from the specification
entirely; otherwise we would need more bloat like getElementById or
getElementListById and whatever.


If you really need to get the element with a given id in a subtree,
whether connected to or disconnected from the main tree, you can use the
Selectors API, DOM traversal, XPath, etc.
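
As a rough illustration of the two approaches, the sketch below uses Python's `xml.etree.ElementTree` as a stand-in for a DOM (an assumption made for the sake of a runnable example; it is not a browser API). The helper names `find_by_traversal` and `find_by_path` are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# A standalone subtree: it is not attached to any larger document.
subtree = ET.fromstring(
    '<div><p id="intro"/><section><span id="target">hi</span></section></div>'
)


def find_by_traversal(root: ET.Element, wanted: str) -> Optional[ET.Element]:
    """Walk root and its descendants in document order; return the first match."""
    for element in root.iter():
        if element.get("id") == wanted:
            return element
    return None


def find_by_path(root: ET.Element, wanted: str) -> Optional[ET.Element]:
    """Same lookup through ElementTree's limited XPath support.

    Searches descendants of root only, and assumes `wanted` contains no
    single quote.
    """
    return root.find(f".//*[@id='{wanted}']")


hit = find_by_traversal(subtree, "target")
same = find_by_path(subtree, "target")
```

Neither helper needs the subtree to belong to a document, which is the point being made above.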


So, IMO the DOM spec is just fine. What is being asked for with
Node.getElementById is already supported by other APIs.




Re: [whatwg] Use cases for Node.getElementById

2008-12-07 Thread Calogero Alex Baldacchino

João Eiras wrote:


> IMO, anyone suggesting a Node.getElementById clearly does not know
> very well how getElementById is supposed to work.
> There are currently several ways to traverse a DOM tree: DOM
> properties and methods, XPath, the Selectors API and so on.
> Since ids are required to be unique within a single document,
> implementations can, and do, implement id lookup using optimized data
> structures such as a hash table, which performs much better than a
> traversal.
> So if there is a special node in a document, adding an id to it and
> getting a reference to it by that id will be fast (ideally O(1)).


Such a hash table cannot entirely remove the need to traverse the DOM
tree if .getElementById is to be implemented _correctly_. A DOM tree is
a live structure, so the hash table must be checked and updated each
time a node is removed AND each time a node is inserted, for a couple of
reasons, and such an update may require some kind of tree traversal
(for instance, to compare the nodes' relative positions).

getElementById is now being defined as returning the _first_ element
with a matching ID. This is a graceful degradation in case of duplicate
IDs, and it gives a single, standard definition of the expected
behavior in the face of duplicate IDs, which is better than what is
stated in DOM 3 Core (which leaves such behavior undefined, and
possibly implementation or document specific). This means that, upon
insertion, a new element might become the new 'first' element with a
certain id, so its position must be checked and the hash table updated
accordingly.

When an element is removed, independently of the previous scenario, if
it was in the hash table it might simply be removed from the table as
well. But that alone wouldn't work, because there might be a
descendant, or some other following element, with the same id: after
the removal, that element would pass from the 'illegal' state of being
a duplicate-ID element to the 'legal' state of being the element to be
returned by getElementById. So the existence of such an element must be
checked and the hash table updated accordingly.

If there are far more insertions and/or removals of elements with the
id attribute set than calls to getElementById, the advantage of a live
hash table over traversing as needed can be largely lost. In any case,
a traversal can be quite fast, especially if the DOM structure is
implemented as a balanced binary tree (and I hope you don't wish to
implement any kind of non-binary tree as the base tree structure).
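
The bookkeeping described above can be sketched as follows. This is a deliberately naive toy in Python, under the assumption that the table stores only the first element per id and re-resolves affected ids by walking the tree; the `Node` and `Document` classes and their methods are hypothetical, and real implementations may do something smarter.

```python
from typing import Dict, Iterable, Iterator, List, Optional


class Node:
    def __init__(self, tag: str, id: Optional[str] = None) -> None:
        self.tag = tag
        self.id = id
        self.parent: Optional["Node"] = None
        self.children: List["Node"] = []

    def walk(self) -> Iterator["Node"]:
        """Yield this node and its descendants in tree (document) order."""
        yield self
        for child in self.children:
            yield from child.walk()


class Document:
    def __init__(self) -> None:
        self.root = Node("html")
        self._first_by_id: Dict[str, Node] = {}

    def _refresh(self, ids: Iterable[str]) -> None:
        """Re-resolve the 'first' element for each affected id by traversal."""
        wanted = set(ids)
        for id in wanted:
            self._first_by_id.pop(id, None)
        if not wanted:
            return
        for node in self.root.walk():
            if node.id in wanted and node.id not in self._first_by_id:
                self._first_by_id[node.id] = node

    def insert(self, parent: Node, child: Node, index: Optional[int] = None) -> None:
        child.parent = parent
        if index is None:
            parent.children.append(child)
        else:
            parent.children.insert(index, child)
        # The new subtree may contain the new 'first' element for some id.
        self._refresh(n.id for n in child.walk() if n.id is not None)

    def remove(self, child: Node) -> None:
        child.parent.children.remove(child)
        child.parent = None
        # A former duplicate elsewhere in the tree may now be the match.
        self._refresh(n.id for n in child.walk() if n.id is not None)

    def get_element_by_id(self, id: str) -> Optional[Node]:
        return self._first_by_id.get(id)


doc = Document()
a, b, c = Node("p", id="x"), Node("p", id="x"), Node("p", id="x")
doc.insert(doc.root, a)
doc.insert(doc.root, b)           # duplicate id, later in tree order
first_before = doc.get_element_by_id("x")
doc.insert(doc.root, c, index=0)  # duplicate id, now earliest in tree order
first_after_insert = doc.get_element_by_id("x")
doc.remove(c)
doc.remove(a)
first_after_removal = doc.get_element_by_id("x")
```

Lookups stay a single dictionary access, but every insertion or removal of an element carrying an id pays for a traversal, which is the trade-off under discussion.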




> If the uniqueness requirement is removed, then getElementById loses
> its whole meaning and should actually be removed from the
> specification entirely; otherwise we would need more bloat like
> getElementById or getElementListById and whatever.


Do you think that getElementsByTagName and getElementsByClassName are
bloated and useless too? However, my point was, and is, a different one
(I'm not in favour of Node.getElementById, nor am I strongly against
it).




> If you really need to get the element with a given id in a subtree,
> whether connected to or disconnected from the main tree, you can use
> the Selectors API, DOM traversal, XPath, etc.


Currently, id uniqueness is defined as constraining not only a whole
document, but also a disconnected subtree. So which API is that
constraint relevant to? If none, is it worth declaring such a constraint
for disconnected subtrees? Or is there any need for an API that directly
handles IDs in disconnected subtrees?


In other words, what is being constrained by id uniqueness in a
disconnected subtree? A disconnected subtree may be a subtree of another
document, different from the one currently handled by a script; in that
case, id uniqueness is relevant to the document that actually contains
the subtree (while any other document shouldn't be affected by
cross-document ID clashes). Otherwise, it may be a subtree external to
any document, and in that case it might be out of scope for a
specification of HTML 5 documents.

I'm starting to think that, for disconnected subtrees that are outside
any actual HTML document but consist of HTML elements, the most that
could be said is that any API dealing with unique identifiers in such a
subtree must treat the value of each element's id attribute as the
element's default ID. The uniqueness of the id value would then be a
consequence both of its nature as an ID property and of the nature of an
API method targeting an element's ID property, but it would not be
imposed by the specification, since currently there is no such method
in the scope of the HTML 5 DOM. As a consequence, id value uniqueness
might be in scope for a DOM Core specification that explicitly wants to
handle ID properties in a disconnected (and 'document-less') subtree of
Elements, simply because the id value represents (at least) the first
attribute of an HTML element to be evaluated when looking for an ID
property.


Regards, Alex.



Re: [whatwg] Citing multiple blockquote elements in HTML5

2008-12-07 Thread Calogero Alex Baldacchino

Ian Hickson wrote:
> What terminology would you prefer rather than subtree? (We can't say
> document, since we are also trying to define conformance rules for
> disconnected subtrees handled from scripts.)


I have been thinking about that again. Let me suggest something like the
following (and it is only a suggestion: I'm far from wishing to impose
my point of view, and I don't want to be pedantic, but I believe that
exploring every alternative in depth may improve the specification).


The _id_ attribute represents an element's unique identifier in the
subtree within which the element finds itself, and must contain at least
one character. In this context, a subtree is either a whole document
tree, or a tree of Node instances containing HTMLElements and
disconnected from any HTML document; a subtree of a document tree is
contained in a subtree of the first type, thus id values must be unique
in the containing document (e.g. a duplicate id inside a document tree
is always illegal, even if a branch of the document can be isolated
where the id is unique, unless such a branch is removed from the
document).


This specification requires the _id_ attribute value to be unique in a
subtree of the former type, thus a subtree of the latter type (e.g. a
document fragment manipulated by a script) that is to be inserted into
an HTML document must fulfil this requirement, as well as any other
requirement defined in this specification for conformance purposes. Any
API dealing with ID properties in any type of subtree must consider the
_id_ attribute value of an HTMLElement as the element's default ID
property; however, this specification doesn't preclude an element having
multiple IDs, if other, API-specific mechanisms can set an element's ID
in a way that doesn't conflict with the id attribute [followed by the
rest of the current text].
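
A check along the lines of the proposed wording could look like the sketch below. It is only an illustration, again using Python's `xml.etree.ElementTree` as a stand-in for a DOM; the function names `ids_in`, `duplicate_ids` and `clashes_on_insert` are hypothetical and not part of any specification.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import List, Set


def ids_in(root: ET.Element) -> List[str]:
    """All id attribute values in the subtree, in document order."""
    return [el.get("id") for el in root.iter() if el.get("id") is not None]


def duplicate_ids(root: ET.Element) -> Set[str]:
    """Ids that occur more than once inside a single subtree."""
    counts = Counter(ids_in(root))
    return {value for value, count in counts.items() if count > 1}


def clashes_on_insert(document: ET.Element, fragment: ET.Element) -> Set[str]:
    """Ids that would become duplicates if fragment were inserted into document."""
    return set(ids_in(document)) & set(ids_in(fragment))


document = ET.fromstring(
    '<html><body><div id="main"/><p id="note"/></body></html>'
)
fragment = ET.fromstring(
    '<section><p id="note"/><p id="extra"/><span id="extra"/></section>'
)

inside_fragment = duplicate_ids(fragment)
on_insert = clashes_on_insert(document, fragment)
```

Under the proposed text, only the second kind of clash (and any duplicate that survives insertion) would be a conformance matter for the HTML document; the duplicate inside the free-standing fragment would not be, until the fragment is inserted.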


One rationale for the above is that, formally, a subtree disconnected
from any actual HTML document might be out of scope for the current
specification, which defines conformance rules for HTML documents and
related contexts (such as a script context or a browsing context, both
applying to a 'connected' subtree, as far as I've understood), while a
subtree which is disconnected from a specific HTML document, but is
contained in another one (thus coinciding with the containing document
tree), is already covered by the constraint for whole documents.


Another rationale is that the current specification, while relying on at
least one method affected by ID uniqueness in a document tree (that is,
DOM Core Document.getElementById), does not provide, nor refer to, any
API which might be directly affected by the uniqueness of an id
attribute value in a disconnected subtree. Such an API may therefore be
indirectly related to the uniqueness of id values, if ID properties are
relevant to its facilities, but the subtree itself cannot be constrained
by conformance rules before its insertion into an actual HTML document.


A further rationale is that a disconnected subtree might contain Node
instances not implementing the HTMLElement interface, such as a
DocumentFragment node, but also MathML/SVG elements, which might be
embedded content elements coming from an HTML document tree, but also
from a document of a different kind where the embedded content was
represented by HTML elements. Thus, without certain knowledge of the
subtree's origin, applying an HTML-specific conformance rule might not
be the correct choice until the subtree is to be inserted into an HTML
document.


On the question of space characters inside an id value, I'd suggest:


An ID property is not expected to contain space characters, so the 
value of an _id_ attribute should not contain any space characters. 
However, an id attribute can hold a decoded fragment identifier value 
for the purpose of same-document references, thus space characters are 
tolerated for the purpose of conformance, in order to avoid applying 
restrictions to an otherwise legal fragment identifier value not being 
part of a _URL_.
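
Read together with the earlier "at least one character" requirement, the suggested wording yields three cases, sketched below in Python. The set of five characters is HTML5's definition of "space characters", which is assumed to be what is meant here; the function name `classify_id` and the labels "invalid", "tolerated" and "preferred" are hypothetical and used only for illustration.

```python
# HTML5 "space characters": space, tab, line feed, form feed, carriage return.
HTML_SPACE_CHARACTERS = frozenset(" \t\n\f\r")


def classify_id(value: str) -> str:
    """Classify an id attribute value under the suggested wording."""
    if value == "":
        # The id attribute must contain at least one character.
        return "invalid"
    if any(ch in HTML_SPACE_CHARACTERS for ch in value):
        # Discouraged ("should not"), but tolerated for conformance.
        return "tolerated"
    return "preferred"
```

For example, `classify_id("main nav")` gives "tolerated", while `classify_id("main-nav")` gives "preferred".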


Everything, of course, IMHO.

Best regards,
Alex.




Re: [whatwg] Use cases for Node.getElementById

2008-12-07 Thread Simon Pieters
On Sun, 07 Dec 2008 04:09:01 +0100, Calogero Alex Baldacchino wrote:



> I'm reading it :-)
>
> And I have a few questions. First, is it meant as the reference DOM Core
> for HTML 5 only, or in general (for other kinds of markup too)?


In general.


> The 'children' attribute on the Element interface, being an
> HTMLCollection instance, suggests to me that the former might be the
> answer; otherwise, either the reference to a specific document DOM
> interface, or (if that interface were moved into Web DOM Core) the
> reference to a specific DOM in the name of the interface might perhaps
> be problematic (formally, at least).


Is it the name HTMLCollection that is the problem?


> I guess this attribute has been declared on the Element interface
> instead of the HTMLElement one because that is actually the most common
> implementation in current browsers.


Right. Also because it seems useful for not just HTML.


> Anyway, let me suggest (just as a hint; after all, a working draft is
> the right phase to explore any alternative) something like an
> ElementCollection interface with the same properties as HTMLCollection,
> making the latter simply inherit from the former as if it were an alias
> (the same way DocumentFragment inherits from Node). In any browser
> implementing 'children' as an HTMLCollection (without any hierarchy),
> this shouldn't be a problem for scripts, since a scripting language
> usually infers types at runtime. For strongly typed languages (and
> perhaps here we're moving from scripts to plugins), the access strategy
> may be implementation specific but, as long as the hierarchy of
> interfaces (ElementCollection - HTMLCollection) does not change the
> properties of an object implementing HTMLCollection, that shouldn't
> take much to work around. For instance, a Java applet (as well as any
> other object implementing LiveConnect) should work fine using the
> JSObject without any modification, while direct access to the DOM would
> need a DOMServiceProvider implementation (I'm not aware of any that
> grants access to the 'children' attribute, or rather, to any non-W3C
> DOM properties, but I guess that as soon as your proposal became a
> recommendation at least Sun would update its Java APIs accordingly).
> For that purpose, suggesting that any object provided by the user agent
> as implementing either interface should be wrapped by an object also
> implementing the other, for backward compatibility, might be enough
> (anyway, this is no more than a hint, a very early feedback).


This seems like adding complexity for political reasons.
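
For readers trying to picture the hierarchy suggested in the quoted text, here is a minimal sketch in Python. It models only the shape of the proposal (a generic collection plus an HTML-named subtype that adds nothing); the class bodies are hypothetical, and `named_item` is simplified to match on id only.

```python
from typing import List, Optional


class Element:
    def __init__(self, tag: str, id: Optional[str] = None) -> None:
        self.tag = tag
        self.id = id


class ElementCollection:
    """Hypothetical generic collection with HTMLCollection's members."""

    def __init__(self, elements: List[Element]) -> None:
        self._elements = elements

    @property
    def length(self) -> int:
        return len(self._elements)

    def item(self, index: int) -> Optional[Element]:
        if 0 <= index < len(self._elements):
            return self._elements[index]
        return None

    def named_item(self, name: str) -> Optional[Element]:
        # Simplified: a real namedItem has more elaborate matching rules.
        return next((e for e in self._elements if e.id == name), None)


class HTMLCollection(ElementCollection):
    """Adds nothing; exists so code written against the old name keeps working."""


children = HTMLCollection([Element("p", "intro"), Element("ul")])
```

Code that only uses `length`, `item` and `named_item` cannot tell the two types apart, which is why the extra layer can be seen either as harmless or as unnecessary.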


> I see the Element interface no longer contains methods to handle Attr
> nodes. Since those are described in W3C specifications as not being
> child nodes of an Element: will there be any other way to handle
> attributes as nodes, is the 'nature' of Attr nodes going to change, or
> is there so little use (and/or support) of them that the Attr interface
> might be quite close to its 'end of life'?


I'm not sure what to do with attributes. I'd like to drop support for  
attribute nodes (being moved around, etc), if possible, but keep the  
.attributes list and be able to use .value etc on each attribute.



> Apart from that, I've also noted the 'isId' attribute has been removed
> from Attr;


Right, it hasn't been implemented in the top 4 browsers and it seems like  
a not-so-useful feature to have.



> I was thinking of exactly that when I read, in the HTML 5 spec, that
> "This specification doesn't preclude an element having multiple IDs, if
> other mechanisms (e.g. DOM Core methods) can set an element's ID in a
> way that doesn't conflict with the id attribute."


It says this, AIUI, because other specs do make it possible, not because  
it's a good idea that it is possible. Personally I think it should not be  
possible (specifically I think 'id' should be like 'xml:id' is and all  
other ways to get an ID-like attribute should be dropped).



> For this purpose, either the 'isId' property of an Attr node, or a
> mechanism to set an Element's attribute as an alternative ID (or both)
> might be helpful (anyway, having more than one unique identifier to
> handle for each element in a document might cause an increase in
> duplicated IDs).


It's not clear to me why it would be helpful.


> The above takes me to the '.getElementsByClassName()' method: if it were
> to be moved from the HTML 5 spec to the Web DOM Core API, and if the
> latter is meant as some kind of replacement for W3C DOM Level 3, then
> perhaps, for generality's sake, the method might be defined as referring
> to a property named CLASS (along the same lines as ID), pointing out
> that such a property might not be bound to an attribute named 'class'
> (just to make the spec ready in case the need to support that sort of
> document arose in the near future, without having to change Web DOM
> Core, or derive a new version, only for this reason).


That's how it's defined in HTML5 already.


But now