On 12/20/11 2:11 PM, "Sean Mullan" <[email protected]> wrote:

>
>I did a grep of the source tree, and our code never calls
>Element.setIdAttribute.

Yes, but apps do. My code does. That's currently the way for schema
validation *and* manual ID setting to result in the same outcome,
assuming you call getElementById on your end.

>Usually an app won't have to do that - it is aware of the schema, and
>where the element IDs should be in the document - Colm can comment some
>more on how the WSS library does this.

You have to do a partial tree walk any time you have open extension
points that could be carrying objects that could have IDs. It's not
complete traversal, but for many types of documents, the difference is
minimal.

> Our library won't find anything
>that isn't registered, so if you stick something way down in the guts of
>the document, it simply won't find it. (It used to, but not as of 1.5).
>But I can see the duplicate ID issues you mention, if the app uses
>Element.setIdAttribute to register the ID attributes.

Yes. You will find anything that isn't registered *by you*, as long as
it's registered with the DOM.

>As I understand the wrapping attacks, it happens after the signature is
>validated, when the application actually acts on the element content
>that is mapped to that ID. Then, it needs to find that element, and if
>there are duplicate IDs and it gets the wrong one, then oops. As Colm
>mentioned, we do have a mechanism to return the Elements that were
>actually validated.

Right. I agree that it's obviously better to do that, although I wonder
about the performance when dealing with transformed node sets. I don't
have it yet in C++, and I've been hesitant because it's a lot of work,
and I don't have a lot of time to fix things that aren't broken in my
code. I'm particularly unclear how to do it for the general case, not
just a simple ID reference to an element subtree. All I can see to do is
clone the nodes to save them off and return them. Or save the octets I
guess. Point being, the API can't be just "here's the Element", but
rather the node set or stream.

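For anyone following along, the manual registration route discussed above can be sketched roughly as follows. This is only an illustration using the standard DOM Level 3 calls (Element.setIdAttribute, Document.getElementById); the class name ManualIdRegistration, the helper registerIds, and the attribute name "Id" are assumptions made up for the example, not anything from the library or from my code.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class ManualIdRegistration {

    /** Parses without any schema or DTD, so the DOM knows of no ID attributes. */
    public static Document parse(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    /**
     * Hypothetical helper: walks the subtree under root and registers attrName
     * as an ID attribute on every element that carries it. An application
     * would normally limit this walk to its known extension points.
     */
    public static int registerIds(Element root, String attrName) {
        int count = 0;
        if (root.hasAttribute(attrName)) {
            root.setIdAttribute(attrName, true); // DOM Level 3 registration
            count++;
        }
        for (Node c = root.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c instanceof Element) {
                count += registerIds((Element) c, attrName);
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Document doc = parse("<root><ext><item Id=\"a1\">payload</item></ext></root>");
        System.out.println("before: " + doc.getElementById("a1"));
        int n = registerIds(doc.getDocumentElement(), "Id");
        System.out.println("registered " + n + ", now resolves to: "
                + doc.getElementById("a1").getTagName());
    }
}
```

Note that nothing here says what getElementById should return when two elements register the same ID value; that is exactly the unspecified behavior this thread is about.
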

>But I guess I see an issue in that it is hard for the app to do all
>these extra checks to prevent wrapping attacks. It sounds like what we
>need is an additional optional "sanity" check on the entire document
>looking for duplicate IDs.

My feeling has been that it's a difficult/impossible problem in the
XPath case (you really have to just return the exact nodes) but in the
ID case, if you can guarantee some sort of predictable behavior plus
have the app do transform checking, you have a shot of offloading some
of the significant steps.

>>Thus my point. The Xerces team is wrong. Somebody needs to explain that
>>to them, somebody they'll listen to.
>
>If they won't listen to you, I'm sure they won't listen to me ;)

Well, we tried (in fairness Xerces-C is more or less dead, so the open
bug is all I really expected there). Now I think it's down to us
defining a suitable algorithm. It may be that abandoning the DOM API is
the right thing, but I don't think we should do that without some
deprecation time.

>Hmm, I suppose we could stop calling Document.getElementById if the
>document was not validated against a schema. Let me think about that
>some more.

I'm not sure if you can tell, actually. Maybe in Java. I think the most
logical thing to do if you're going to deprecate that call is to make
it an application option. Basically I'd have a set of IdResolvers with
different, defined behavior, choose a default for the time being,
possibly deprecate some of them, etc. I think that's cleaner than trying
to create a bunch of options.

-- Scott
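
To make the "set of IdResolvers with different, defined behavior" idea a bit more concrete, here is a minimal Java sketch. Everything in it is hypothetical: the IdResolver interface below is invented for illustration and is not the existing library class of that name, and the choice of attribute names to scan is an assumption the application would have to supply. One strategy defers to whatever the DOM has registered; the other walks the whole document and refuses to answer when the ID value is ambiguous, which doubles as the duplicate-ID sanity check.

```java
import java.io.StringReader;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class IdResolverSketch {

    /** Hypothetical strategy interface; not an existing library type. */
    public interface IdResolver {
        Element resolve(Document doc, String id);
    }

    /** Trusts whatever the DOM has registered (schema, DTD, or setIdAttribute). */
    public static final IdResolver DOM_REGISTERED = (doc, id) -> doc.getElementById(id);

    /** Full tree walk over the given attribute names; throws on a duplicate ID value. */
    public static IdResolver strictScan(Set<String> idAttrNames) {
        return (doc, id) -> {
            Element[] found = new Element[1];
            Element root = doc.getDocumentElement();
            if (root != null) {
                scan(root, id, idAttrNames, found);
            }
            return found[0];
        };
    }

    private static void scan(Element e, String id, Set<String> names, Element[] found) {
        NamedNodeMap attrs = e.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node a = attrs.item(i);
            String name = a.getLocalName() != null ? a.getLocalName() : a.getNodeName();
            if (names.contains(name) && id.equals(a.getNodeValue())) {
                if (found[0] != null && found[0] != e) {
                    throw new IllegalStateException("duplicate ID value: " + id);
                }
                found[0] = e;
            }
        }
        for (Node c = e.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c instanceof Element) {
                scan((Element) c, id, names, found);
            }
        }
    }

    public static Document parse(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception ex) {
            throw new IllegalArgumentException("could not parse XML", ex);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<r><a Id=\"x\"/><b><c Id=\"x\"/></b><d Id=\"y\"/></r>");
        IdResolver strict = strictScan(Set.of("Id"));
        System.out.println("y resolves to: " + strict.resolve(doc, "y").getTagName());
        try {
            strict.resolve(doc, "x");
        } catch (IllegalStateException ex) {
            System.out.println("rejected: " + ex.getMessage());
        }
    }
}
```

The application would pick one resolver as its default and could swap in another later, which is the deprecation path I have in mind. The strict variant costs a full traversal per lookup; a real implementation would presumably build an index once per document.
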
