Scott Cantor created XERCESC-2043:
-------------------------------------

             Summary: String pooling in DOMDocumentImpl is unsafe, particularly 
on 64-bit platforms
                 Key: XERCESC-2043
                 URL: https://issues.apache.org/jira/browse/XERCESC-2043
             Project: Xerces-C++
          Issue Type: Bug
          Components: DOM
    Affects Versions: 3.1.1
         Environment: Linux, RHEL6, likely any 64-bit OS
            Reporter: Scott Cantor
            Priority: Critical


There seem to be conditions under which an invalid DOM is generated such that 
elements contain a prefix set to the entire qualified name of the element and 
not just the prefix.

e.g., I had a case where an element called del:Delegate (with del defined up 
above) ended up such that getPrefix() returned "del:Delegate" and not just 
"del".

After much investigation, I traced this to the use of 
DOMDocumentImpl::getPooledNString when storing the prefix inside 
DOMElementNSImpl::setName()

This pooling logic is very odd, because it performs a hash of N characters of 
the input string, and compares the pool entries it searches based on N 
characters, but then returns the whole pooled string on a match. The problem is 
that there's one string pool shared between both the getPooledString() and 
getPooledNString() methods, even though they operate based on different 
assumptions. If you get collisions between hash values in the two cases, you 
can end up returning a string inserted by getPooledString() when you call 
getPooledNString().

In my case, del:Delegate was hashed and stored by getPooledString(), and then 
getPooledNString("del:Delegate", 3) ends up returning "del:Delegate" instead of 
just inserting and returning "del". This is obviously catastrophic.

Couple things:

This algorithm is just plain unsafe no matter what because a hash collision 
under the right conditions creates an invalid DOM.

The source of the collisions may be in part because the hash methods in 
XMLString were written based on XMLSize_t being 32-bits, and now they can be 
64-bits. The collision probability may be higher now.

My suggested fix is to split things so that DOMDocumentImpl carries two pools, 
one for the getPooledString method and one for getPooledNString, to eliminate 
any chance of collison resulting in invalid returned strings. The fix for that 
is quite trivial.

This is a major bug that essentially breaks the parser under the right 
conditions in a very unexpected way. I really need a 3.1.2 to get this fixed, 
and am wiling to contribute to the release process.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to