Scott Cantor created XERCESC-2043:
-------------------------------------
Summary: String pooling in DOMDocumentImpl is unsafe, particularly
on 64-bit platforms
Key: XERCESC-2043
URL: https://issues.apache.org/jira/browse/XERCESC-2043
Project: Xerces-C++
Issue Type: Bug
Components: DOM
Affects Versions: 3.1.1
Environment: Linux, RHEL6, likely any 64-bit OS
Reporter: Scott Cantor
Priority: Critical
There seem to be conditions under which an invalid DOM is generated such that
elements contain a prefix set to the entire qualified name of the element and
not just the prefix.
e.g., I had a case where an element called del:Delegate (with del defined up
above) ended up such that getPrefix() returned "del:Delegate" and not just
"del".
After much investigation, I traced this to the use of
DOMDocumentImpl::getPooledNString when storing the prefix inside
DOMElementNSImpl::setName()
This pooling logic is very odd, because it performs a hash of N characters of
the input string, and compares the pool entries it searches based on N
characters, but then returns the whole pooled string on a match. The problem is
that there's one string pool shared between both the getPooledString() and
getPooledNString() methods, even though they operate based on different
assumptions. If you get collisions between hash values in the two cases, you
can end up returning a string inserted by getPooledString() when you call
getPooledNString().
In my case, del:Delegate was hashed and stored by getPooledString(), and then
getPooledNString("del:Delegate", 3) ends up returning "del:Delegate" instead of
just inserting and returning "del". This is obviously catastrophic.
Couple things:
This algorithm is just plain unsafe no matter what because a hash collision
under the right conditions creates an invalid DOM.
The source of the collisions may be in part because the hash methods in
XMLString were written based on XMLSize_t being 32-bits, and now they can be
64-bits. The collision probability may be higher now.
My suggested fix is to split things so that DOMDocumentImpl carries two pools,
one for the getPooledString method and one for getPooledNString, to eliminate
any chance of collison resulting in invalid returned strings. The fix for that
is quite trivial.
This is a major bug that essentially breaks the parser under the right
conditions in a very unexpected way. I really need a 3.1.2 to get this fixed,
and am wiling to contribute to the release process.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]