internal subset lost after using cloneNode (patch provided!)
------------------------------------------------------------
Key: XERCESJ-1181
URL: http://issues.apache.org/jira/browse/XERCESJ-1181
Project: Xerces2-J
Issue Type: Bug
Components: DOM (Level 3 Core)
Affects Versions: 2.8.0
Reporter: Jacob Kjome
Attachments: CoreDocumentImpl.patch
I parse my XML document using the Xerces DOMParser. The internal subset is
perfectly intact in the resulting DOM until I call Document.cloneNode(true).
When I print the nodes, here's what the document type looks like, first
before the clone (expected) and then after (actual)....
Expected....
DocumentTypeImpl: name=document
internalSubset=
<!ENTITY erh "Elliotte Rusty Harold">
<!ELEMENT document (title, signature)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT copyright (#PCDATA)>
<!ELEMENT email (#PCDATA)>
<!ELEMENT hr EMPTY>
<!ELEMENT lastmodified (#PCDATA)>
<!ELEMENT signature (hr, copyright, email, lastmodified)>
Actual....
DocumentTypeImpl: name=document
EntityImpl: name=erh
TextImpl: Elliotte Rusty Harold
As you can see, Document.cloneNode(true) seems to turn the <!ENTITY>
declaration in the internal subset into an actual Entity node and discard
the rest of the internal subset (the <!ELEMENT> declarations). This makes
the cloned document invalid, since it has no DTD information where the
original document did.
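
For anyone who wants to reproduce this, here is a minimal sketch of the steps
above using only the standard JAXP and DOM APIs. The class name, the helper
methods and the body of the SAMPLE document are made up for illustration (only
the DTD declarations come from my real file), and it goes through
DocumentBuilder rather than the Xerces DOMParser class directly, so the result
depends on which parser implementation JAXP picks up on your classpath.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;
import org.xml.sax.InputSource;

final class CloneSubsetCheck {

    // Hypothetical sample: the DTD declarations are the ones from this report,
    // the element content is invented just to have a well-formed document.
    static final String SAMPLE =
          "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE document [\n"
        + "<!ENTITY erh \"Elliotte Rusty Harold\">\n"
        + "<!ELEMENT document (title, signature)>\n"
        + "<!ELEMENT title (#PCDATA)>\n"
        + "<!ELEMENT copyright (#PCDATA)>\n"
        + "<!ELEMENT email (#PCDATA)>\n"
        + "<!ELEMENT hr EMPTY>\n"
        + "<!ELEMENT lastmodified (#PCDATA)>\n"
        + "<!ELEMENT signature (hr, copyright, email, lastmodified)>\n"
        + "]>\n"
        + "<document><title>t</title><signature><hr/>"
        + "<copyright>&erh;</copyright><email>e</email>"
        + "<lastmodified>d</lastmodified></signature></document>";

    /** Returns { internal subset before the clone, internal subset after the clone }. */
    static String[] internalSubsets(String xml) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document original = builder.parse(new InputSource(new StringReader(xml)));

        // The deep clone is the call that loses the internal subset for me.
        Document copy = (Document) original.cloneNode(true);

        return new String[] { subsetOf(original), subsetOf(copy) };
    }

    // DocumentType.getInternalSubset() is DOM Level 2; it may return null.
    private static String subsetOf(Document doc) {
        DocumentType doctype = doc.getDoctype();
        return doctype == null ? null : doctype.getInternalSubset();
    }

    public static void main(String[] args) throws Exception {
        String[] subsets = internalSubsets(SAMPLE);
        System.out.println("before clone:\n" + subsets[0]);
        System.out.println("after clone:\n" + subsets[1]);
    }
}
```

On an affected parser I would expect the "after clone" value to come back
null or empty while the "before clone" value still lists the declarations.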
I applied a small patch to CoreDocumentImpl (attached) and now it works as
expected, except that, in addition to the internal subset, an Entity node now
shows up as a child of the DocumentType, which is odd. I'm not sure whether
that's valid, but it wasn't in the DOM before Document.cloneNode(true), so it
seems to me it shouldn't be there. However, if it doesn't hurt anything, it
doesn't matter much to me. Anyway, after my patch, here's the new result...
DocumentTypeImpl: name=document
internalSubset=
<!ENTITY erh 'Elliotte Rusty Harold'>
<!ELEMENT document (title,signature)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT copyright (#PCDATA)>
<!ELEMENT email (#PCDATA)>
<!ELEMENT hr EMPTY>
<!ELEMENT lastmodified (#PCDATA)>
<!ELEMENT signature (hr,copyright,email,lastmodified)>
EntityImpl: name=erh
TextImpl: Elliotte Rusty Harold
I hope this can get applied in time for the next release of Xerces!
Jake