internal subset lost after using cloneNode (patch provided!)
------------------------------------------------------------

                 Key: XERCESJ-1181
                 URL: http://issues.apache.org/jira/browse/XERCESJ-1181
             Project: Xerces2-J
          Issue Type: Bug
          Components: DOM (Level 3 Core)
    Affects Versions: 2.8.0
            Reporter: Jacob Kjome
         Attachments: CoreDocumentImpl.patch

I parse my XML document using the Xerces DOMParser.  The internal subset exists 
perfectly intact in the resulting DOM until I call Document.cloneNode(true).  
When I print the nodes, here's what the document type looks like, first before 
the clone (expected) and then after (actual):

Expected....

        DocumentTypeImpl: name=document
         internalSubset=
   <!ENTITY erh "Elliotte Rusty Harold">
   <!ELEMENT document (title, signature)>
   <!ELEMENT title (#PCDATA)>
   <!ELEMENT copyright (#PCDATA)>
   <!ELEMENT email (#PCDATA)>
   <!ELEMENT hr EMPTY>
   <!ELEMENT lastmodified (#PCDATA)>
   <!ELEMENT signature (hr, copyright, email, lastmodified)>

Actual....

        DocumentTypeImpl: name=document
            EntityImpl: name=erh
                TextImpl: Elliotte Rusty Harold


As you can see, Document.cloneNode(true) seems to turn the <!ENTITY> 
declaration from the internal subset into an actual Entity node, and the rest 
of the internal subset (the <!ELEMENT> declarations) is discarded.  This makes 
the document invalid, since the DTD information that was present in the 
original document is gone.
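
A minimal sketch of how the comparison above could be reproduced.  It is an 
illustration, not the reporter's test code: the class name 
CloneInternalSubsetCheck, the helper methods, and the shortened sample document 
are all hypothetical, and it goes through the standard JAXP DocumentBuilder 
rather than the Xerces DOMParser class directly, so which parser implementation 
actually runs depends on the classpath.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;
import org.xml.sax.InputSource;

public class CloneInternalSubsetCheck {

    // Shortened sample document (assumption): one entity and a few element
    // declarations in the internal subset, modeled on the report's output.
    static final String XML =
          "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE document [\n"
        + "  <!ENTITY erh \"Elliotte Rusty Harold\">\n"
        + "  <!ELEMENT document (title, signature)>\n"
        + "  <!ELEMENT title (#PCDATA)>\n"
        + "  <!ELEMENT signature (#PCDATA)>\n"
        + "]>\n"
        + "<document><title>Test</title>"
        + "<signature>&erh;</signature></document>";

    // Parse the XML text into a DOM Document with whatever JAXP parser
    // is configured (non-validating by default).
    static Document parse(String xml) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    // Return the internal subset string of the document's DocumentType,
    // or null if the document has no doctype.
    static String internalSubsetOf(Document doc) {
        DocumentType doctype = doc.getDoctype();
        return doctype == null ? null : doctype.getInternalSubset();
    }

    public static void main(String[] args) throws Exception {
        Document original = parse(XML);
        // Deep-clone the whole document, as in the report.
        Document copy = (Document) original.cloneNode(true);

        System.out.println("before clone: " + internalSubsetOf(original));
        System.out.println("after clone:  " + internalSubsetOf(copy));
    }
}
```

If the parser in use has the behavior described here, the two printed values 
would be expected to differ; with a parser that preserves the internal subset 
across the clone they should match.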

I applied a small patch to CoreDocumentImpl (attached) and now it works as 
expected, except that, in addition to the internal subset, an Entity node 
exists as a child of the DocumentType, which is odd.  I'm not sure whether 
that's valid, but it didn't exist in the DOM before Document.cloneNode(true), 
so it seems to me it shouldn't be there.  However, if it doesn't hurt anything, 
it doesn't matter much to me.  Anyway, here's the result after my patch:

        DocumentTypeImpl: name=document
         internalSubset=
<!ENTITY erh 'Elliotte Rusty Harold'>
<!ELEMENT document (title,signature)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT copyright (#PCDATA)>
<!ELEMENT email (#PCDATA)>
<!ELEMENT hr EMPTY>
<!ELEMENT lastmodified (#PCDATA)>
<!ELEMENT signature (hr,copyright,email,lastmodified)>

            EntityImpl: name=erh
                TextImpl: Elliotte Rusty Harold


I hope this can get applied in time for the next release of Xerces!

Jake
