Hi Daniel,

The HTML DOM implementation in Xerces is ancient. It implements DOM Level 1 HTML [1][2], which was intended for use with HTML 4.0 documents only. It does not recognize XHTML [3].
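
Until that changes, one way to avoid walking the tree by hand is a namespace-aware lookup on the plain DOM. The following is only a minimal sketch, not something the Xerces HTML DOM provides: the class name XhtmlBodyLookup and the helpers parse and findBody are hypothetical, it goes through the standard JAXP DocumentBuilderFactory rather than DOMParser, and the sample document deliberately has no DOCTYPE so that no DTD needs to be fetched.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class for illustration; not part of Xerces.
public class XhtmlBodyLookup {
    // The XHTML namespace URI that XHTML elements are bound to.
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Parse an XHTML string into a plain, namespace-aware DOM document.
    static Document parse(String xhtml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // required for the *NS lookup below
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xhtml)));
    }

    // Return the first <body> element in the XHTML namespace, or null if none.
    static Element findBody(Document doc) {
        NodeList bodies = doc.getElementsByTagNameNS(XHTML_NS, "body");
        return bodies.getLength() > 0 ? (Element) bodies.item(0) : null;
    }

    public static void main(String[] args) throws Exception {
        // Assumed sample input; no DOCTYPE, so the parser does not load a DTD.
        String xhtml = "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
                + "<head><title>t</title></head>"
                + "<body><p>Hello</p></body></html>";
        Element body = findBody(parse(xhtml));
        System.out.println(body.getLocalName());
    }
}
```

The returned node is an ordinary org.w3c.dom.Element, so none of the HTMLElement convenience methods are available on it; it only replaces the manual search for the 'body' node.
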
[1] http://www.w3.org/TR/1998/REC-DOM-Level-1-19981001/level-one-html.html
[2] http://www.w3.org/TR/2003/REC-DOM-Level-2-HTML-20030109/html.html#ID-5353782642
[3] http://issues.apache.org/jira/browse/XERCESJ-890

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: [EMAIL PROTECTED]

Daniel Farinha <[EMAIL PROTECTED]> wrote on 03/27/2006 09:56:03 AM:

> Hi all,
>
> I'm parsing an XHTML document using Xerces.
> This is the code that I'm using to parse the document:
>
> String xhtmlSource = "<the xhtml source>";
> DOMParser parser = new DOMParser();
> parser.setProperty("http://apache.org/xml/properties/dom/document-class-name",
>     "org.apache.html.dom.HTMLDocumentImpl");
> InputSource iSource = new InputSource(new StringReader(xhtmlSource));
> parser.parse(iSource);
> HTMLDocumentImpl document = (HTMLDocumentImpl)parser.getDocument();
>
> The parsing seems to work, except when I query the HTMLDocumentImpl most
> nodes are of type ElementNSImpl rather than the actual apache HTML DOM
> implementation classes. (For example, I can't even do a
> document.getBody() - it returns null. Instead I have to walk the XML DOM
> looking for the 'body' node).
>
> This behaviour is described in NekoHTML's 'Requirements and Limitations'
> section at http://people.apache.org/~andyc/neko/doc/html/index.html
>
> I'm not using NekoHTML, and I'm currently using Xerces 2.8.0. I did try
> various versions of Xerces but to no avail.
>
> I'm having to carry on working with plain nodes, but I'd much rather
> work with the HTML DOM.
> Can anyone give any hints?
>
> Thanks in advance.
>
> Daniel
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
