Re: Neko parsing fix inadvertently reverted?

Andrzej Bialecki Thu, 17 Aug 2006 13:26:46 -0700

Sami Siren wrote:

Benjamin Higgins wrote:
Comments?
I cannot comment on the issue itself, but if you can submit a patch (perhaps with a testcase that demonstrates this) then it will be easier to act on.


Benjamin,

Could you please send me a copy of the offending HTML for testing (off the list)?

A little background: I knew of this issue when I changed the API to use DocumentFragment. However, as far as I was able to test it with the most recent version of Neko at that time, it didn't exhibit this problem.

The main motivation for this was to enable better parsing of broken documents with multiple <html> tags (or no <html> at all, but <head> and <body> as "root" elements). While this is not possible using a Document, it is possible to do this using a DocumentFragment (which doesn't necessarily have to represent a well-formed XML tree; specifically, it doesn't require that there is a single root node - please see the Javadoc of org.w3c.dom.DocumentFragment for a longer explanation).
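
To illustrate the difference, here is a minimal sketch using only the DOM classes shipped with the JDK. It is not the Nutch or NekoHTML parsing code; the class name FragmentVsDocument and its method names are made up for this example. It tries to attach a second top-level element to a Document, which the DOM rejects, and then attaches several top-level elements to a DocumentFragment, which is allowed.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentFragment;

// Hypothetical demo class; not part of Nutch or NekoHTML.
public class FragmentVsDocument {

    // Creates an empty DOM Document with the JDK's default parser factory.
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns true only if a Document accepts a second top-level element.
    static boolean documentAcceptsTwoRoots() {
        Document doc = newDocument();
        doc.appendChild(doc.createElement("html"));
        try {
            doc.appendChild(doc.createElement("html"));
            return true;
        } catch (DOMException e) {
            // A Document may have only one element child (hierarchy request error).
            return false;
        }
    }

    // Returns how many top-level nodes a DocumentFragment ends up holding.
    static int fragmentTopLevelCount() {
        Document doc = newDocument();
        DocumentFragment frag = doc.createDocumentFragment();
        frag.appendChild(doc.createElement("head")); // no <html> wrapper at all
        frag.appendChild(doc.createElement("body"));
        frag.appendChild(doc.createElement("html")); // a stray extra "root"
        return frag.getChildNodes().getLength();
    }

    public static void main(String[] args) {
        System.out.println("Document accepts two roots: " + documentAcceptsTwoRoots());
        System.out.println("Fragment top-level nodes: " + fragmentTopLevelCount());
    }
}
```

The fragment keeps every top-level node, so code that walks its children can still see content that follows the first "pseudo-root".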

So, if we change it back to Document we will lose this functionality, and some pages will be severely truncated, because in such cases NekoHTML takes only the first "pseudo-root" node and discards all others. However, if you are dealing mostly with well-formed documents you may not need this ...


--
Best regards,
Andrzej Bialecki     <><
___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
