Sami Siren wrote:
> Benjamin Higgins wrote:
>> Comments?
>
> I cannot comment on the issue itself, but if you can submit a patch
> (perhaps with testcase that demonstrates this) then it will be easier
> to act on.

Benjamin,
Could you please send me a copy of the offending HTML for testing (off
the list)?
A little background: I knew of this issue when I changed the API to use
DocumentFragment. However, as far as I was able to test it with the most
recent version of Neko at that time, it didn't exhibit this problem.
The main motivation for that change was to enable better parsing of
broken documents with multiple <html> tags (or no <html> at all, but
<head> and <body> as "root" elements). While this is not possible using
a Document, it is possible using a DocumentFragment, which doesn't
have to represent a well-formed XML tree and, specifically, doesn't
require a single root node (please see the Javadoc of
org.w3c.dom.DocumentFragment for a longer explanation).
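
To illustrate the difference, here is a minimal sketch using only the
DOM classes bundled with the JDK (no NekoHTML involved, so it says
nothing about Neko's own behaviour). The class and method names are
made up for this example. It shows that a Document refuses a second
top-level element, while a DocumentFragment accepts several:

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentFragment;

// Hypothetical demo class; not part of Nutch or NekoHTML.
public class FragmentRootsDemo {

    /** Creates an empty DOM Document with the JDK's default builder. */
    public static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns true if the Document rejects a second top-level element. */
    public static boolean documentRejectsSecondRoot() {
        Document doc = newDocument();
        doc.appendChild(doc.createElement("html"));
        try {
            // A Document may hold only one document element.
            doc.appendChild(doc.createElement("html"));
            return false;
        } catch (DOMException e) {
            return e.code == DOMException.HIERARCHY_REQUEST_ERR;
        }
    }

    /** Appends each name as a top-level element of a fragment; returns the count. */
    public static int fragmentRootCount(String... names) {
        Document doc = newDocument();
        DocumentFragment frag = doc.createDocumentFragment();
        for (String name : names) {
            // No single-root restriction applies to a fragment.
            frag.appendChild(doc.createElement(name));
        }
        return frag.getChildNodes().getLength();
    }

    public static void main(String[] args) {
        System.out.println("Document rejects second root: "
                + documentRejectsSecondRoot());
        System.out.println("Fragment top-level nodes: "
                + fragmentRootCount("head", "body", "html"));
    }
}
```

The fragment here stands in for the container a parser would fill when
it meets <head>, <body> and a stray second <html> at the top level.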
So, if we change it back to Document we will lose this functionality,
and some pages will be severely truncated, because in such cases
NekoHTML takes only the first "pseudo-root" node and discards all
others. However, if you are dealing mostly with well-formed documents
you may not need this ...
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com