[libxml-devel] [ libxml-Bugs-22956 ] Libxml HTML parser fails on very simple html pages

noreply Mon, 24 Nov 2008 08:30:29 -0800

Bugs item #22956, was opened at 2008-11-23 17:54
You can respond by visiting: 
http://rubyforge.org/tracker/?func=detail&atid=1971&aid=22956&group_id=494


Category: None
Group: None
>Status: Closed
>Resolution: Accepted
Priority: 3
Submitted By: Pavel Valodzka (valodzka)
>Assigned to: Charlie Savage (cfis)
Summary: Libxml HTML parser fails on very simple html pages

Initial Comment:
Please remove the check "htmlParseDocument(ctxt) == -1", because it makes the 
HTML parser impossible to use: it raises an exception on every page, for example 
on google.com:

Error: Tag nobr invalid at :3.
Error: htmlParseEntityRef: expecting ';' at :3.
Error: htmlParseEntityRef: expecting ';' at :3.
Error: htmlParseEntityRef: expecting ';' at :3.
Error: Tag nobr invalid at :3.
Error: htmlParseEntityRef: expecting ';' at :3.
Error: htmlParseEntityRef: expecting ';' at :3.
LibXML::XML::Error: Error: htmlParseEntityRef: expecting ';' at :3.

htmlParseDocument(ctxt) returns -1 very often; that doesn't mean that the 
document cannot be used.


----------------------------------------------------------------------

>Comment By: Charlie Savage (cfis)
Date: 2008-11-23 18:40

Message:
Yeah, that code has been removed in trunk.

----------------------------------------------------------------------

_______________________________________________
libxml-devel mailing list
[email protected]
http://rubyforge.org/mailman/listinfo/libxml-devel
