On Sun, 29 Oct 2006 18:12:49 -0000, Mark Thomas <[EMAIL PROTECTED]>
wrote:
> Thanks for the update. Any progress on exposing libxml2's ability to
> create a DOM from HTML?
>
I've just merged both the new HTMLParser and the reimplemented SaxParser
into the DEV_0_4 branch. Now you can do:
$ irb -r libxml_so.so
htp = XML::HTMLParser.string('<html><body>Hi there<br><hr width=30></html>')
# => #<XML::HTMLParser:0xb7eb9954>
doc = htp.parse
# => <?xml version="1.0" standalone="yes"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
"http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
<body>Hi there<br/><hr width="30"/></body>
</html>
doc.find('//hr').to_a
# => [<hr width="30"/>]
__END__
Currently it only supports parsing from a string, but file and IO parsing
should be doable too. It's in need of testing and bug reports!
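Until file and IO parsing exist, one possible stopgap is to read the source
into a String first and pass that to the string parser. The sketch below is
only an illustration: read_html_source is a hypothetical helper, not part of
libxml-ruby, and the parser call is left commented out because it assumes the
DEV_0_4 extension is loaded.

```ruby
require 'stringio'

# Hypothetical helper (not part of libxml-ruby): returns the full HTML text
# from either an IO-like object (anything responding to #read) or a file path.
def read_html_source(source)
  return source.read if source.respond_to?(:read)
  File.read(source.to_s)
end

# Works on any IO-like object without touching libxml:
html = read_html_source(StringIO.new('<html><body>Hi there<br></body></html>'))

# With the DEV_0_4 extension loaded, the string could then be handed to the
# parser in the same way as in the session above (assumption, untested here):
#   doc = XML::HTMLParser.string(html).parse
```

This slurps the whole document into memory, so it is only a workaround for
small inputs, not a substitute for real file or IO support in the parser.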
I toyed with the idea of handling HTML with a flag to the XML parser, but
it wasn't much fun handling the parser contexts, and this way seems more
'right' to me :).
--
Ross Bamford - [EMAIL PROTECTED]
_______________________________________________
libxml-devel mailing list
[email protected]
http://rubyforge.org/mailman/listinfo/libxml-devel