On Sun, 29 Oct 2006 18:12:49 -0000, Mark Thomas <[EMAIL PROTECTED]> wrote:
> Thanks for the update. Any progress on exposing libxml2's ability to
> create a DOM from HTML?
>

I've just merged both the new HTMLParser and the reimplemented SaxParser
to the DEV_0_4 branch. Now you can do:

    $ irb -r libxml_so.so

    htp = XML::HTMLParser.string('<html><body>Hi there<br><hr width=30></html>')
    # => #<XML::HTMLParser:0xb7eb9954>

    doc = htp.parse
    # => <?xml version="1.0" standalone="yes"?>
    #    <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
    #    <html>
    #    <body>Hi there<br/><hr width="30"/></body>
    #    </html>

    doc.find('//hr').to_a
    # => [<hr width="30"/>]

    __END__

Currently it only supports parsing from a string, but file and io
parsing should be doable too. It's in need of testing and bug reports!

I toyed with the idea of handling HTML with a flag to the XML parser,
but it wasn't much fun handling the parser contexts, and this way seems
more 'right' to me :).

-- 
Ross Bamford - [EMAIL PROTECTED]
_______________________________________________
libxml-devel mailing list
libxml-devel@rubyforge.org
http://rubyforge.org/mailman/listinfo/libxml-devel