On Sun, 29 Oct 2006 18:12:49 -0000, Mark Thomas <[EMAIL PROTECTED]>  
wrote:

> Thanks for the update. Any progress on exposing libxml2's ability to
> create a DOM from HTML?
>

I've just merged both the new HTMLParser and the reimplemented SaxParser  
to the DEV_0_4 branch. Now you can do:

$ irb -r libxml_so.so

htp = XML::HTMLParser.string('<html><body>Hi there<br><hr width=30></html>')
# => #<XML::HTMLParser:0xb7eb9954>

doc = htp.parse
# => <?xml version="1.0" standalone="yes"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
   <body>Hi there<br/><hr width="30"/></body>
</html>

doc.find('//hr').to_a
# => [<hr width="30"/>]

__END__

Currently it only supports parsing from a string, but file and IO parsing
should be doable too. It's in need of testing and bug reports!
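
In the meantime, a rough workaround sketch: read the file or IO into a String first and hand that to the string parser. The helper name parse_html_from is hypothetical (it is not part of the bindings), and it assumes only what the session above shows: a parser class that responds to .string(str) and returns an object that responds to #parse.

```ruby
# Hypothetical helper, NOT part of libxml-ruby: slurps the whole source into
# a String, then delegates to a parser class that responds to .string(str)
# and whose result responds to #parse (as XML::HTMLParser does above).
def parse_html_from(source, parser_class)
  # Anything with #read (File, StringIO, socket) is treated as an IO;
  # anything else is assumed to be a filesystem path.
  html = source.respond_to?(:read) ? source.read : File.read(source)
  parser_class.string(html).parse
end
```

Usage would look like parse_html_from('page.html', XML::HTMLParser), or File.open('page.html') { |io| parse_html_from(io, XML::HTMLParser) }. It loads the entire document into memory, so it is a stopgap and not a substitute for real file/IO support in the parser.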

I toyed with the idea of handling HTML with a flag to the XML parser, but  
it wasn't much fun handling the parser contexts, and this way seems more  
'right' to me :).

-- 
Ross Bamford - [EMAIL PROTECTED]
_______________________________________________
libxml-devel mailing list
libxml-devel@rubyforge.org
http://rubyforge.org/mailman/listinfo/libxml-devel
