Hey all,

I've written a new extension for PHP (ZE2 only) based on the famed HTML Tidy library (http://tidy.sf.net/). Beyond providing an incredibly easy way to clean and repair HTML documents, it includes an API for traversing an arbitrary HTML document using the ZE2 OO support.

I've put the extension itself, along with basic documentation, working examples, and tests, on my web site:

http://www.coggeshall.org/php/php-tidy-0-5b.tar.gz

Although I haven't written true PHPDocs yet, README_TIDY in the tidy/ directory outlines the API, including a description of the OO methods and properties available for accessing the parsed HTML document tree.

I am interested in hearing from the internals@ community about my extension and finding out what everyone thinks of it. There are memory leaks which still need to be tracked down (I believe they either have to do with ZE2 itself or come from something I am missing in my OO implementation), and I know there are probably bugs. I'd welcome suggestions, patches, and even just "it broke doing this".

I plan on maintaining this extension on my web site and perhaps in PECL (unless of course this is something worthy of the standard PHP5 distro), so if nothing else you can always find it there.

Regards,

John

PS: Here is a paste of one of the examples, for those who are curious how the OO stuff works (it pulls all <A HREF> links out of a document):

<?php

    /* Create a Tidy Resource */
    $tidy = tidy_create();

    /* Parse the document */
    tidy_parse_file($tidy, $_SERVER['argv'][1]);

    /* Fix up the document */
    tidy_clean_repair($tidy);

    /* Get an object representing everything from the <HTML> tag in */
    $html = tidy_get_html($tidy);

    /* Traverse the document tree */
    print_r(get_links($html));

    function get_links($node) {
        $urls = array();

        /* Check to see if we are on an <A> tag or not */
        if($node->id == TIDY_TAG_A) {
            /* If we are, find the HREF attribute */
            $attrib = $node->get_attr_type(TIDY_ATTR_HREF);
            if($attrib) {
                /* Add the value of the HREF attrib to $urls */
                $urls[] = $attrib->value;
            }
        }

        /* Are there any children? */
        if($node->has_children()) {
            /* Traverse down each child recursively */
            foreach($node->children as $child) {
                /* Append the results from recursion to $urls */
                foreach(get_links($child) as $url) {
                    $urls[] = $url;
                }
            }
        }

        return $urls;
    }

?>

--
-~=~--~=~--~=~--~=~--~=~--~=~--~=~--~=~--~=~--~=~--~=~--~=~--~=~--~=~-
John Coggeshall
john at coggeshall dot org
http://www.coggeshall.org/
-~=~--~=~--~=~--~=~--~=~--~=~--~=~--~=~--~=~--~=~--~=~--~=~--~=~--~=~-

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php