> If the pages you're working on are well-formed HTML, you may be troubled by
> a more severe problem: HTML::Parser and HTML::TreeBuilder are expected
> to leave non-broken HTML exactly the way it is, but they don't always
> do so. There are problems with handling framesets; perhaps there are
> other problems. If you find any, they should really be fixed.
...
> Can you post a *minimal* HTML fragment that exhibits the problem?

I finally managed to find out what modification the original pages need in
order to be processed properly by HTML::Parser: a closing </tr> followed by a
new <table> should be treated as </tr></table><table>.

The problem is that every browser I tried (except lynx) renders the following
HTML as if the tags that are commented out were present.

HTML source:

    <html><head>
    <title> Test tables </title>
    </head><body>
    <table>
    <tr>
    <td>
    <table>
    <tr><td>1.1</td></tr>
    <!-- </table> -->
    <table>
    <tr><td>2.1</td></tr>
    </table>
    </td>
    <td>
    <table>
    <tr><td>1.2</td></tr>
    <!-- </table> -->
    <table>
    <tr><td>2.2</td></tr>
    </table>
    </td>
    </tr>
    </table>
    </body>
    </html>

However, reading and writing this HTML with the following script gives a
different output, and one that the browsers interpret differently.

    #!/usr/bin/perl
    use strict;
    use HTML::TreeBuilder 3;

    # Parse the file named on the command line, keeping whitespace and comments.
    my $tree = HTML::TreeBuilder->new();
    $tree->no_space_compacting(1);
    $tree->ignore_ignorable_whitespace(0);
    $tree->store_comments(1);
    $tree->parse_file($ARGV[0]);

    # Write the re-serialized tree to a copy whose name is prefixed with "n".
    open(OUT, ">n$ARGV[0]") or die "Cannot write n$ARGV[0]: $!";
    print OUT $tree->as_HTML;
    close(OUT);
    $tree->delete();

HTML output:

    <html><head>
    <title> Test tables </title>
    </head><body>
    <table>
    <tr>
    <td>
    <table>
    <tr><td>1.1</td></tr>
    <!-- </table> -->
    <tr><td><table>
    <tr><td>2.1</td></tr>
    </table>
    </td>
    <td>
    <table>
    <tr><td>1.2</td></tr>
    <!-- </table> -->
    <tr><td><table>
    <tr><td>2.2</td></tr>
    </table>
    </td>
    </tr>
    </table>
    </td></tr></table></td></tr></table></body>
    </html>

In other words, when a row close (</tr>) in a still-open table is followed by
a new <table>, HTML::Parser assumes the new table is a new data element and
inserts <tr><td> in front of it. Most browsers (in fact every browser I tried
except lynx) instead take it to mean the end of the open table.

I think this should be changed, or at least there should be a switch to
enable this behaviour, although of course this is not proper HTML.

I will now take a look at start() to see how this could be done.

Neven Luetic
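
P.S. Until the parser has such a switch, one possible workaround is to
pre-process the source so that the implied </table> is written out before
parsing. What follows is only a minimal sketch of that idea, not a fix to
start(): the helper name close_tables_before_new_table is made up for this
example and is not part of HTML::Parser or HTML::TreeBuilder. It assumes that
only whitespace and comments sit between the </tr> and the new <table>, and,
being regex-based, it will not cope with every kind of broken markup.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not part of HTML::Parser or HTML::TreeBuilder.
# Wherever a row close is followed by a new table start tag (with only
# whitespace and/or comments in between), write out the table close that
# most browsers appear to assume at that point.
sub close_tables_before_new_table {
    my ($html) = @_;
    $html =~ s{
        (</tr\s*>)                              # a row close
        ((?:\s|<!--(?:(?!-->).)*-->)*)          # only whitespace or whole comments
        (?=<table\b)                            # directly before a new table
    }{$1</table>$2}gisx;
    return $html;
}

# Example: one cell of the test page above.
my $cell = '<table> <tr><td>1.1</td></tr> <!-- </table> --> '
         . '<table> <tr><td>2.1</td></tr> </table>';
print close_tables_before_new_table($cell), "\n";
```

The returned string could then be handed to the tree with $tree->parse($html)
followed by $tree->eof() instead of calling parse_file(). Well-formed nested
tables should pass through unchanged, because there a </tr> is followed by
</table> or another <tr>, never directly by <table>.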