[EMAIL PROTECTED] wrote:
hi there again...
--- Robert Nicholson <[EMAIL PROTECTED]> wrote:
In your case, if there are nested tables, which one is considered the
"last"?
Good point, I hadn't thought about that. I would say the last top-level
(root) table element should count as the "last" one...
If you don't have nested tables, can you just keep track of each table
element you get until you've finished parsing the document?
I don't have any nested tables in my documents. I just want to remove
the last table from an HTML document that contains an unknown number of
tables; the last table always contains a banner, and that is the one
I want removed...
I prefer to use HTML::TokeParser for HTML parsing.
Please, any example would be appreciated...
Here is one way:
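use strict;
use warnings;
use HTML::TokeParser;

# slurp the sample document from the __DATA__ section
my $file = join('', <DATA>);

# first pass: count the <table> start tags
my $p = HTML::TokeParser->new(\$file);
my $tableCounter = 0;
$tableCounter++ while $p->get_tag('table');
undef $p;

#
# Now we know how many tables there are.
# I am assuming there aren't any nested tables.
#

# second pass: parse the file again and print everything but the last table
$p = HTML::TokeParser->new(\$file);
while ( my $token = $p->get_token ) {
    if ( $token->[0] eq 'S' and $token->[1] eq 'table' ) {
        if ( --$tableCounter ) {
            # not the last table yet: print the original tag text
            print $token->[-1];
        } else {
            # last table: move to the end of this table, printing nothing
            $p->get_tag('/table');
        }
    } elsif ( $token->[0] eq 'T' or $token->[0] eq 'C'
           or $token->[0] eq 'D' ) {
        # text, comment and declaration tokens hold their text in slot 1
        print $token->[1];
    } else {
        # every other token type holds its original text in the last slot
        print $token->[-1];
    }
}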
__DATA__
<html>
<head><title>Delete Last Table</title></head>
<body>
<table border=1>
<tr><th>Table 1</th></tr>
<tr><td>val 1</td></tr>
</table>
<p>
<table border=1>
<tr><th>Table 2</th></tr>
<tr><td>val 1</td></tr>
</table>
<p>
<table border=1>
<tr><th>Table 3</th></tr>
<tr><td>val 1</td></tr>
</table>
<p>
<table border=1>
<tr><th>Table 4</th></tr>
<tr><td>val 1</td></tr>
</table>
</body>
</html>
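
On the nested-tables question raised earlier in the thread, a single-pass
variant is also possible. What follows is only a rough sketch, assuming
every table is properly closed; remove_last_table is a hypothetical helper
name made up for this example, not part of HTML::TokeParser. It keeps a
depth counter so that only the last top-level table is dropped, and it
remembers each token's original text so the document is parsed only once.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper (not from the thread): return $html without its
# last top-level <table>...</table>. Assumes tables are properly closed.
sub remove_last_table {
    my ($html) = @_;
    my $p = HTML::TokeParser->new(\$html);

    my @chunks;       # original text of every token, in document order
    my $depth = 0;    # current <table> nesting depth
    my $start;        # chunk index where the open top-level table began
    my @range;        # (first, last) chunk index of the latest closed one

    while ( my $token = $p->get_token ) {
        my ( $type, $tag ) = @{$token}[ 0, 1 ];

        # text tokens hold their text in slot 1, all others in the last slot
        push @chunks, $type eq 'T' ? $token->[1] : $token->[-1];

        if ( $type eq 'S' and $tag eq 'table' ) {
            # a table that opens at depth 0 is a top-level table
            $start = $#chunks if $depth++ == 0;
        }
        elsif ( $type eq 'E' and $tag eq 'table' and $depth > 0 ) {
            # back at depth 0: this top-level table is complete
            @range = ( $start, $#chunks ) if --$depth == 0;
        }
    }

    # cut out the last complete top-level table, if there was one
    splice @chunks, $range[0], $range[1] - $range[0] + 1 if @range;
    return join '', @chunks;
}
```

With the script above it could be called as print remove_last_table($file);
in place of the two parsing passes. The trade-off is that the whole token
list is held in memory until the end of the document.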