On Tue, Nov 18, 2014 at 12:22 PM, [email protected] <[email protected]> wrote:
> I am trying to extract a table (<table class="xxxx"><tr><td>...... until
> </table>) and its content from an HTML file.
>
> With the file I have something like this
>
> <div id="product" class="product">
> <table border="0" cellspacing="0" cellpadding="0" class="prodc"
> title="Product ">
> .
> .
> .
> </table>
> </div>
>
> There could be more than one table in the file. However, I am only interested
> in the table within <div id="product" class="product"> </div>.
>
> /^.*<div id="product" class="product">.+?(<table
> border="0".+?\s+<\/table>)\s*<\/div>.*$/ims
>
> The above and various variations I tried do not match.
>
> I am able to easily match this using sed, however I need to try using Perl.
>
> This sed command works just fine:
>
> sed -n '/<div id="product" class="product">/,/<\/table>/p' thelo826.html
> |sed -n '/<table border.*/,/<\/table>/p'| sed -e 's/class=".*"//g'
>
If you're positive the HTML is consistently formatted (machine-generated,
for instance, and you're the generator), you could do something along
these lines:

    my $regex = qr{ <div .*? id="product" .*? class="product" .*? >
                    .*? ( <table .*? border="0"
                    .*? </table> ) .*? </div>
                  }six;
    {
        local $/;                # slurp mode: read the whole file at once
        my $content = <DATA>;    # substitute your lexical filehandle
        while ( $content =~ /$regex/g ) {
            print "table=$1\n";
        }
    }
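
For a version you can run as-is, here is a minimal sketch that wraps the
same idea in a sub and also drops the class attributes, roughly as the
last sed stage does. The sub name extract_product_tables and the sample
HTML are made up for illustration. It uses [^>]* inside the tags so a
match cannot wander out of the opening <div> or <table> tag, and it
still assumes the attributes appear in the order id, class. Unlike the
greedy sed expression, it removes only the class attribute itself.

```perl
use strict;
use warnings;

# Hypothetical helper: returns each <table>...</table> found inside
# <div id="product" class="product">, with class="..." attributes removed.
sub extract_product_tables {
    my ($html) = @_;

    my $regex = qr{
        <div \b [^>]*? \b id="product" [^>]*? \b class="product" [^>]*? >
        .*?                                  # anything before the table
        ( <table \b [^>]* > .*? </table> )   # capture the first table
    }six;

    my @tables;
    while ( $html =~ /$regex/g ) {
        my $table = $1;
        $table =~ s/\s+class="[^"]*"//gi;    # strip class attributes only
        push @tables, $table;
    }
    return @tables;
}

# Made-up sample input: one table outside the product div, one inside.
my $html = <<'HTML';
<table border="0" class="other"><tr><td>skip me</td></tr></table>
<div id="product" class="product">
<table border="0" cellspacing="0" cellpadding="0" class="prodc"
title="Product ">
<tr><td>Widget</td></tr>
</table>
</div>
HTML

print "table=$_\n" for extract_product_tables($html);
```

If the markup is not under your control, a real HTML parser from CPAN
will be more robust than any regex.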
--
Charles DeRykus