html parsing

Malcolm Mill Mon, 02 May 2005 08:30:08 -0700

Hi, 
I'm trying to extract information from html like this...

http://www.rafb.net/paste/results/Ze4RTm27.html


I've tried modifying examples from the man pages for HTML::TokeParser
and HTML::TreeBuilder without much success.

I just want to identify such blocks of html by the attributes in their
child nodes, extract the text node under the first '<td>', extract the
text node under the second '<td>' along with the href attribute of the
enclosed '<a>' node, and store the output in a hash which I can pass to
other functions or print to a csv file.
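
A minimal sketch of the extraction described above, using HTML::TokeParser.
The pasted page is not reproduced here, so the sample markup, the class
names 'label' and 'link', and the helper name extract_rows are all
assumptions made for illustration; the real attributes would need to be
substituted.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical sample markup; the real page's tags and attributes may differ.
my $html = <<'HTML';
<table>
<tr><td class="label">First item</td><td class="link"><a href="http://example.com/1">One</a></td></tr>
<tr><td>unrelated</td><td>row</td></tr>
<tr><td class="label">Second item</td><td class="link"><a href="http://example.com/2">Two</a></td></tr>
</table>
HTML

# Returns a list of hash refs with the keys label, href and text.
sub extract_rows {
    my ($source) = @_;
    my $p = HTML::TokeParser->new(\$source) or die "Cannot parse HTML";
    my @rows;
    while (my $td = $p->get_tag('td')) {
        # A start-tag token is [tag, \%attr, \@attrseq, text].
        # Skip cells that lack the assumed class attribute.
        next unless ($td->[1]{class} // '') eq 'label';
        my %row = (label => $p->get_trimmed_text('/td'));

        # The next <td> is taken to be the second cell of the same block,
        # and the next <a> is taken to sit inside it (an assumption).
        $p->get_tag('td') or last;
        my $link = $p->get_tag('a') or last;
        $row{href} = $link->[1]{href};
        $row{text} = $p->get_trimmed_text('/a');
        push @rows, \%row;
    }
    return @rows;
}

# Naive CSV output: fields are quoted, but embedded quotes are not escaped.
for my $row (extract_rows($html)) {
    print join(',', map { qq("$_") } @{$row}{qw(label text href)}), "\n";
}
```

Each hash ref can be passed on to other functions as-is. For real data,
a CSV module such as Text::CSV would be a safer choice than the join above.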

If anyone can suggest anything while I read the docs and the relevant
hacks in "Spidering Hacks" more carefully, it would be appreciated.

Regards, 
Malcolm.
