> -----Original Message-----
> From: Christian Wattengård [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, February 18, 2004 4:41 AM
> To: [EMAIL PROTECTED]
> Subject: Extracting data from html structure.
> 
> 
> I have the following html structure:
> ------------------------------------------------------------------------

[long HTML snipped]

> ------------------------------------------------------------------------
> And I want to extract from it the checkbox values and their respective
> channel names (contained in the link beside the checkbox).
> I have checked a lot of modules on CPAN but I haven't found one that does
> it just the way I want it to yet. Actually I haven't found any that I can
> get to work at all.
> 
> Any tips?
> 
> Christian...
> 

I snipped the HTML you provided because it was so long. Please trim it
down next time.
Anyhow, I think the code below does what you want.


use strict;
use warnings;

use HTML::Parser;


my $HTML = <<'EOF';  # single-quoted heredoc: no interpolation inside the HTML
<table border=0 cellpadding=0 cellspacing=0 width=156>
<tr>
<td colspan=2 bgcolor=#CDC9C0><b><font
face=verdana,arial,helvetica,sans-serif size=-2
color=#666666>&nbsp;Norske</font></b></td>
</tr>
<tr>
<td width=78 valign=top><font class=link-00-ul-l size=1>
<input type="checkbox" name=kanal_id[] value=1 CHECKED>
<a href="index.html?kanal_id=1&dag=0&fra_tid=0&til_tid=24&kategori_id=">NRK
1</a><br>
<input type="checkbox" name=kanal_id[] value=3 >
<a href="index.html?kanal_id=3&dag=0&fra_tid=0&til_tid=24&kategori_id=">TV
2</a><br>
<input type="checkbox" name=kanal_id[] value=5 >
<a
href="index.html?kanal_id=5&dag=0&fra_tid=0&til_tid=24&kategori_id=">TVNorge
</a><br>
</font></td>
</tr>
</table>
EOF



my $current_tag = ''; # i'm not happy with using this.
                 # is there a better way? anyone?

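my $p = HTML::Parser->new(
        api_version => 3,
        start_h     => [ \&start_tag, 'tagname,attr' ],
        # forget the current tag once any element closes, so text that
        # follows a </a> is not mistaken for link text
        end_h       => [ sub { $current_tag = '' }, 'tagname' ],
        # dtext hands over the text with entities such as &amp; decoded
        text_h      => [ \&text,      'dtext'        ]
);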

$p->parse($HTML);
$p->eof;

sub start_tag
{
        # only tagname and attr are requested in the argspec above
        my ($name, $attrs) = @_;
        
        $current_tag = $name;

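        # an <input> without a type attribute would otherwise trigger an
        # "uninitialized value" warning
        if ($name eq 'input' and defined $attrs->{'type'}
                             and lc $attrs->{'type'} eq 'checkbox')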
        {
                print $attrs->{'value'}, "=";
        }
}

sub text
{
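        my $text  = shift;
        if ($current_tag eq 'a')
        {
                # the link text may wrap across lines (e.g. "NRK\n1"), so
                # collapse runs of whitespace and trim both ends
                $text =~ s/\s+/ /g;
                $text =~ s/^\s+|\s+$//g;
                print "$text\n" if length $text;
        }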
        
}
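
On the question in the comment above about a better way than $current_tag:
one option is to keep the state in lexical variables closed over by the
handlers, and to add an end handler so the link text is collected only
between <a> and </a>. What follows is a rough sketch, not a tested drop-in.
The names extract_channels, $pending_value and $link_text are my own, and
it assumes each checkbox is followed by the link that names it, as in your
HTML.

```perl
use strict;
use warnings;

use HTML::Parser;

# Returns a list of [ value, channel name ] pairs in document order.
# Assumption: every checkbox is followed by the <a> element naming it.
sub extract_channels
{
        my ($html) = @_;

        my @channels;       # collected [ value, name ] pairs
        my $pending_value;  # value of the most recent checkbox, if any
        my $link_text;      # defined only while we are inside an <a>

        my $p = HTML::Parser->new(
                api_version => 3,
                start_h => [ sub {
                        my ($name, $attrs) = @_;
                        my $type = defined $attrs->{type} ? lc $attrs->{type} : '';
                        if ($name eq 'input' and $type eq 'checkbox')
                        {
                                $pending_value = $attrs->{value};
                        }
                        elsif ($name eq 'a')
                        {
                                $link_text = '';   # start collecting
                        }
                }, 'tagname,attr' ],
                text_h => [ sub {
                        my ($text) = @_;
                        # text may arrive in several chunks, so append
                        $link_text .= $text if defined $link_text;
                }, 'dtext' ],
                end_h => [ sub {
                        my ($name) = @_;
                        return unless $name eq 'a' and defined $link_text;
                        if (defined $pending_value)
                        {
                                # collapse wrapped lines and trim the ends
                                (my $label = $link_text) =~ s/\s+/ /g;
                                $label =~ s/^\s+|\s+$//g;
                                push @channels, [ $pending_value, $label ];
                        }
                        undef $pending_value;
                        undef $link_text;
                }, 'tagname' ],
        );

        $p->parse($html);
        $p->eof;

        return @channels;
}
```

To get the same value=name lines as the script above, loop over the
result of extract_channels($HTML) and print each pair with
printf "%s=%s\n", @$_. Returning the pairs rather than printing them
inside the handlers also makes it easier to put them in a hash later.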


