Hi Bianca,

I had a look at the HTML and your code, and dumped the structure returned by HTML::TableContentParser's parse method (see below).

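Shooting from the hip, it seems that HTML::TableContentParser does not handle the structure of the page you want to parse. In the dump below, the 13 labels (Bug #38516, Submitted:, Modified: and so on) end up under 'headers', while each of the 9 rows holds only whitespace in 'data' and has no 'cells' key at all. That is why your loop prints "Row: " nine times with nothing after it.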

You might have to look into something like HTML::Parser or HTML::TokeParser instead.
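For comparison, here is a rough sketch of the HTML::Parser route, using its event handlers (start_h, end_h, text_h) to collect the text of every th/td cell. The HTML snippet and its values are made up for illustration and only assume the label/value layout the bug page appears to use; the real markup may differ.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical markup in a label/value layout; not the real bug page.
my $html = <<'HTML';
<table id="bugheader">
<tr><th>Status:</th><td>Verified</td><th>Category:</th><td><a href="#">Server</a></td></tr>
</table>
HTML

my @cells;          # text of each th/td cell, in document order
my $in_cell = 0;    # true while inside a th or td element

my $p = HTML::Parser->new(
    api_version => 3,
    # A th or td start tag opens a new, empty cell.
    start_h => [ sub {
        my ($tag) = @_;
        if ( $tag eq 'th' || $tag eq 'td' ) { $in_cell = 1; push @cells, ''; }
    }, 'tagname' ],
    # The matching end tag closes it.
    end_h => [ sub {
        my ($tag) = @_;
        $in_cell = 0 if $tag eq 'th' || $tag eq 'td';
    }, 'tagname' ],
    # Decoded text (entities expanded) is appended to the open cell,
    # including text inside nested elements such as <a>.
    text_h => [ sub { $cells[-1] .= $_[0] if $in_cell }, 'dtext' ],
);
$p->parse($html);
$p->eof;

# Trim whitespace, then pair cells up as label => value.
# This assumes labels and values strictly alternate.
s/^\s+|\s+$//g for @cells;
my @pairs;
push @pairs, [ splice @cells, 0, 2 ] while @cells;
print "$_->[0]\t$_->[1]\n" for @pairs;
```

Appending text rather than assigning it matters because HTML::Parser may deliver one cell's text in several chunks.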

I have included something I whipped up based on HTML::TokeParser; let me know if you have questions.

jonasbn

$VAR1 = [
          {
            'headers' => [
                           {
                             'data' => 'Bug #38516'
                           },
                           {
                             'data' => 'Submitted:'
                           },
                           {
                             'data' => 'Modified:'
                           },
                           {
                             'data' => 'Reporter:'
                           },
                           {
                             'data' => 'Status:'
                           },
                           {
                             'data' => 'Category:'
                           },
                           {
                             'data' => 'Severity:'
                           },
                           {
                             'data' => 'Version:'
                           },
                           {
                             'data' => 'OS:'
                           },
                           {
                             'data' => 'Assigned to:'
                           },
                           {
                             'data' => 'Target Version:'
                           },
                           {
                             'data' => 'Tags:'
                           },
                           {
                             'data' => 'Triage:'
                           }
                         ],
            'style' => 'width: 100%',
            'id' => 'bugheader',
            'rows' => [
                        {
                          'data' => '
  ',
                          'id' => 'title'
                        },
                        {
                          'data' => '
  '
                        },
                        {
                          'data' => '
  '
                        },
                        {
                          'data' => '
  '
                        },
                        {
                          'data' => '
  '
                        },
                        {
                          'data' => '
  '
                        },
                        {
                          'data' => '
  '
                        },
                        {
                          'data' => '
  '
                        },
                        {
                          'data' => '
   '
                        }
                      ]
          }
        ];
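Before switching parsers, it can help to walk the returned structure defensively to see what is actually there. A minimal sketch, using a hand-trimmed copy of the dump above (two of the 13 headers and two of the 9 rows) so it runs without fetching the page:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Trimmed copy of the structure dumped above, so no network access is needed.
my $tables = [
    {
        id      => 'bugheader',
        headers => [ { data => 'Bug #38516' }, { data => 'Submitted:' } ],
        rows    => [ { data => "\n  ", id => 'title' }, { data => "\n  " } ],
    },
];

my $rows_with_cells = 0;
for my $t (@$tables) {
    print "Table: ", ( $t->{id} // '(no id)' ), "\n";
    print "Headers: ", join( ' | ', map { $_->{data} } @{ $t->{headers} || [] } ), "\n";
    for my $r ( @{ $t->{rows} || [] } ) {
        # A row without a 'cells' array is what makes the original loop
        # print "Row: " with nothing after it.
        if ( ref $r->{cells} eq 'ARRAY' ) {
            $rows_with_cells++;
            print "Row: ", join( ' ', map { "[$_->{data}]" } @{ $r->{cells} } ), "\n";
        }
        else {
            print "Row without cells (keys: ", join( ', ', sort keys %$r ), ")\n";
        }
    }
}
```

Going by the full dump, every one of the 9 rows on the real page would take the "without cells" branch.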


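#!/usr/bin/perl

use strict;
use warnings;

use LWP::UserAgent;
use HTML::TokeParser;

my $URL = 'http://bugs.mysql.com/bug.php?id=38516';
get_tables($URL);

exit(0);

# Fetch the page and print the text of each th/td cell, two cells per
# line (label, then value). This assumes labels and values strictly
# alternate, as they appear to on the bug page.
sub get_tables {
    my $URL = shift;

    my $ua       = LWP::UserAgent->new();
    my $response = $ua->get($URL);

    die $response->status_line unless $response->is_success;

    # decoded_content takes care of the charset and any content encoding.
    my $page = $response->decoded_content;

    my $p = HTML::TokeParser->new( \$page );

    my $i = 0;
    while ( $p->get_tag( 'th', 'td' ) ) {
        # Read up to the closing tag so text inside nested elements
        # (links, spans) is included, then trim surrounding whitespace.
        my $text = $p->get_text( '/th', '/td' );
        $text =~ s/^\s+|\s+$//g;

        # Even-numbered cells are labels, odd-numbered cells are values.
        print $i % 2 ? "$text\n" : "$text\t";
        $i++;
    }
}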
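If you want to use the values rather than just print them, a variation is to pair each th with the td that follows it and store them in a hash. A sketch, again assuming the label/value layout holds; the HTML string and the field values are invented for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical snippet in the th/td layout assumed above; not the real page.
my $html = '<table><tr><th>Status:</th><td>Verified</td>'
         . '<th>Assigned to:</th><td><a href="#">Someone</a></td></tr></table>';

my $p = HTML::TokeParser->new( \$html ) or die "Cannot create parser";

my %field;
while ( $p->get_tag('th') ) {
    # get_text with an end tag reads past nested tags such as <a>.
    my $label = $p->get_text('/th');
    $p->get_tag('td') or last;    # skip ahead to the value cell
    my $value = $p->get_text('/td');
    s/^\s+|\s+$//g for $label, $value;
    $label =~ s/:$//;             # drop the trailing colon from the label
    $field{$label} = $value;
}

print "$_ => $field{$_}\n" for sort keys %field;
```

Note that get_tag('td') skips forward to the next td wherever it is, so a label without its own value cell would be paired with the following one.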

On 14/09/2008, at 08.41, Bianca Shibuya wrote:

Hi there,

Can anybody help me with this?

I have this piece of code:

use strict;
use warnings;

use Text::CSV;
use Date::Manip qw(ParseDate UnixDate);
use LWP::Simple;
use URI;
use HTML::TableContentParser;
use HTML::Entities;

sub get_tables {
  my $URL = shift;
  my $page = get($URL);
  die "Couldn't get $URL" unless defined $page;
  my $tcp = HTML::TableContentParser->new();
  return $tcp->parse($page);
}

my $URL = 'http://bugs.mysql.com/bug.php?id=38516';
my $tables = get_tables($URL); # returns a reference to an array of tables

for my $t (@$tables) {
   for my $r (@{$t->{rows}}) {
       print "Row: ";
       for my $c (@{$r->{cells}}) {
           print "[$c->{data}] ";
       }
       print "\n";
   }
}

It prints "Row: " 9 times, without any data.

Thank you.
Bianca


