I want to parse a website, and I want to learn something in the process.
Please give me a helping hand and review my code!

http://www.schulministerium.nrw.de/BP/SchuleSuchen?action=109.2376390575227&SchulAdresseMapDO=116439

A very simple site with only one table! 

I decided to do this with the HTML::TableExtract module. I hope it does
the trick and parses the page well.
See the module's page on CPAN:
http://search.cpan.org/dist/HTML-TableExtract/lib/HTML/TableExtract.pm

TableExtract is a great tool, and I am pretty sure that it does a great job!

We need to provide something that uniquely identifies the table in question.
This can be the content of its headers or its HTML attributes.
In the case mentioned above, there is only one table in the document (see the
link above),
so we don't even need to do that. But we should still provide something to the
constructor,
so why not provide the class of the table?
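
To make the choice of constructor arguments concrete, here is a minimal sketch that compares an unconstrained extractor with one that matches on the class attribute. The inline HTML is a made-up stand-in for the real page (the labels and values are assumptions, only the class name comes from my script), so please read it as an illustration and not as the actual markup of the site.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TableExtract;

# Hypothetical sample markup; the real page may look different.
my $html = <<'HTML';
<html><body>
<table class="bp_result_tab_info">
<tr><td>Label A</td><td>Value A</td></tr>
<tr><td>Label B</td><td>Value B</td></tr>
</table>
</body></html>
HTML

# Option 1: no constraints, so every table in the document matches.
my $any = HTML::TableExtract->new;

# Option 2: match only tables whose class attribute has this value.
my $by_class = HTML::TableExtract->new(
    attribs => { class => 'bp_result_tab_info' },
);

for my $te ( $any, $by_class ) {
    $te->parse($html);    # feed the HTML string to the parser
    $te->eof;             # signal the end of the document
    my @tables = $te->tables;
    print scalar(@tables), " table(s) matched\n";
}
```

With only one table in the document both extractors should find the same table. The class constraint only starts to matter if the page ever contains a second table.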

Also, we should not process the table column by column.
Have a look: the first column of this table consists of labels and the second
column consists of values.
To get the labels and values at the same time,
we should process the table row by row.




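#!/usr/bin/perl

use strict;
use warnings;
use HTML::TableExtract;
use YAML;

# Select the table by its class attribute.
my $te = HTML::TableExtract->new(
    attribs => { class => 'bp_result_tab_info' },
);

# Parse the HTML file t.html.
$te->parse_file('t.html');

# Dump every matching table row by row, so that each label
# stays together with its value.
for my $table ( $te->tables ) {
    print Dump $table->rows;
}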
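
As a follow-up to the row-by-row idea, here is a small sketch that collects the labels and values into a hash. It assumes that every row has the label in its first cell and the value in its second. The inline HTML and the names in it are invented for illustration and only stand in for t.html.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TableExtract;

# Hypothetical stand-in for t.html; labels and values are made up.
my $html = <<'HTML';
<table class="bp_result_tab_info">
<tr><td> Name </td><td> Example School </td></tr>
<tr><td> City </td><td> Example Town </td></tr>
</table>
HTML

my $te = HTML::TableExtract->new(
    attribs => { class => 'bp_result_tab_info' },
);
$te->parse($html);
$te->eof;

my %info;
for my $table ( $te->tables ) {
    for my $row ( $table->rows ) {
        # First cell is the label, second cell is the value.
        my ( $label, $value ) = @$row;
        for ( $label, $value ) {
            $_ = '' unless defined;    # missing cells become empty strings
            s/^\s+|\s+$//g;            # trim surrounding whitespace
        }
        next unless length $label;     # skip rows without a label
        $info{$label} = $value;
    }
}

print "$_: $info{$_}\n" for sort keys %info;
```

A hash is convenient here because each value can be looked up by its label afterwards. If the same label can appear twice in the table, a list of pairs would be the safer structure.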



What do you think? I would love to hear from you!