I want to parse a site, and I want to learn something in the process. Please give me a helping hand and review my code!
The site is http://www.schulministerium.nrw.de/BP/SchuleSuchen?action=109.2376390575227&SchulAdresseMapDO=116439

It is a very simple page with only one table. I decided to do this with the HTML::TableExtract module and hope it does the trick and parses the page well. See the module's page on CPAN: http://search.cpan.org/dist/HTML-TableExtract/lib/HTML/TableExtract.pm

TableExtract looks like a great tool, and I am pretty sure it is up to the job. We need to provide something that uniquely identifies the table in question. This can be the content of its headers or its HTML attributes. In this case there is only one table in the document (see the link above), so we don't strictly need to do that. Still, we should pass something to the constructor, so why not the class of the table?

Also, we should not process the table by columns. The first column of this table consists of labels and the second column consists of values. To get the labels and values at the same time, we should process the table row by row.

    #!/usr/bin/perl
    use strict;
    use warnings;

    use HTML::TableExtract;
    use YAML;

    # Identify the table by its class attribute.
    my $te = HTML::TableExtract->new(
        attribs => { class => 'bp_result_tab_info' },
    );
    $te->parse_file('t.html');

    # Dump rows rather than columns, so each label stays with its value.
    for my $table ( $te->tables ) {
        print Dump $table->rows;
    }

What do you think? I would love to hear from you!
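For what it is worth, here is a minimal sketch of the row-by-row idea described above, printing each label next to its value. It parses an inline HTML string instead of t.html so that it runs on its own. The sample markup, its labels and values, and the helper name extract_pairs are made up for illustration; only the class name bp_result_tab_info comes from the code above, and the real page may be laid out differently.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use HTML::TableExtract;

# Hypothetical sample markup standing in for the real page; the labels
# and values here are invented for illustration only.
my $html = <<'HTML';
<table class="bp_result_tab_info">
  <tr><td>Name</td><td>Example School</td></tr>
  <tr><td>City</td><td>Example Town</td></tr>
</table>
HTML

# Hypothetical helper: return a list of [label, value] pairs taken from
# the first two cells of every row in every matching table.
sub extract_pairs {
    my ($markup) = @_;

    my $te = HTML::TableExtract->new(
        attribs => { class => 'bp_result_tab_info' },
    );
    $te->parse($markup);
    $te->eof;

    my @pairs;
    for my $table ( $te->tables ) {
        for my $row ( $table->rows ) {
            my ( $label, $value ) = @$row;

            # Skip rows that have no usable label.
            next unless defined $label && $label =~ /\S/;
            $value = '' unless defined $value;

            # Trim surrounding whitespace from both cells.
            s/^\s+|\s+$//g for $label, $value;

            push @pairs, [ $label, $value ];
        }
    }
    return @pairs;
}

printf "%s: %s\n", @$_ for extract_pairs($html);
```

Collecting the pairs in a sub before printing keeps the parsing separate from the output, so the same list could just as well be handed to YAML's Dump or stored in a hash.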